I have a list of URLs in a CSV file (I can host the file either on my local machine or online). I need to pull the business name, address, and phone number from each web page in the list. I have all of the correct class names. I want to write this data to a CSV with those columns.
From the csv:
https://slicelife.com/restaurants/wi/mil...aukee/menu
https://slicelife.com/restaurants/nj/nor...hvale/menu
https://slicelife.com/restaurants/mn/man...pizza/menu
https://slicelife.com/restaurants/pa/new...k-hut/menu
When I run the code, it will create a csv with the desired column headers, but no data due to errors. I CAN pull data from the scraped urls one at a time like this:
    # locationRawData = soup.find('div', attrs={"class": "f19xeu2d"}).text.encode('utf-8'),
    # pizzeriaName = soup.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8'),
    # address = soup.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8'),
    # phoneNumber = soup.find('button', attrs={"class": "f12gt8lx"}).text.encode('utf-8'),

I have tried:
    from bs4 import BeautifulSoup
    import requests
    import json
    import csv
    from urllib.request import urlopen

    TrattoriArray = []
    with open('aliveSlice.csv','r') as csvf: # Open file in read mode
        urls = csv.reader(csvf)
        for url in urls:
            TrattoriArray.append(url) # Add each url to list contents

    for url in TrattoriArray: # Parse through each url in the list.
        page = urlopen(url[0]).read()
        content = BeautifulSoup(page.content, "html.parser")
        pizzaArray = []
        for pizzeria in content.findAll('div', attrs={"class": "f19xeu2d"}):
            pizzeriaObject = {
                "pizzeriaName": pizzeria.find('h1', attrs={"class": "f13p7rsj"}).text.encode('utf-8'),
                "address": pizzeria.find('address', attrs={"class": "f1lfckhr"}).text.encode('utf-8'),
                "phoneNumber": pizzeria.find('rc-c2d-number', attrs={"span": "rc-c2d-number"}).text.encode('utf-8'),
            }
            pizzaArray.append(pizzeriaObject)

    with open('pizzeriaData.json', 'w') as outfile:
        json.dump(pizzaArray, outfile)

and

    import requests
    from bs4 import BeautifulSoup
    import csv

    with open('aliveSCRAPE.csv', newline='') as f_urls, open('output.csv', 'w', newline='') as f_output:
        csv_urls = csv.reader(f_urls)
        csv_output = csv.writer(f_output)
        csv_output.writerow(['locationRawData' , 'pizzeriaName' , 'address', 'Phone'])

        for line in csv_urls:
            r = requests.get(line[0]).text
            soup = BeautifulSoup(r.content, 'lxml')
            locationRawData = soup.find('h1')
            print('RAW :', locationRawData.text)
            pizzeriaName = soup.find('h1', class_='f13p7rsj').text
            pizzeria_name = pizzeria.split(':')
            print('pizzeriaName:', pizzeria_name[1])
            address = soup.find_all('address', class_='f1lfckhr'})
            print('Address :', address[2].text)
            phoneNumber = soup.find_all('button', class_='f12gt8lx')
            print('Phone :', phoneNumber[3].text)
            locationRawData = soup.find_all('div', class_='f19xeu2d'})
            print('RAW :', locationRawData[4].text)
            csv_output.writerow([locationRawData.text, pizzeria_name[1], address[2].text, phoneNumber[3].text])

And a few other methods. Which is the easiest?
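For comparison, here is a minimal sketch of the CSV-in, CSV-out loop described above. It is not a definitive fix. It assumes the class names from the question are still valid on the live pages, that each CSV row holds one URL in its first column, and that a missing element should become an empty cell. The names `FIELDS`, `extract_fields`, and `scrape` are hypothetical helpers introduced here. The sketch passes `response.text` (a string) straight to BeautifulSoup, since neither that string nor the bytes returned by `urlopen(...).read()` has a `.content` attribute.

```python
import csv

from bs4 import BeautifulSoup

# Class names are copied from the question; they look auto-generated and
# may change whenever the site is redeployed (assumption).
FIELDS = {
    "pizzeriaName": ("h1", "f13p7rsj"),
    "address": ("address", "f1lfckhr"),
    "phoneNumber": ("button", "f12gt8lx"),
}


def extract_fields(html):
    """Return one dict for a page; a missing element becomes an empty string."""
    soup = BeautifulSoup(html, "html.parser")
    row = {}
    for column, (tag_name, class_name) in FIELDS.items():
        tag = soup.find(tag_name, class_=class_name)  # a single Tag or None
        row[column] = tag.get_text(strip=True) if tag is not None else ""
    return row


def scrape(url_csv, out_csv):
    """Read one URL per row from url_csv and write one row per page to out_csv."""
    import requests  # third-party; imported here so extract_fields works offline

    with open(url_csv, newline="") as f_urls, \
            open(out_csv, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=["url", *FIELDS])
        writer.writeheader()
        for line in csv.reader(f_urls):
            if not line:  # skip blank rows
                continue
            url = line[0].strip()
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # stop on HTTP errors instead of parsing them
            writer.writerow({"url": url, **extract_fields(response.text)})


# Example call, using the file names from the question:
# scrape("aliveSCRAPE.csv", "output.csv")
```

Keeping the parsing in `extract_fields` separate from the download makes it possible to check the selectors against a saved copy of one page before running the whole list.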
This is literally the first thing I have ever programmed in Python.
ERROR:

    ...\Desktop\scrapeYourPlate\test\Code>Python scrape.py
    RAW : Bakers Buck Hut
    Traceback (most recent call last):
      File "scrape.py", line 98, in <module>
        print('pizzeriaName:', pizzeriaName[1].text)
      File "C:\...\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1016, in __getitem__
        return self.attrs[key]
    KeyError: 1
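The `KeyError: 1` is consistent with indexing a single Tag. `soup.find` returns one Tag (or `None`), and square brackets on a Tag look up an HTML attribute, which is the `return self.attrs[key]` line in the traceback. So `[1]` is treated as an attribute name rather than a list position. A small sketch with made-up HTML (only the class name comes from the question) shows the difference between `find` and `find_all`:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class name comes from the question.
html = '<h1 class="f13p7rsj">Bakers Buck Hut</h1>'
soup = BeautifulSoup(html, "html.parser")

heading = soup.find("h1", class_="f13p7rsj")  # one Tag, not a list
name = heading.get_text(strip=True)           # read the text directly from the Tag

try:
    heading[1]  # Tag.__getitem__ looks up attributes, so 1 is a missing key
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

headings = soup.find_all("h1", class_="f13p7rsj")  # list-like ResultSet
first_name = headings[0].get_text(strip=True)       # positional indexing works here

print(name, error_name, first_name)
```

In short, use `.text` or `get_text()` on the result of `find`, and reserve positional indexes such as `[0]` for the result of `find_all`.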