Jan-25-2021, 12:16 PM
(Jan-25-2021, 06:52 AM)buran Wrote:(Jan-24-2021, 11:26 PM)Dredd Wrote: The method is returning 6x of the titles and only 1x address.I don't see how it will return 6 titles and one address, but anyway
import csv import glob import os from bs4 import BeautifulSoup def parse(fname): with open(fname) as f: soup = BeautifulSoup(f.read(), 'lxml') title = soup.find("title") address = soup.find("address", class_={"styles_address__zrPvy"}) # do you really need find_all? return [title.text, address.text] path = "C:\\Users\\mzoljan\\Downloads\\lksd\\" with open('output2.csv', 'w') as myfile: writer = csv.writer(myfile) for infile in glob.glob(os.path.join(path, "*.html")): writer.writerow(parse(infile))with a single html file in a folder this produce
There are bunch of
Output:"ToBeMe Early Learning - Five Dock, Five Dock | Toddle","25-27 Spencer Street, Five Dock"<script>
tags with JSON inside and it is possible to extract the above info also from them.
You da man Buran!