Extracting the Address tag from multiple HTML files using BeautifulSoup


Dredd

Jan-25-2021, 12:16 PM

(Jan-25-2021, 06:52 AM)buran Wrote:
(Jan-24-2021, 11:26 PM)Dredd Wrote: The method is returning 6x of the titles and only 1x address.
I don't see how it will return 6 titles and one address, but anyway
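import csv
import glob
import os
from bs4 import BeautifulSoup

def parse(fname):
    # Open in binary mode so BeautifulSoup detects the file encoding itself
    with open(fname, 'rb') as f:
        soup = BeautifulSoup(f, 'lxml')
    title = soup.find("title")
    # find() returns the first match or None; do you really need find_all?
    address = soup.find("address", class_="styles_address__zrPvy")
    # Write an empty cell instead of crashing when a tag is missing
    return [tag.text if tag else "" for tag in (title, address)]


path = "C:\\Users\\mzoljan\\Downloads\\lksd\\"
# newline='' stops the csv module from writing blank rows on Windows
with open('output2.csv', 'w', newline='', encoding='utf-8') as myfile:
    writer = csv.writer(myfile)
    for infile in glob.glob(os.path.join(path, "*.html")):
        writer.writerow(parse(infile))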
With a single HTML file in the folder, this produces:
Output:
"ToBeMe Early Learning - Five Dock, Five Dock | Toddle","25-27 Spencer Street, Five Dock"
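On the find vs. find_all question raised in the code comment: find() returns the first matching tag (or None), while find_all() returns a list of every match, which is a common source of mismatched row counts. A minimal sketch on an inline page fragment follows. The HTML and its text are made up for illustration; only the class name comes from the thread, and html.parser is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; only the class name is taken from the thread.
html = """
<html><head><title>Example Centre | Toddle</title></head>
<body>
  <address class="styles_address__zrPvy">1 Example Street, Sampletown</address>
  <address class="other">ignored</address>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() gives a single Tag, or None when nothing matches
first = soup.find("address", class_="styles_address__zrPvy")
# find_all() gives a list of Tags, possibly empty
all_matches = soup.find_all("address", class_="styles_address__zrPvy")

first_text = first.text if first else ""
all_texts = [tag.text for tag in all_matches]

print(first_text)
print(all_texts)
```

When exactly one value per file is wanted, find() keeps each CSV row to one title and one address.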
There are a bunch of <script> tags with JSON inside, and the same info can also be extracted from them.
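A rough sketch of the <script> route, for anyone who wants to try it. The thread does not show what those script tags actually contain, so the sample markup, the application/ld+json type, the key names, and the helper json_from_scripts are all assumptions for illustration. The idea is simply to try json.loads on each script body and keep whatever parses.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical markup: the real pages' script contents are not shown in the thread.
html = """
<html><head><title>Example Centre | Toddle</title>
<script type="application/ld+json">
{"name": "Example Centre",
 "address": {"streetAddress": "1 Example Street", "addressLocality": "Sampletown"}}
</script>
<script>console.log("not JSON");</script>
</head><body></body></html>
"""

def json_from_scripts(markup):
    """Return the parsed content of every <script> whose body is valid JSON."""
    soup = BeautifulSoup(markup, "html.parser")
    found = []
    for script in soup.find_all("script"):
        text = script.string  # None for empty or external scripts
        if not text:
            continue
        try:
            found.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # ordinary JavaScript, skip it
    return found

records = json_from_scripts(html)
for record in records:
    print(record["name"], "-", record["address"]["streetAddress"])
```

On the real files you would need to inspect the parsed objects first to see which keys hold the name and address.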

You da man Buran!

