Python Forum
Extracting the Address tag from multiple HTML files using BeautifulSoup
#9
(Jan-25-2021, 06:52 AM)buran Wrote:
(Jan-24-2021, 11:26 PM)Dredd Wrote: The method is returning 6x of the titles and only 1x address.
I don't see how it would return 6 titles and one address, but anyway:

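import csv
import glob
import os
from bs4 import BeautifulSoup

def parse(fname):
    # encoding is an assumption; adjust if the saved pages are not UTF-8
    with open(fname, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), 'lxml')
    title = soup.find("title")
    address = soup.find("address", class_="styles_address__zrPvy") # do you really need find_all?
    # find() returns None when a tag is missing, so fall back to an empty string
    return [tag.get_text(strip=True) if tag else "" for tag in (title, address)]


path = "C:\\Users\\mzoljan\\Downloads\\lksd\\"
# newline='' keeps the csv module from writing blank rows on Windows
with open('output2.csv', 'w', newline='', encoding="utf-8") as myfile:
    writer = csv.writer(myfile)
    for infile in glob.glob(os.path.join(path, "*.html")):
        writer.writerow(parse(infile))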
With a single HTML file in the folder, this produces:

Output:
"ToBeMe Early Learning - Five Dock, Five Dock | Toddle","25-27 Spencer Street, Five Dock"
There are a bunch of <script> tags with JSON inside, and the same information can also be extracted from them.
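A rough sketch of the <script> route, in case it helps. The sample HTML, the application/ld+json script type, and the key names (address, streetAddress) are made up for illustration; the actual pages may store the data under different keys, so inspect one file first. It uses the built-in html.parser so it runs without lxml.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical page content: the real files may use other script types and keys.
SAMPLE_HTML = """
<html><head><title>Example Centre | Toddle</title>
<script type="application/ld+json">
{"name": "Example Centre",
 "address": {"streetAddress": "1 Example Street", "addressLocality": "Exampleville"}}
</script>
<script>console.log("not JSON");</script>
</head><body></body></html>
"""

def json_from_scripts(html):
    """Return the parsed content of every <script> tag whose body is valid JSON."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for script in soup.find_all("script"):
        text = script.string
        if not text:
            continue  # empty tag or external script (src=...)
        try:
            found.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # ordinary JavaScript, not JSON
    return found

data = json_from_scripts(SAMPLE_HTML)
for item in data:
    # .get() avoids a KeyError when a JSON blob has no address entry
    print(item.get("name"), item.get("address", {}).get("streetAddress"))
```

Trying json.loads on every script body and skipping the failures avoids guessing which tag holds the data; once you know the right one, you could target it directly with soup.find("script", type=...).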

You da man Buran!


Messages In This Thread
RE: Extracting the Address tag from multiple HTML files using BeautifulSoup - by Dredd - Jan-25-2021, 12:16 PM
