Python Forum
Extracting the Address tag from multiple HTML files using BeautifulSoup
Hi All,

The code below works exactly how I want for 'title', but it does not work at all for 'address'.

import csv
import glob
import os

from bs4 import BeautifulSoup

path = "C:\\Users\\mpeter\\Downloads\\lksd\\"

titleList = []

for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(open(markup, "r").read(), 'lxml')
    title = soup.find_all("title")
    title = soup.title.string
    titleList.append(title)
    
streetAddressList = []

for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(open(markup, "r").read(), 'lxml')
    address = soup.find_all("address", class_={"styles_address__zrPvy"})
    address = soup.address.string
    streetAddressList.append(address)
  
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('output2.csv', 'w', newline='') as myfile:
    writer = csv.writer(myfile)
    writer.writerows((titleList, streetAddressList))
Here is the HTML for the address element.

[<address class="styles_address__zrPvy"><svg class="styles_addressIcon__3Pu3L" height="42" viewbox="0 0 32 42" width="32" xmlns="http://www.w3.org/2000/svg"><path d="M14.381 41.153C2.462 23.873.25 22.1.25 15.75.25 7.051 7.301 0 16 0s15.75 7.051 15.75 15.75c0 6.35-2.212 8.124-14.131 25.403a1.97 1.97 0 01-3.238 0zM16 22.313a6.562 6.562 0 100-13.125 6.562 6.562 0 000 13.124z"></path></svg>Level 1 44 Market Street<!-- -->, <!-- -->Sydney</address>]

All I want is the title and address elements as strings. The address lookup works if I leave out the .string line, but then it returns the full HTML. Please help.
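
A likely cause, going by the markup above: the address tag has several children (the svg icon, three text fragments, and two empty comments). Beautiful Soup's .string returns None whenever a tag has more than one child, whereas get_text() joins the text fragments and leaves the comments out. Note also that soup.address returns the first address tag in the document and ignores the class filter passed to find_all on the previous line. The sketch below is one possible fix, not a tested drop-in for the real files. The sample page, its title text, and the helper name extract_title_and_address are made up for illustration, and it uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the address markup is copied from the post
# (SVG path data trimmed), the title text is invented for illustration.
SAMPLE_HTML = (
    '<html><head><title>Example listing</title></head><body>'
    '<address class="styles_address__zrPvy">'
    '<svg class="styles_addressIcon__3Pu3L" height="42" width="32">'
    '<path d="M0 0"></path></svg>'
    'Level 1 44 Market Street<!-- -->, <!-- -->Sydney</address>'
    '</body></html>'
)


def extract_title_and_address(html):
    """Return (title, address) as plain strings; None where a tag is missing."""
    # Assumption: "html.parser" is used here only to avoid the lxml dependency.
    soup = BeautifulSoup(html, "html.parser")

    title_tag = soup.title
    # find() returns the first matching tag (or None) and honours the class filter.
    address_tag = soup.find("address", class_="styles_address__zrPvy")

    # get_text() concatenates every text node under the tag and skips comments.
    title = title_tag.get_text().strip() if title_tag else None
    address = address_tag.get_text().strip() if address_tag else None
    return title, address


title, address = extract_title_and_address(SAMPLE_HTML)
print(title)
print(address)
```

Inside the original loop, the same helper could be called on the contents of each file, appending the two returned values to titleList and streetAddressList in a single pass instead of parsing every file twice.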
Messages In This Thread
Extracting the Address tag from multiple HTML files using BeautifulSoup - by Dredd - Jan-24-2021, 03:40 AM
