Python Forum
Beautiful soup and tags - Printable Version



RE: Beautiful soup and tags - snippsat - Jul-08-2019

(Jul-08-2019, 12:16 PM)starter_student Wrote: and now there is no error but the output file is empty just with headers
That's because your parsing, or something else, is wrong.
Test in small steps: put in print() calls and try things out in the REPL.
store_details = {} should be outside of the loop.
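
A minimal sketch of why the placement matters, assuming the dict was being created inside the loop. The item names here are hypothetical stand-ins for the parsed tags, not your real data: a dict created inside the loop is reset on every pass, so only the last item survives.

```python
# Hypothetical item names standing in for the parsed <li> tags
items = ['Coffee', 'Tea', 'Milk']

# Dict created inside the loop: it is reset on every pass
for name in items:
    inside = {}
    inside[name] = f'<{name}> parsed for site'

# Dict created once, before the loop: every item is kept
outside = {}
for name in items:
    outside[name] = f'<{name}> parsed for site'

print(len(inside), len(outside))  # Test print
```

Printing the length of each dict like this is the kind of small test step that shows quickly where rows go missing.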

The HTML code you posted is just a mess.
Here is how you can test HTML code outside of a web-site:
from bs4 import BeautifulSoup
import csv
import requests

# Sample HTML, so the parsing can be tested without the web-site
html = '''\
<div id="storelist">
  <ul>
    <li>Coffee</li>
    <li>Tea</li>
    <li>Milk</li>
  </ul>
</div>'''

#code
#URL = "http:www.abc.com"
#r = requests.get(URL)
soup = BeautifulSoup(html, 'lxml')
table = soup.find('div', id="storelist")
print(table) # Test print
store_details = {}  # Created once, outside of the loop
for row in table.find_all('li'):
    store_details[row.text] = f'<{row.text}> parsed for site'

filename = 'store_details_tab.csv'
# newline='' is what the csv docs recommend; it avoids blank rows on Windows
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['Coffee', 'Tea', 'Milk'])
    w.writeheader()
    w.writerow(store_details)
In csv:
Output:
Coffee,Tea,Milk
<Coffee> parsed for site,<Tea> parsed for site,<Milk> parsed for site



RE: Beautiful soup and tags - starter_student - Jul-08-2019

Thanks for this approach ... it helped me understand a few things. The HTML code I posted was just a sample ... here is the actual structure, with a nested div:

[html]
<div id ="storelist" class>
<ul>
<li id ="00021455" class>
<div class ="wr-store-details">
<p> name </p>
<span class ="address dc">Street 2</span>
<span class ="city">LA</span>
</div>
</li>
<li>
</li>
.
.
.
</ul>
</div>
[/html]
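
One possible way to read this nested structure, offered as a sketch rather than a finished solution. It reuses the test-outside-the-site approach from the previous post on a trimmed copy of the HTML above. The column names (id, name, address, city) and the use of the built-in 'html.parser' instead of 'lxml' are assumptions made for the example.

```python
from bs4 import BeautifulSoup
import csv

# Trimmed copy of the structure posted above, used as test input
html = '''\
<div id="storelist" class>
<ul>
<li id="00021455" class>
<div class="wr-store-details">
<p> name </p>
<span class="address dc">Street 2</span>
<span class="city">LA</span>
</div>
</li>
<li>
</li>
</ul>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
stores = []  # One dict per store, created outside of the loop
for li in soup.find('div', id='storelist').find_all('li'):
    details = li.find('div', class_='wr-store-details')
    if details is None:  # Skip <li> tags that hold no store details
        continue
    stores.append({
        'id': li.get('id', ''),
        'name': details.find('p').get_text(strip=True),
        'address': details.find('span', class_='address').get_text(strip=True),
        'city': details.find('span', class_='city').get_text(strip=True),
    })

print(stores)  # Test print

# Column names are assumed for this example
with open('store_details_tab.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=['id', 'name', 'address', 'city'])
    w.writeheader()
    w.writerows(stores)
```

Each store becomes one row here instead of one column, which seems a better fit when the number of li tags is not known in advance.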