Beautifulsoup table question
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: /thread-21441.html
Beautifulsoup table question - tantony - Sep-30-2019

I'm able to get the data from the HTML table, but how would I get only the data I need? For example, how would I read only '10 or more sm (16+ km)', which is line 7 of the output below?

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    metar_link = 'https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0'
    page = urlopen(metar_link)
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table')
    for tr in table.find_all('tr'):
        # second cell of each row holds the decoded value
        metar = tr.find_all('td')[1].text.strip()
        print(metar)

Output:

    KBWI (Baltimore-Washington, MD, US)
    KBWI 301254Z 10007KT 10SM SCT017 BKN023 OVC039 21/18 A3027 RMK AO2 SLP249 T02060178
    20.6°C ( 69°F)
    17.8°C ( 64°F) [RH = 84%]
    30.27 inches Hg (1025.1 mb) [Sea level pressure: 1024.9 mb]
    from the E (100 degrees) at 8 MPH (7 knots; 3.6 m/s)
    10 or more sm (16+ km)
    2300 feet AGL
    scattered clouds at 1700 feet AGL, broken clouds at 2300 feet AGL, overcast cloud deck at 3900 feet AGL

    Process finished with exit code 0

This is the website I'm trying to get data from:
https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0

RE: Beautifulsoup table question - tantony - Sep-30-2019

Anyone with any suggestions?

RE: Beautifulsoup table question - Larz60+ - Sep-30-2019

The following shows all available data by calling show_detail, and then gets the item of interest. The show_detail output explains how the indexes tr[6], td[0] and td[1] were determined.
I use requests, which is better than urlopen.
The css_select value is obtained in the browser (I use Firefox),
    from bs4 import BeautifulSoup
    import os
    import requests
    import NewPrettifyPage  # not part of bs4 or requests; only instantiated in __init__, unused below
    import sys


    class Weather:
        def __init__(self):
            self.pp = NewPrettifyPage.PrettifyPage()

        def show_detail(self, trs):
            for n, tr in enumerate(trs):
                tds = tr.find_all('td')
                for n1, td in enumerate(tds):
                    print(f"\n--------------------- tr_{n}, td_{n1} ---------------------")
                    print(f"{td}\ntext: {td.text.strip()}")

        def scrape_weather_info(self, metar_link):
            response = requests.get(metar_link)
            if response.status_code == 200:
                soup = BeautifulSoup(response.content, 'lxml')
                table = soup.select('#awc_main_content_wrap > table:nth-child(3)')[0]  # CSS selector copied from the browser
                trs = table.find_all('tr')
                self.show_detail(trs)  # dump every cell with its row/column index
                item_of_interest = trs[6]  # row index 6, i.e. the 7th row the question asks about
                tds = item_of_interest.find_all('td')
                print(f"\nitem_of_interest: {tds[0].text.strip()} {tds[1].text.strip()}")


    if __name__ == '__main__':
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        sw = Weather()
        sw.scrape_weather_info('https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0')

Output:
RE: Beautifulsoup table question - tantony - Sep-30-2019

Thank you, I'll try that. So there's no way to read a table td value using BeautifulSoup? I'm new to Python + BeautifulSoup.

RE: Beautifulsoup table question - Larz60+ - Sep-30-2019

I do use Beautiful Soup... Read the code! See line 22.

RE: Beautifulsoup table question - tantony - Sep-30-2019

Ok, thanks again.
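
The answer above picks the row by a fixed index (trs[6]), which depends on the page keeping its row order. A possible alternative, sketched here under assumptions, is to key each row by the text of its first cell and look the value up by label. The sample markup, the label names such as 'Visibility:', and the helper name table_to_dict are all hypothetical stand-ins; the real page's labels should be checked with show_detail first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the decoded METAR table.
# The label text and layout are assumptions, not copied from the live page.
SAMPLE_HTML = """
<table>
  <tr><td>Winds:</td><td>from the E (100 degrees) at 8 MPH (7 knots; 3.6 m/s)</td></tr>
  <tr><td>Visibility:</td><td>10 or more sm (16+ km)</td></tr>
  <tr><td>Ceiling:</td><td>2300 feet AGL</td></tr>
</table>
"""


def table_to_dict(html):
    """Map each row's first-cell text (without the colon) to its second-cell text."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    rows = {}
    if table is None:
        return rows  # no table found in the markup
    for tr in table.find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) < 2:
            continue  # skip rows without a label/value pair
        label = tds[0].get_text(strip=True).rstrip(':')
        rows[label] = tds[1].get_text(strip=True)
    return rows


weather = table_to_dict(SAMPLE_HTML)
print(weather.get('Visibility'))
```

Looking the value up with weather.get('Visibility') returns None instead of raising an error if the label is missing, which makes a changed page layout easier to notice than a silently wrong row.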