Beautifulsoup table question - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Beautifulsoup table question (/thread-21441.html)

Beautifulsoup table question - tantony - Sep-30-2019

I'm able to get the data from the HTML table, but how do I get only the data I need? For example, how would I read only '10 or more sm (16+ km)', which is line 7 of the output below?

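    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    metar_link = 'https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0'
    page = urlopen(metar_link)
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table')

    # Print the second cell (the value column) of every row
    for tr in table.find_all('tr'):
        metar = tr.find_all('td')[1].text.strip()
        print(metar)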

Output:

KBWI (Baltimore-Washington, MD, US)
KBWI 301254Z 10007KT 10SM SCT017 BKN023 OVC039 21/18 A3027 RMK AO2 SLP249 T02060178
20.6°C ( 69°F)
17.8°C ( 64°F) [RH =  84%]
30.27 inches Hg (1025.1 mb) [Sea level pressure: 1024.9 mb]
from the E (100 degrees) at   8 MPH (7 knots;  3.6 m/s)
10 or more sm (16+ km)
2300 feet AGL
scattered clouds at 1700 feet AGL, broken clouds at 2300 feet AGL, overcast cloud deck at 3900 feet AGL

This is the website I'm trying to get data from.

https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0
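Because Python lists are zero-indexed, line 7 of the output above corresponds to row index 6, and the value is the second cell (index 1) of that row. A minimal sketch, assuming the table keeps the label/value layout shown in that output. The HTML string is a stand-in built from the output above so the example runs offline, and `row_value` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Stand-in for the decoded METAR table, built from the output above
# (label cell, value cell); the live page's markup may differ
SAMPLE_HTML = """
<table>
  <tr><td>METAR for:</td><td>KBWI (Baltimore-Washington, MD, US)</td></tr>
  <tr><td>Text:</td><td>KBWI 301254Z 10007KT 10SM SCT017 BKN023 OVC039 21/18 A3027 RMK AO2 SLP249 T02060178</td></tr>
  <tr><td>Temperature:</td><td>20.6°C ( 69°F)</td></tr>
  <tr><td>Dewpoint:</td><td>17.8°C ( 64°F) [RH =  84%]</td></tr>
  <tr><td>Pressure (altimeter):</td><td>30.27 inches Hg (1025.1 mb) [Sea level pressure: 1024.9 mb]</td></tr>
  <tr><td>Winds:</td><td>from the E (100 degrees) at   8 MPH (7 knots;  3.6 m/s)</td></tr>
  <tr><td>Visibility:</td><td>10 or more sm (16+ km)</td></tr>
  <tr><td>Ceiling:</td><td>2300 feet AGL</td></tr>
</table>
"""


def row_value(html, row_index):
    """Return the stripped text of the value cell (td index 1) in one table row."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    rows = table.find_all('tr')
    # Zero-based: the 7th printed line is rows[6]
    return rows[row_index].find_all('td')[1].text.strip()


print(row_value(SAMPLE_HTML, 6))  # → 10 or more sm (16+ km)
```

Instead of looping over every row, this indexes straight into the one row you want.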

RE: Beautifulsoup table question - tantony - Sep-30-2019

Anyone with any suggestions?

RE: Beautifulsoup table question - Larz60+ - Sep-30-2019

The following code prints all the available data by calling show_detail,
and then extracts the item of interest.
The show_detail output shows how the indexes tr[6], td[0], and td[1] were determined.

I use requests, which I find easier to work with than urlopen.

The CSS selector passed to soup.select() is obtained in the browser (I use Firefox):

1. Place the cursor over the item of interest.
2. Right-click the selected text and choose Inspect Element.
3. In the inspector window, move the cursor over the <table> tag.
4. Right-click it.
5. Select Copy.
6. Select CSS Selector.
7. Paste the result into the code: soup.select(paste here)
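A copied selector can be checked offline before pointing it at the live site. A minimal sketch, assuming a hypothetical page skeleton in which the data table is the third child element of the element with id awc_main_content_wrap (that is what `table:nth-child(3)` requires); the placeholder heading and paragraph are invented, and the real page's layout may differ. `find_table` is a hypothetical helper; it uses `select_one`, which returns None when nothing matches, whereas `select(...)[0]` raises IndexError:

```python
from bs4 import BeautifulSoup

# Hypothetical skeleton: the table is the 3rd child element of the wrapper div
PAGE_HTML = """
<div id="awc_main_content_wrap">
  <h2>placeholder heading</h2>
  <p>placeholder paragraph</p>
  <table>
    <tr><td>Visibility:</td><td>10 or more sm (16+ km)</td></tr>
  </table>
</div>
"""

# Selector as copied from the browser's inspector
SELECTOR = '#awc_main_content_wrap > table:nth-child(3)'


def find_table(html, selector=SELECTOR):
    """Return the first element matching the CSS selector, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one() returns None instead of raising IndexError like select(...)[0]
    return soup.select_one(selector)


table = find_table(PAGE_HTML)
if table is None:
    print("Selector matched nothing; re-copy it from the browser")
else:
    print(table.find('tr').find_all('td')[1].text.strip())
```

If the site's layout changes and the table is no longer the third child, the selector stops matching and the None check reports it instead of crashing.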

from bs4 import BeautifulSoup
import os
import requests


class Weather:
    def show_detail(self, trs):
        # Print every cell with its row/column index, so you can see
        # which tr/td index holds the value you want
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')
            for n1, td in enumerate(tds):
                print(f"\n--------------------- tr_{n}, td_{n1} ---------------------")
                print(f"{td}\ntext: {td.text.strip()}")

    def scrape_weather_info(self, metar_link):
        response = requests.get(metar_link)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'lxml')
            # CSS selector copied from the browser's inspector
            table = soup.select('#awc_main_content_wrap > table:nth-child(3)')[0]
            trs = table.find_all('tr')
            self.show_detail(trs)
            # Row 6 is the Visibility row: td 0 is the label, td 1 the value
            item_of_interest = trs[6]
            tds = item_of_interest.find_all('td')
            print(f"\nitem_of_interest: {tds[0].text.strip()} {tds[1].text.strip()}")
        else:
            print(f"Request failed with status code {response.status_code}")


if __name__ == '__main__':
    os.chdir(os.path.abspath(os.path.dirname(__file__)))
    sw = Weather()
    sw.scrape_weather_info('https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0')

Output:

--------------------- tr_0, td_0 ---------------------
<td align="right" width="130px"><span style="color: #3333CC; font-weight: bold">METAR for:</span></td>
text: METAR for:

--------------------- tr_0, td_1 ---------------------
<td>KBWI (Baltimore-Washington, MD, US) </td>
text: KBWI (Baltimore-Washington, MD, US)

--------------------- tr_1, td_0 ---------------------
<td align="right" valign="top"><span style="color: #9999CC; font-weight: bold">Text:</span></td>
text: Text:

--------------------- tr_1, td_1 ---------------------
<td style="background-color: #CCCCCC; font-weight: bold">KBWI 301454Z 12008KT 10SM SCT020 OVC042 23/16 A3029 RMK AO2 SLP255 SCT020 V BKN T02280161 53008</td>
text: KBWI 301454Z 12008KT 10SM SCT020 OVC042 23/16 A3029 RMK AO2 SLP255 SCT020 V BKN T02280161 53008

--------------------- tr_2, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Temperature:</span></td>
text: Temperature:

--------------------- tr_2, td_1 ---------------------
<td> 22.8°C ( 73°F)</td>
text: 22.8°C ( 73°F)

--------------------- tr_3, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Dewpoint:</span></td>
text: Dewpoint:

--------------------- tr_3, td_1 ---------------------
<td> 16.1°C ( 61°F) [RH =  66%]</td>
text: 16.1°C ( 61°F) [RH =  66%]

--------------------- tr_4, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Pressure (altimeter):</span></td>
text: Pressure (altimeter):

--------------------- tr_4, td_1 ---------------------
<td>30.29 inches Hg (1025.8 mb) [Sea level pressure: 1025.5 mb]</td>
text: 30.29 inches Hg (1025.8 mb) [Sea level pressure: 1025.5 mb]

--------------------- tr_5, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Winds:</span></td>
text: Winds:

--------------------- tr_5, td_1 ---------------------
<td>from the ESE (120 degrees) at   9 MPH (8 knots;  4.1 m/s)</td>
text: from the ESE (120 degrees) at   9 MPH (8 knots;  4.1 m/s)

--------------------- tr_6, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Visibility:</span></td>
text: Visibility:

--------------------- tr_6, td_1 ---------------------
<td>10 or more sm (16+ km)</td>
text: 10 or more sm (16+ km)

--------------------- tr_7, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Ceiling:</span></td>
text: Ceiling:

--------------------- tr_7, td_1 ---------------------
<td>4200 feet AGL</td>
text: 4200 feet AGL

--------------------- tr_8, td_0 ---------------------
<td align="right" valign="top"><span style="color: #9999CC; font-weight: bold">Clouds:</span></td>
text: Clouds:

--------------------- tr_8, td_1 ---------------------
<td> scattered clouds at 2000 feet AGL, overcast cloud deck at 4200 feet AGL</td>
text: scattered clouds at 2000 feet AGL, overcast cloud deck at 4200 feet AGL

item_of_interest: Visibility: 10 or more sm (16+ km)
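Hard-coding trs[6] works only while the row order stays the same; if the page adds or omits a row, the index points at the wrong data. As an alternative to the positional indexing used above (a different technique, not the one in this post), here is a minimal sketch that maps each label cell to its value cell and looks up 'Visibility' by name. The HTML is an excerpt of the tr/td dump above with the style attributes trimmed, and `table_to_dict` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Excerpt of the tr/td dump above, style attributes trimmed
SAMPLE_HTML = """
<table>
  <tr><td align="right"><span>Winds:</span></td><td>from the ESE (120 degrees) at   9 MPH (8 knots;  4.1 m/s)</td></tr>
  <tr><td align="right"><span>Visibility:</span></td><td>10 or more sm (16+ km)</td></tr>
  <tr><td align="right"><span>Ceiling:</span></td><td>4200 feet AGL</td></tr>
</table>
"""


def table_to_dict(html):
    """Map each row's label (td 0, trailing colon removed) to its value (td 1)."""
    soup = BeautifulSoup(html, 'html.parser')
    data = {}
    for tr in soup.find('table').find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) < 2:
            continue  # skip rows without a label/value pair
        label = tds[0].text.strip().rstrip(':')
        data[label] = tds[1].text.strip()
    return data


weather = table_to_dict(SAMPLE_HTML)
print(weather.get('Visibility'))  # → 10 or more sm (16+ km)
```

Using dict.get() returns None rather than raising KeyError when a label is missing from a particular report.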

RE: Beautifulsoup table question - tantony - Sep-30-2019

Thank you, I'll try that. So there's no way to read a table td value using BeautifulSoup? I'm new to Python + BeautifulSoup

RE: Beautifulsoup table question - Larz60+ - Sep-30-2019

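I do use Beautiful Soup... read the code!
See the line soup = BeautifulSoup(response.content, 'lxml') in scrape_weather_info; the find_all('tr') and find_all('td') calls after it are Beautiful Soup methods too.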

RE: Beautifulsoup table question - tantony - Sep-30-2019

Ok thanks again