Web Scraping in Python

phochka · Aug-22-2021, 11:59 AM

Hi,

I am a new user and need some help with web scraping. The URL is (https://www.geny.com/reunions-courses-pm...2021-08-20).
I am posting my test.
My test returns 'None'.
What is wrong, and how can I get these results?
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion2">Duindigt (Pays-Bas)</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion3">Fairview (Afrique du Sud)</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion4">La Teste-de-Buch</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion5">Clairefontaine-Deauville</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion6">Cagnes-sur-Mer</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion7">York (Grande-Bretagne)</a>
<a href="/reunions-courses-pmu/_d2021-08-20?#reunion8">Divonne-les-Bains</a>

***snippsat*** · (This post was last modified: Aug-22-2021, 01:45 PM by snippsat.)

Use code tags when posting code.
You have to find all the <a> tags inside the div first, then read the href from each one.

import requests
from bs4 import BeautifulSoup

url = 'https://www.geny.com/reunions-courses-pmu?date=2021-08-20'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'en,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'lxml')

# find() returns only the enclosing div, which has no href attribute
a = soup.find('div', {'class': 'yui-u liensReunion'})
#print(a.get('href'))  # prints None
all_a = a.find_all('a')  # collect every <a> tag inside the div

>>> all_a
[<a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion1">Cabourg</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion2">Duindigt (Pays-Bas)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion3">Fairview (Afrique du Sud)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion4">La Teste-de-Buch</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion5">Clairefontaine-Deauville</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion6">Cagnes-sur-Mer</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion7">York (Grande-Bretagne)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion8">Divonne-les-Bains</a>]
>>> all_a[0].get('href', 'Not found')
'/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion1'
>>> all_a[1].get('href', 'Not found')
'/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion2'
>>> all_a[1].get('car', 'Not found')
'Not found'
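
To go from the list above to usable links, one option is to loop over the tags and build absolute URLs. The sketch below is only an illustration: it parses a small hand-written HTML sample (SAMPLE_HTML, an assumption that mimics the structure shown in the output above) instead of the live page, uses the built-in 'html.parser' so that lxml is not required, and assumes https://www.geny.com as the base URL. It also shows where the 'None' presumably came from: calling .get('href') on the div, which has no such attribute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base URL, used to turn the relative hrefs into absolute ones
BASE_URL = 'https://www.geny.com'

# Hypothetical sample mimicking the structure of the real page
SAMPLE_HTML = '''
<div class="yui-u liensReunion">
  <a href="/reunions-courses-pmu/_d2021-08-20?#reunion1">Cabourg</a>
  <a href="/reunions-courses-pmu/_d2021-08-20?#reunion2">Duindigt (Pays-Bas)</a>
</div>
'''

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
div = soup.find('div', {'class': 'yui-u liensReunion'})

# The <div> itself has no href, so .get() falls back to its default, None
div_href = div.get('href')

# Each <a> inside the div does carry an href; pair it with the link text
links = [
    (a.get_text(strip=True), urljoin(BASE_URL, a.get('href', '')))
    for a in div.find_all('a')
]

print(div_href)
for name, link in links:
    print(name, link)
```

On the real page, find() can itself return None if the class name changes, so it may be worth checking div before calling find_all() on it.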

phochka · Aug-22-2021, 03:03 PM

Hi, snippsat

Thanks a lot, that's what I need.
