Python Forum
Web Scraping in Python
#1
Hi,

I am a new user and need some help with web scraping. The URL is (https://www.geny.com/reunions-courses-pm...2021-08-20).
My test script is attached.
The script prints 'None'.
What is wrong? How can I get the following output instead?
/reunions-courses-pmu/_d2021-08-20?#reunion2">Duindigt (Pays-Bas)
/reunions-courses-pmu/_d2021-08-20?#reunion3">Fairview (Afrique du Sud)
/reunions-courses-pmu/_d2021-08-20?#reunion4">La Teste-de-Buch
/reunions-courses-pmu/_d2021-08-20?#reunion5">Clairefontaine-Deauville
/reunions-courses-pmu/_d2021-08-20?#reunion6">Cagnes-sur-Mer
/reunions-courses-pmu/_d2021-08-20?#reunion7">York (Grande-Bretagne)
/reunions-courses-pmu/_d2021-08-20?#reunion8">Divonne-les-Bains

Attached file: main.py (553 bytes)
#2
Use code tags when you post code.
You have to find all the a tags inside the div first; the div itself has no href.
import requests
from bs4 import BeautifulSoup

url = 'https://www.geny.com/reunions-courses-pmu?date=2021-08-20'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'en,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'lxml')

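# find() returns only the first match, here the container div
a = soup.find('div', {'class': 'yui-u liensReunion'})
#print(a.get('href'))  # None: the div itself has no href attribute
all_a = a.find_all('a')  # collect every <a> tag inside the div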
>>> all_a
[<a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion1">Cabourg</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion2">Duindigt (Pays-Bas)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion3">Fairview (Afrique du Sud)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion4">La Teste-de-Buch</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion5">Clairefontaine-Deauville</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion6">Cagnes-sur-Mer</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion7">York (Grande-Bretagne)</a>,
 <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion8">Divonne-les-Bains</a>]
>>> all_a[0].get('href', 'Not found')
'/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion1'
>>> all_a[1].get('href', 'Not found')
'/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion2'
>>> all_a[1].get('car', 'Not found')
'Not found'
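
The hrefs above still carry a ;jsessionid=... part and are relative, while the output asked for in #1 has neither. Here is a minimal sketch of one way to clean them up, not the only way. It assumes the session id can simply be dropped and that links resolve against https://www.geny.com. SAMPLE_HTML, BASE_URL and extract_links are hypothetical names made up for this example, and the sample markup is modelled on the output above so it runs without a network request. It uses the built-in 'html.parser' so lxml is not needed.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: site root taken from the URL used earlier in this thread.
BASE_URL = 'https://www.geny.com'

# Hypothetical sample modelled on the output above, so the sketch runs offline.
SAMPLE_HTML = '''
<div class="yui-u liensReunion">
  <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion1">Cabourg</a>
  <a href="/reunions-courses-pmu/_d2021-08-20;jsessionid=1B631A1EE89E45D365AC1AAC3180F2F2?#reunion2">Duindigt (Pays-Bas)</a>
</div>
'''


def extract_links(html, base_url=BASE_URL):
    """Return (name, absolute_url) pairs for the links in the container div."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', {'class': 'yui-u liensReunion'})
    if div is None:
        # find() returns None when nothing matches, so guard before using it
        return []
    links = []
    for a in div.find_all('a', href=True):
        # Drop the ";jsessionid=..." path parameter (assumed not to be needed)
        href = re.sub(r';jsessionid=[^?#]*', '', a['href'])
        # Resolve the relative href against the site root
        links.append((a.get_text(strip=True), urljoin(base_url, href)))
    return links


# The container div has no href attribute, so .get('href') on it gives None
container = BeautifulSoup(SAMPLE_HTML, 'html.parser').find(
    'div', {'class': 'yui-u liensReunion'})
print(container.get('href'))

for name, link in extract_links(SAMPLE_HTML):
    print(name, link)
```

For the live page, pass resp.text from the code above to extract_links() instead of SAMPLE_HTML.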
#3
Hi, snippsat

Thanks a lot, that's what I need.