Python Forum
I want to Download all .zip Files From A Website (Project AI)
#41
Here's the deal: this site is very difficult to scrape.
The reason is that the download URL keeps changing (I would guess to prevent bots).
Try it; the URL you gave me no longer works, but it did when it was posted.
This is taking too much of my time, and it is proving much more difficult because of the moving target.
Reluctantly, I can't spend any more time on it, at least not today (I have surgery in the AM, so I have to prepare for that).

I would suggest getting the auto-password part that Dead-eye gave you working first; then you can go to the first page and run the following to get all the links:

name this one: Fspaths.py
from pathlib import Path
import os


class Fspaths:
    """Create (if needed) and hold the directories and files used by the scraper."""
    def __init__(self):
        # work relative to the directory this script lives in
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        homepath = Path('.')

        self.datapath = homepath / 'data'
        self.datapath.mkdir(exist_ok=True)

        self.htmlpath = self.datapath / 'html'
        self.htmlpath.mkdir(exist_ok=True)

        self.flightsimpath = self.datapath / 'FlightSimFiles'
        self.flightsimpath.mkdir(exist_ok=True)

        self.page1_html = self.htmlpath / 'pagespan.html'
        self.links = self.flightsimpath / 'links.txt'

        # note: the searchid in this URL expires, so replace it with a fresh one
        self.base_catalog_url = 'https://www.flightsim.com/vbfs/fslib.php?searchid=65893537&page='


if __name__ == '__main__':
    Fspaths()
and this one: ScrapeUrlList.py
import Fspaths
from bs4 import BeautifulSoup
import requests


class ScrapeUrlList:
    def __init__(self):
        self.fpath = Fspaths.Fspaths()
        self.ziplinks = []

    def get_url(self, url):
        """Fetch a page and return its content, or None if the request fails."""
        page = None
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
        else:
            print(f'Cannot load URL: {url}')
        return page

    def get_catalog(self):
        """Walk the catalog pages and write 'name, url' lines to the links file."""
        base_url = 'https://www.flightsim.com/vbfs'
        with self.fpath.links.open('w') as fp:
            for pageno in range(1, 254):
                url = f'{self.fpath.base_catalog_url}{pageno}'
                print(f'url: {url}')
                page = self.get_url(url)
                if page:
                    soup = BeautifulSoup(page, 'lxml')
                    zip_links = soup.find_all('div', class_="fsc_details")
                    for link in zip_links:
                        fp.write(f"{link.find('a').text}, {base_url}/{link.find('a').get('href')}\n")
                else:
                    print(f'No page: {url}')


def main():
    sul = ScrapeUrlList()
    sul.get_catalog()


if __name__ == '__main__':
    main()
The searchid is what changes, so you need to get a fresh one (you can change the code to take it as an attribute, as in the sketch below) before creating the download list.
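For example, a trimmed-down Fspaths that takes the searchid as a constructor argument could look like this (my sketch, not part of the original scripts; the default is the old, expired id and the id passed at the bottom is just a placeholder):

# Fspaths.py (variant) -- sketch of taking the searchid as a constructor argument
from pathlib import Path
import os


class Fspaths:
    def __init__(self, searchid='65893537'):
        # work relative to the directory the script lives in
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        homepath = Path('.')

        self.datapath = homepath / 'data'
        self.datapath.mkdir(exist_ok=True)

        self.flightsimpath = self.datapath / 'FlightSimFiles'
        self.flightsimpath.mkdir(exist_ok=True)

        self.links = self.flightsimpath / 'links.txt'

        # build the catalog URL from whatever searchid the site currently issues
        self.base_catalog_url = (
            f'https://www.flightsim.com/vbfs/fslib.php?searchid={searchid}&page='
        )


if __name__ == '__main__':
    # example: pass in the searchid you copied from the site's search results page
    print(Fspaths('12345678').base_catalog_url)

ScrapeUrlList would then call Fspaths.Fspaths(new_searchid) instead of Fspaths.Fspaths().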
Then (not written here), you need to use the created list to download the zip files; a rough sketch of that step follows the notes below.

This code will build a directory tree named 'data' wherever you put the scripts.
The links file is created in a subdirectory named FlightSimFiles.
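For the download step itself, something like the sketch below could be a starting point (my code, not tested against the live site). It assumes links.txt holds one 'name, url' pair per line as written by ScrapeUrlList, that the link text is the zip file name, and that the URLs still resolve; it does not handle the login/password part from Dead-eye's code, so you would likely need an authenticated requests.Session instead of a plain requests.get. The file name download_zips.py is my choice:

# download_zips.py -- rough sketch only; adjust once the login step is in place
import Fspaths
import requests


def download_zips():
    fpath = Fspaths.Fspaths()
    with fpath.links.open() as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue
            # each line was written as "name, url" by ScrapeUrlList
            name, url = line.rsplit(', ', 1)
            # assumes the link text is the zip file name; adjust if it is not
            target = fpath.flightsimpath / name
            print(f'downloading {url} -> {target}')
            response = requests.get(url)
            if response.status_code == 200:
                target.write_bytes(response.content)
            else:
                print(f'Cannot download: {url}')


if __name__ == '__main__':
    download_zips()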
RE: I want to Download all .zip Files From A Website (Project AI) - by Larz60+ - Aug-28-2018, 06:10 PM