Python Forum
Review my Django & BeautifulSoup Scraper
I have some Python background, but this is my very first time working with Django. So I would appreciate it if someone could review my code to ensure that I'm on the right path.

The scrape_product_data function first checks whether the product is already in the database and, if not, scrapes the product's details. I removed the BeautifulSoup scraping logic here to keep the code short and readable.

store_product_data stores the scraped data in the database.

Thank you!

scrape.py
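import requests
from bs4 import BeautifulSoup
from django.contrib import messages
from django.http import HttpRequest, HttpResponse
from django.shortcuts import render
from django.views import View

from .models import Product  # assumed location of the Product model


class ScrapeData(View):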
    http_method_names = ['get', 'post']
    
    def scrape_product_data(self, request: HttpRequest, handle: str) -> HttpResponse:
        # Check if the product already exists in the database
        if Product.objects.filter(handle=handle).exists():
            error_message = "Product already exists in the database!"
            messages.error(request, error_message)
            return render(request, 'scrape.html', {'error_message': error_message})

        session = requests.Session()
        session.headers.update({"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"})

        url = f"https://domain.com/{handle}"
        # A timeout keeps the view from hanging if the remote site never responds
        page = session.get(url, timeout=10)
        soup = BeautifulSoup(page.content, 'lxml')
      
        # Scraper to scrape description, price, photos...
        # ...
        # ...

        product_data = {

            'product_handle': handle,
            'product_name': product_name,
            'product_description': product_description,
            'product_price': product_price,
            # ...
        }

        # Store data in the database
        self.store_product_data(product_data)

        return render(request, 'scrape.html', {'product_data': product_data})

    # Func to store data into DB
    def store_product_data(self, product_data):

        product = Product(
            handle=product_data['product_handle'],
            name=product_data['product_name'],
            description=product_data['product_description'],
            price=product_data['product_price'],  # assumes the model field is named price
            # ...
        )
        product.save()

    def get(self, request: HttpRequest) -> HttpResponse:

        handle = request.GET.get('handle')
        return render(request, 'scrape.html')

    def post(self, request: HttpRequest) -> HttpResponse:
        # NOTE: select_function is not defined in this excerpt
        return self.select_function(request)
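
A minimal sketch of how the key-to-field mapping in store_product_data could be derived instead of typed out by hand, which would avoid repeating a keyword by mistake. It assumes every key in product_data is the model field name prefixed with product_. to_model_fields is a hypothetical helper, not part of the code above, and the sample values are made up.

```python
def to_model_fields(product_data: dict, prefix: str = "product_") -> dict:
    """Strip the prefix from each key so the result can be passed as keyword arguments."""
    fields = {}
    for key, value in product_data.items():
        # Keys that do not start with the prefix are kept unchanged.
        name = key[len(prefix):] if key.startswith(prefix) else key
        fields[name] = value
    return fields


# Made-up sample data in the same shape as product_data above.
scraped = {
    "product_handle": "blue-mug",
    "product_name": "Blue Mug",
    "product_description": "A ceramic mug.",
    "product_price": "12.50",
}
fields = to_model_fields(scraped)
print(sorted(fields))  # ['description', 'handle', 'name', 'price']
```

Inside store_product_data this could then be used as Product(**to_model_fields(product_data)).save(), which only works if the model's field names really match the stripped keys.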