Python Forum
Image Scraper (beautifulsoup), stopped working, need to help see why
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Image Scraper (beautifulsoup), stopped working, need to help see why
#1
I wrote a little script 6 months or so ago with some help from a friend. It looked at a website and got the images from it. It used to work, but stopped working a week or so ago, on any machine I have. I'm really new to all this, and had to piece together the first one I wrote before we got it cleaned up.

I don't get any error message at all. So it's hard to troubleshoot what could have changed.

Here's the website I'm trying to get images from:
https://archive.4plebs.org/hr/thread/2866456/

Here is the code I've been using. I went through lots of iterations but this was the final one I had.

##########################################
#######    This is section for the main imports
import requests
import wget
import os

from bs4 import BeautifulSoup
from tqdm import tqdm
from urllib.parse import urljoin, urlparse
from time import time
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor
from time import sleep

##########################################
#######    This is section for choosing site and save folder
url = ''
folder = ''

url = input("Website:")
folder = input("Folder:")

##########################################
#######    This section I have NO idea what it does.  :)  Sets parser for sure
r  = requests.get(url, stream = True)
data = r.text
soup = BeautifulSoup(data, features = "lxml")

##########################################
#######    This section grabs all pictures tagged download and makes folders
for tag in soup.select('a.parent[download]'):
    dlthis = ('https:' + tag['href'])
    path = os.path.join(folder, tag['download'])
    myfile = requests.get(dlthis, allow_redirects=True, stream = True)
    if not os.path.isdir(folder):
        os.makedirs(folder)

##########################################
#######    Section for Saving Files, both work    
#    with open(path, 'wb') as f:
#        f.write(myfile.content)
    open(path, 'wb').write(myfile.content)
    
##########################################
I have iterations that do multi-thread, and basic ones that just print out the links. But, I can't seem to get it to show anything at all. I'm sure it has something to do with the request and parse from beautifulsoup

Any help you can give would be awesome. Thank You!

So, before I posted this, I wanted to make sure I tested everything I knew to test. So, I played around with it a little more. and it looks like there is a security feature installed now to probably block exactly what i'm trying to do... So, is there any way around it? Or a better way to pull pictures? Here's what I'm seeing:

h1>Access denied</h1>
  <p>This website is using a security service to protect itself from online attacks.</p>
  <ul class="cferror_details">
    <li>Ray ID: 60c8a5d2cc2b3a02</li>
    <li>Timestamp: 2021-01-04 23:13:01 UTC</li>
Thanks
Reply


Messages In This Thread
Image Scraper (beautifulsoup), stopped working, need to help see why - by woodmister - Jan-04-2021, 11:19 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Web scraper tomenzo123 8 4,485 Aug-18-2023, 12:45 PM
Last Post: Gaurav_Kumar
  Web scraper not populating .txt with scraped data BlackHeart 5 1,574 Apr-03-2023, 05:12 PM
Last Post: snippsat
  BeautifulSoup Showing none while extracting image url josephandrew 0 1,975 Sep-20-2021, 11:40 AM
Last Post: josephandrew
  Web scrapping - Stopped working peterjv26 2 3,139 Sep-23-2020, 08:30 AM
Last Post: peterjv26
  not getting image src in my BeautifulSoup csv file farhan275 11 3,850 Sep-14-2020, 04:52 PM
Last Post: buran
  Court Opinion Scraper in Python w/ BS4 (Currently exports to CSV) need help with SQL MidnightDreamer 4 3,077 Mar-12-2020, 09:57 AM
Last Post: BrandonKastning
  Python using BS scraper paulfearn100 1 2,595 Feb-07-2020, 10:22 PM
Last Post: snippsat
  web scraper using pathlib Larz60+ 1 3,254 Oct-16-2017, 05:27 PM
Last Post: Larz60+
  Need alittle hlpl with an image scraper. Blue Dog 8 7,815 Dec-24-2016, 08:09 PM
Last Post: Blue Dog
  Made a very simple email grabber(scraper) Blue Dog 4 6,977 Dec-13-2016, 06:25 AM
Last Post: wavic

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020