Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: Code worked in shell but not when I tried in my project. (/thread-33346.html)
Code worked in shell but not when I tried in my project. - yoohooos - Apr-18-2021

First of all, I really don't know what the error is, so I can't come up with a better title. I am learning scraping with Scrapy by following the tutorial on Scrapy's site. The tutorial worked fine until I tried it on another website that I want to scrape. When I went through it step by step in the Scrapy shell, it worked as I wanted. The following spider gave me a blank file when I ran it and tried to store the output as a .json file [scrapy crawl -O xxx.json].

import scrapy
from datetime import datetime

class FinvizNewsSpider(scrapy.Spider):
    name = "finvizNews"
    start_urls = [
        'https://finviz.com/quote.ashx?t=ANPC'
    ]

    def parse(self, response):
        for news in response.css("tr"):
            yield {
                'news_time' : datetime.strptime(news.css("td::text").get().replace('\xa0',''),'%b-%d-%y %I:%M%p')#,
            }

Could you guys point out to me what's wrong with my code?
The data that I'm interested in: https://imgur.com/a/mz3bTnr (the time in the red box)
Thank you very much!

RE: Code worked in shell but not when I tried in my project. - snippsat - Apr-19-2021

(Apr-18-2021, 03:48 AM)yoohooos Wrote: Could you guys point out to me what's wrong with my code?

You do not find the data because it is generated by JavaScript. This is a common problem that everyone faces when they start scraping. One solution is to use Selenium. It can be combined with Scrapy, or, if you only want data from this one site, it is easier to use Selenium alone. Example:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36")
#options.add_argument("--window-size=1980,1020")
# Selenium 3 API as posted; Selenium 4 replaces find_elements_by_css_selector
# with find_elements(By.CSS_SELECTOR, ...).
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)

#--| Parse or automation
url = "https://finviz.com/quote.ashx?t=ANPC"
browser.get(url)
sleep(3)  # give the page time to load before querying it
# First cell of the first two rows of the news table
date_1 = browser.find_elements_by_css_selector('#news-table > tbody > tr:nth-child(1) > td:nth-child(1)')[0]
date_2 = browser.find_elements_by_css_selector('#news-table > tbody > tr:nth-child(2) > td:nth-child(1)')[0]
print(f'{date_1.text.strip()}\n{date_2.text.strip()}')
browser.quit()  # close the headless browser when done
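Separately from how the page is fetched, the loop in the question visits every tr on the page, and Scrapy's .get() returns None when a row has no matching text, so calling .replace() on the result can raise before anything is yielded. Below is a minimal sketch of a defensive parser for the cell text. The helper name parse_news_time and the constant NEWS_TIME_FORMAT are hypothetical, and it assumes the cell text follows the format string from the question, and an English locale for month and AM/PM names.

```python
from datetime import datetime
from typing import Optional

# Format string copied from the question's strptime call, e.g. "Apr-16-21 08:30AM".
NEWS_TIME_FORMAT = "%b-%d-%y %I:%M%p"


def parse_news_time(raw: Optional[str]) -> Optional[datetime]:
    """Return a datetime for a cell's text, or None if it cannot be parsed.

    Hypothetical helper for illustration; it is not part of Scrapy or Selenium.
    """
    if raw is None:
        # The selector found no text node in this row.
        return None
    # Turn non-breaking spaces into ordinary spaces, then trim the ends.
    cleaned = raw.replace("\xa0", " ").strip()
    try:
        return datetime.strptime(cleaned, NEWS_TIME_FORMAT)
    except ValueError:
        # Text is present but is not a full "date time" stamp.
        return None
```

In the spider this could be called as parse_news_time(news.css("td::text").get()), yielding an item only when the result is not None. The same helper would also accept the .text value of a Selenium element. It only guards the parsing step and does not change which rows the selector matches.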