Code worked in shell but not when I tried in my project.

Code worked in shell but not when I tried in my project. - yoohooos - Apr-18-2021

First of all, I really don't know what the error is, so I couldn't come up with a better title.

I was learning scraping with Scrapy by following the tutorial on Scrapy's site. Everything worked fine until I tried it on another website that I wanted to scrape. I stepped through the code in the Scrapy shell and it worked as I wanted. But when I ran the spider below and tried to store the results as a .json file [scrapy crawl -O xxx.json], it gave me a blank file.

import scrapy
from datetime import datetime

class FinvizNewsSpider(scrapy.Spider):
    name = "finvizNews"

    start_urls = [
        'https://finviz.com/quote.ashx?t=ANPC'
    ]

    def parse(self, response):
        for news in response.css("tr"):
            yield {
                'news_time' : datetime.strptime(news.css("td::text").get().replace('\xa0',''),'%b-%d-%y %I:%M%p')#,
                
            }

Could you guys point out to me what's wrong with my code?
The data that I'm interested in: https://imgur.com/a/mz3bTnr time in the red box

Thank you very much!

RE: Code worked in shell but not when I tried in my project. - snippsat - Apr-19-2021

(Apr-18-2021, 03:48 AM)yoohooos Wrote: Could you guys point out to me what's wrong with my code?
The data that I'm interested in: https://imgur.com/a/mz3bTnr time in the red box

You don't find the data because it is generated by JavaScript.
This is a common problem that everyone faces when they start out with scraping.
One solution is Selenium, which can be used together with Scrapy,
or, if you only want data from this one site, it is easier to just use Selenium on its own.
Example:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 style: the chromedriver path is passed through a Service object
browser = webdriver.Chrome(service=Service(r'C:\cmder\bin\chromedriver.exe'), options=options)
#--| Parse or automation
url = "https://finviz.com/quote.ashx?t=ANPC"
browser.get(url)
sleep(3)  # crude fixed wait so the page can finish rendering
# Selenium 4 style: find_element(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector
date_1 = browser.find_element(By.CSS_SELECTOR, '#news-table > tbody > tr:nth-child(1) > td:nth-child(1)')
date_2 = browser.find_element(By.CSS_SELECTOR, '#news-table > tbody > tr:nth-child(2) > td:nth-child(1)')
print(f'{date_1.text.strip()}\n{date_2.text.strip()}')
browser.quit()  # close the headless browser when done

Output:
Apr-16-21 04:15PM
Mar-10-21 07:25AM
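
Once the text is available, it still has to be turned into datetime objects, which is what the spider in the first post was trying to do. The following is only a sketch of one way to do that safely. It reuses the format string from the spider; parse_news_time is a hypothetical helper name, not something from Scrapy or Selenium. It assumes that some cells may be missing or hold text that does not match the format. Scrapy's .get() returns None when a selector matches nothing, so calling .replace() directly on its result can raise an error for such rows. Unlike the original spider, this sketch turns non-breaking spaces into ordinary spaces rather than deleting them, which is a choice made here, not something taken from the thread.

```python
from datetime import datetime

# Format string taken from the spider in the first post.
NEWS_TIME_FORMAT = '%b-%d-%y %I:%M%p'


def parse_news_time(raw):
    """Return a datetime for text like 'Apr-16-21 04:15PM', or None.

    Hypothetical helper for illustration; not part of Scrapy or Selenium.
    """
    if raw is None:
        # Scrapy's .get() returns None when the selector matches nothing.
        return None
    # Turn non-breaking spaces into normal spaces, then trim the ends.
    cleaned = raw.replace('\xa0', ' ').strip()
    try:
        return datetime.strptime(cleaned, NEWS_TIME_FORMAT)
    except ValueError:
        # Text that does not match the format, e.g. a header cell.
        return None


if __name__ == "__main__":
    for text in ['Apr-16-21 04:15PM', 'Mar-10-21 07:25AM', None, 'not a date']:
        print(repr(text), '->', parse_news_time(text))
```

With the two strings from the output above, this gives 2021-04-16 16:15:00 and 2021-03-10 07:25:00, while None and non-matching text both come back as None. In a spider, rows where the helper returns None could simply be skipped instead of yielded.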