Link implementation (Python Forum, Web Scraping & Web Development)
Link implementation - JonWayn - Aug-30-2022
Total newbie here. As I start writing scrapers, I am faced with two websites that implement their links differently. With one, I can collect all the links in a table element, even close the window, and still be fine, because the href of each link takes me to its destination regardless. With the other, those links only take me back to the disclaimer page. What steps are involved in circumventing this behavior, if any, and what are the best practices for scraping sites like this? I am using scrapy and selenium.

RE: Link implementation - Larz60+ - Aug-30-2022
Please provide something to work with. What are the URLs?

RE: Link implementation - JonWayn - Aug-30-2022
(Aug-30-2022, 07:49 PM)Larz60+ Wrote: Please provide something to work with,
This is the one with links that always work: https://www.mshp.dps.missouri.gov/HP71/search.jsp
This is the one with a disclaimer page, where the links, if not clicked directly from the site's own responses, only reload the homepage: https://casesearch.courts.state.md.us/casesearch/inquiry-index.jsp

RE: Link implementation - Larz60+ - Aug-31-2022
Thanks for the URLs. Just logged in (1:53 A.M. EST); will take a look in my morning if not already answered by another.

RE: Link implementation - Larz60+ - Aug-31-2022
Took a quick look at the pages you provided. The first page has a query form which can be partially filled in and then searched on. I would use selenium for both pages:
selenium can automate this process. I don't use scrapy, so can't say if it's capable or not. I would suggest the following quick tutorials (on this forum): Web-Scraping part-1, Web-scraping part-2
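
The thread does not establish why the second site's links bounce back to the disclaimer. One common cause, and it is only an assumption here, is that the site sets a session cookie when the disclaimer is accepted and redirects any request that arrives without it. The sketch below reproduces that behavior with a toy local server (the paths `/accept`, `/disclaimer`, `/case/123` and the cookie name `accepted` are all hypothetical, not taken from either real site) and shows that the same link works once it is requested from a client that kept the cookie.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DisclaimerSite(BaseHTTPRequestHandler):
    """Toy stand-in for a site that gates deep links behind a disclaimer."""

    def do_GET(self):
        accepted = "accepted=1" in (self.headers.get("Cookie") or "")
        if self.path == "/accept":
            # Accepting the disclaimer sets a session cookie (hypothetical name).
            self.send_response(200)
            self.send_header("Set-Cookie", "accepted=1; Path=/")
            body = b"accepted"
        elif self.path == "/disclaimer":
            self.send_response(200)
            body = b"disclaimer page"
        elif accepted:
            # Any other path counts as a "deep link" in this toy site.
            self.send_response(200)
            body = b"case detail for " + self.path.encode()
        else:
            # No session cookie: bounce the visitor back to the disclaimer.
            self.send_response(302)
            self.send_header("Location", "/disclaimer")
            body = b""
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    """GET a URL with the given opener and return the body as text."""
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode()


def demo():
    # Port 0 lets the OS pick a free port for the toy server.
    server = HTTPServer(("127.0.0.1", 0), DisclaimerSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    no_proxy = urllib.request.ProxyHandler({})  # ignore any system proxy
    try:
        # A fresh client with no cookies follows the redirect to the disclaimer.
        fresh = urllib.request.build_opener(no_proxy)
        without_session = fetch(fresh, base + "/case/123")

        # A client with a cookie jar accepts once, then reuses the session.
        jar = http.cookiejar.CookieJar()
        session = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        fetch(session, base + "/accept")
        with_session = fetch(session, base + "/case/123")
    finally:
        server.shutdown()
        server.server_close()
    return without_session, with_session


if __name__ == "__main__":
    print(demo())
```

If the real site behaves this way, the same idea carries over to the tools mentioned in the thread: a Selenium driver keeps its cookies across `driver.get()` calls for as long as the browser instance stays open, so accepting the disclaimer and then visiting the collected hrefs with that same driver, rather than a new window, would be the analogous approach. Whether the real site relies on cookies alone or on other checks as well is not something this sketch can confirm.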