Using Selenium, I navigate to a results page that contains JavaScript links to additional results.
The HTML of the pager looks like this:

```html
<tbody>
  <tr>
    <td><span>1</span></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$2')">2</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$3')">3</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$4')">4</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$5')">5</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$6')">6</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$7')">7</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$8')">8</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$9')">9</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$10')">10</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$11')">11</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$12')">12</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$13')">13</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$14')">14</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$15')">15</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$16')">16</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$17')">17</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$18')">18</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$19')">19</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$20')">20</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$21')">21</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$22')">22</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$23')">...</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$Last')">Last</a></td>
  </tr>
</tbody>
```
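Each pager link above carries no URL, only a `javascript:__doPostBack(target, argument)` call. So the closest thing to a "list of links" is a list of (target, argument) pairs, which can be pulled out of the markup without clicking anything. A minimal sketch using only the standard library, assuming every pager link has exactly that two-string-argument form; `PostBackLinkParser` and `extract_postbacks` are hypothetical names:

```python
import re
from html.parser import HTMLParser

# Matches the two quoted string arguments of a __doPostBack('...','...') call.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostBackLinkParser(HTMLParser):
    """Collects (event_target, event_argument, link_text) for each __doPostBack <a>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # (target, argument) of the <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = POSTBACK_RE.search(href)
        if match:
            self._current = match.groups()
            self._text = []

    def handle_data(self, data):
        # Only record text that sits inside a postback anchor.
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            target, argument = self._current
            self.links.append((target, argument, "".join(self._text).strip()))
            self._current = None


def extract_postbacks(html):
    """Return [(event_target, event_argument, link_text), ...] in document order."""
    parser = PostBackLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

With a live Selenium session this could be fed `driver.page_source`; the arguments for the pager shown above would then be `'Page$2'` through `'Page$23'` plus `'Page$Last'`, which is only what the first page's pager renders.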
Is there a method available in Selenium that will expand these links (or show the direct links to the data locations) without actually loading each page? I'd like to create a list that contains the actual links to the data locations, if possible.

I'm thinking there must be a way to do this with `find_elements`, but I might be barking up the wrong tree.
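There is no direct URL to expand, because in ASP.NET WebForms `__doPostBack(target, argument)` copies its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form. Each "link" is therefore a POST of the whole form, not a GET of an address. The sketch below builds that POST body from a page's HTML, assuming the standard WebForms hidden inputs are present (typically `__VIEWSTATE` and `__EVENTVALIDATION` as well). `HiddenInputParser` and `postback_body` are hypothetical names, and the sketch deliberately leaves out visible controls (the entity-name box, the records-per-page dropdown) that a real browser would also send, so the server may not respond exactly as it would to a click.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def postback_body(page_html, event_target, event_argument):
    """Return the urlencoded form body that __doPostBack(target, argument) would submit
    (hidden fields only; visible inputs are not included in this sketch)."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    parser.close()
    form = dict(parser.fields)
    # This is what __doPostBack does before calling form.submit().
    form["__EVENTTARGET"] = event_target
    form["__EVENTARGUMENT"] = event_argument
    return urlencode(form)
```

Because the body depends on the current `__VIEWSTATE`, a stored list of these bodies can go stale; keeping the (target, argument) pairs and rebuilding the body from the latest page is the safer design.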
EDIT: Feb16: 5:30 EDT
Here is my code, which works up to the point where I want the links. I can reach each page individually by modifying a CSS selector for the page number and then clicking it, but I'd rather (if possible) gather all of the links from the first results page and save them in a list.
Selenium code
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class FindElementsJustTheFacts:
    def __init__(self):
        self.browser = None
        self.browser_running = False

    def get_page(self, letter):
        self.start_browser()
        self.browser.get('https://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx')
        element = self.browser.find_element(By.CSS_SELECTOR, '#MainContent_txtEntityName')
        element.send_keys(letter)
        # Pick the 4th entry of the records-per-page dropdown (click() returns None)
        self.browser.find_element(By.CSS_SELECTOR, '#MainContent_ddRecordsPerPage > option:nth-child(4)').click()
        trs = self.browser.find_elements(By.CSS_SELECTOR, 'tr.link > td:nth-child(1) > table:nth-child(1)')
        for element in trs:
            # Without "return", execute_script gives back None
            print(self.browser.execute_script("return arguments[0].outerHTML;", element))
        if self.browser_running:
            self.stop_browser()

    def start_browser(self):
        # useragent = "Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Mobile Safari/537.36"
        profile = webdriver.FirefoxProfile()
        options = webdriver.FirefoxOptions()
        options.set_preference("dom.webnotifications.serviceworker.enabled", False)
        options.set_preference("dom.webnotifications.enabled", False)
        # Attach the profile via options; the firefox_profile= keyword is deprecated in Selenium 4
        options.profile = profile
        self.browser = webdriver.Firefox(options=options)
        self.browser.implicitly_wait(30)
        self.browser_running = True

    def stop_browser(self):
        # quit() ends the WebDriver session; close() only closes the current window
        self.browser.quit()
        self.browser_running = False


def main():
    fej = FindElementsJustTheFacts()
    fej.get_page(letter='1')


if __name__ == '__main__':
    main()
```
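For the goal in the edit (gather everything from the first results page, then visit each page), one option is to copy the `href` strings out first and replay them afterwards. A minimal sketch, assuming Selenium 4, where `find_elements` accepts the strategy string `"css selector"` (the value of `By.CSS_SELECTOR`). Strings are copied immediately because the WebElements go stale as soon as a postback reloads the page. `collect_postback_hrefs`, `replay_postbacks` and the `on_page_loaded` callback are hypothetical names; the callback is where you would wait for the reload (for example `WebDriverWait(driver, 30).until(EC.staleness_of(old_element))`) before scraping, since `execute_script` may return before the new page has loaded.

```python
# CSS attribute-prefix selector for the pager anchors shown in the question.
POSTBACK_SELECTOR = "a[href^='javascript:__doPostBack']"


def collect_postback_hrefs(driver):
    """Return the href strings of every __doPostBack link on the current page.

    Plain strings survive page reloads; the WebElements themselves do not.
    """
    anchors = driver.find_elements("css selector", POSTBACK_SELECTOR)
    return [a.get_attribute("href") for a in anchors]


def replay_postbacks(driver, hrefs, on_page_loaded):
    """Run each postback in turn and collect whatever on_page_loaded returns.

    on_page_loaded(driver) is a hypothetical hook: it should wait for the
    reloaded page and then scrape it.
    """
    results = []
    for href in hrefs:
        # Drop the "javascript:" scheme to get the bare call, e.g.
        # "__doPostBack('ctl00$...$grdSearchResultsEntity','Page$2')"
        script = href[len("javascript:"):]
        driver.execute_script(script)
        results.append(on_page_loaded(driver))
    return results
```

Note that the list collected on the first page only covers what its pager renders (pages 2 to 22, the `...` link for `Page$23`, and `Last`), so reaching later pages would mean collecting again after following the `...` link.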