Oct-19-2020, 05:25 PM
I have tried every approach I could think of or find online to scrape Morningstar financials data into a processable form such as a CSV file or a pandas DataFrame, for instance from this page:
https://financials.morningstar.com/ratios/r.html?t=AAPL
There are several possible ways to achieve this. One is to find the exact URL of the CSV export file. Several online resources suggest that the link should be
http://financials.morningstar.com/ajax/e...&order=asc
but it no longer works.
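If a working export URL does turn up, the response can be read with the standard library alone. This is a minimal sketch, assuming the endpoint returns plain CSV text; `parse_csv_text`, `fetch_csv_rows` and the `skip_rows` parameter (for any title lines above the header) are hypothetical names, and the truncated URL above is deliberately not filled in.

```python
import csv
import io
import urllib.request


def parse_csv_text(text, skip_rows=0):
    """Parse CSV text into a list of non-empty rows, skipping `skip_rows` leading rows."""
    rows = list(csv.reader(io.StringIO(text)))[skip_rows:]
    # csv.reader yields [] for blank lines; drop them.
    return [row for row in rows if row]


def fetch_csv_rows(url, skip_rows=0, timeout=30):
    """Download a CSV export (assumed to be plain CSV) and parse it.

    Raises ValueError on an empty body so a "working" request that returns
    nothing is reported instead of silently producing no rows.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode('utf-8-sig')  # tolerate a leading BOM
    if not body.strip():
        raise ValueError('empty response from %s' % url)
    return parse_csv_text(body, skip_rows)
```

Checking for an empty body is a cheap way to tell "the URL is wrong" apart from "the URL answers but serves no data".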
Another way is to automate clicking the "Export" button after opening the page with Selenium WebDriver. Many resources point to this or similar solutions:
```python
from selenium import webdriver

d = webdriver.Chrome()
d.get('http://financials.morningstar.com/ratios/r.html?t=AAPL&region=usa&culture=en-US')
d.find_element_by_css_selector('.large_button').click()
d.quit()
```

Running this raises no error or exception, but no file is downloaded afterwards. The other selector values suggested for `find_element_by_css_selector` don't work either; I tested every one I came across.
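One possible reason no file appears (an assumption, not verified against the page) is that `d.quit()` runs immediately after `.click()`, closing the browser before Chrome has finished the download. A minimal sketch of a helper that waits for a new, completed file in the download directory; `wait_for_download` is a hypothetical name, and treating `.crdownload` files as unfinished relies on Chrome's naming of partial downloads.

```python
import os
import time


def wait_for_download(directory, before, timeout=30.0, poll=0.5):
    """Wait until a new, finished file appears in `directory`.

    `before` is the set of file names present before the download was
    triggered. Chrome writes partial downloads as *.crdownload, so those
    (and hidden temp files) are ignored. Returns the new file's path, or
    None if nothing finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        new_files = [
            name for name in sorted(set(os.listdir(directory)) - set(before))
            if not name.endswith('.crdownload') and not name.startswith('.')
        ]
        if new_files:
            return os.path.join(directory, new_files[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the snippet above, the idea would be to take `before = set(os.listdir(download_dir))` just before `.click()` and call `wait_for_download(download_dir, before)` before `d.quit()`. The download folder can be pinned with `options = webdriver.ChromeOptions()`, `options.add_experimental_option('prefs', {'download.default_directory': download_dir})` and `webdriver.Chrome(options=options)`. This only helps if the clicked element really triggers the export; if `.large_button` is some other button, nothing will be saved no matter how long the script waits.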
A third way is to scrape the data from another primary source:
http://financials.morningstar.com/finan/...xxx&t=AAPL
```python
from bs4 import BeautifulSoup
import requests
import re
import json

url1 = 'http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=xxx&t=AAPL'
url2 = 'http://financials.morningstar.com/finan/financials/getKeyStatPart.html?&callback=xxx&t=AAPL'

# Each endpoint returns JSONP, i.e. JSON wrapped in xxx(...), where xxx is the
# callback name from the URL. Strip the wrapper, parse the JSON, and feed the
# HTML fragment stored under 'componentData' to BeautifulSoup.
soup1 = BeautifulSoup(json.loads(re.findall(r'xxx\((.*)\)', requests.get(url1).text)[0])['componentData'], 'lxml')
soup2 = BeautifulSoup(json.loads(re.findall(r'xxx\((.*)\)', requests.get(url2).text)[0])['componentData'], 'lxml')


def print_table(soup):
    for tr in soup.select('tr'):
        # Keep only non-empty cells of the row.
        row_data = [td.text for td in tr.select('td, th') if td.text]
        if not row_data:
            continue
        # Pad short rows (such as the header row) with a placeholder first cell
        # so the columns line up.
        if len(row_data) < 12:
            row_data = ['X'] + row_data
        for j, td in enumerate(row_data):
            if j == 0:
                print('{: >30}'.format(td), end='|')  # right-aligned row label
            else:
                print('{: ^12}'.format(td), end='|')  # centered value
        print()


print_table(soup1)
print()
print_table(soup2)
```

Credit: https://stackoverflow.com/questions/5669...orningstar
It works! The table prints nicely and contains the information I want. The only problem is that I have no idea how to turn it into a processable format such as CSV or a DataFrame.
How can I do that? Any help would be much appreciated.
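A minimal sketch of one way to do it, assuming the page layout matches what `print_table()` expects: instead of printing each row, collect the cells into a list of lists, which the standard `csv` module (or pandas) can consume directly. The helper names are hypothetical; the row width of 12 is copied from the snippet above, and `unwrap_jsonp` is a slightly stricter version of the `re.findall(r'xxx\((.*)\)', ...)` step that also tolerates line breaks and a trailing semicolon.

```python
import csv
import io
import json
import re

# Full data rows have one label cell plus the yearly/TTM value cells.
# 12 is taken from print_table() and is an assumption about the layout.
ROW_WIDTH = 12


def unwrap_jsonp(text, callback='xxx'):
    """Strip the `callback(...)` wrapper from a JSONP response and parse the JSON."""
    pattern = r'^\s*' + re.escape(callback) + r'\((.*)\)\s*;?\s*$'
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError('response is not wrapped in %s(...)' % callback)
    return json.loads(match.group(1))


def normalize_rows(raw_rows, width=ROW_WIDTH, placeholder=''):
    """Drop empty rows and left-pad short rows (such as the header) with a placeholder."""
    rows = []
    for row in raw_rows:
        if not row:
            continue
        if len(row) < width:
            row = [placeholder] + row
        rows.append(row)
    return rows


def extract_rows(soup):
    """Collect cells the way print_table() does, but return them instead of printing."""
    raw_rows = [[td.text for td in tr.select('td, th') if td.text]
                for tr in soup.select('tr')]
    return normalize_rows(raw_rows)


def rows_to_csv_text(rows):
    """Serialize rows to CSV text with the standard csv module."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Usage would look like `soup1 = BeautifulSoup(unwrap_jsonp(requests.get(url1).text)['componentData'], 'lxml')`, then `rows = extract_rows(soup1)`, and either write `rows_to_csv_text(rows)` to a file opened with `open('aapl_financials.csv', 'w', newline='', encoding='utf-8')`, or build a DataFrame with `pandas.DataFrame(rows[1:], columns=rows[0])`, assuming the first collected row is the year header. Note that the `if td.text` filter, kept from the original, drops empty cells, so a row with a blank value would shift left; keeping empty strings instead avoids that.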