Nov-20-2019, 02:27 PM
Hi guys,
I would like to scrape and organize data from HTML documents.
So far I have been learning to scrape and present data from sites with a different kind of structure (shopping sites, with offers in separate elements).
I am curious whether it is doable to scrape and organize data from thousands of documents which are not standardized. What I mean is that the information is sometimes at the top of the document, sometimes at the bottom, and pretty much always in a different place.
Let's say that I would like to get the data from the "SUMMARY COMPENSATION TABLE" (from both of the files below).
For one specific page it is doable (using indexes, find, etc.).
Is there any approach that can be applied to thousands of files like that? I cannot target a specific div or other HTML element, because every table is named the same (only the font differs).
I just don't know how to tell Python to look for "SUMMARY COMPENSATION TABLE" and get all the data from the table below it.
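To make the goal concrete, here is a minimal sketch of the idea "find the heading text, then take the first table that starts after it". It uses only the standard library `html.parser`. The names `TableAfterHeading` and `extract_table` and the sample HTML are made up for illustration; they are not taken from the SEC files. It assumes the heading text appears before the `<table>` tag and that rows and cells have closing tags, which may not hold for every filing.

```python
from html.parser import HTMLParser


class TableAfterHeading(HTMLParser):
    """Collect the rows of the first <table> that starts after a heading text."""

    def __init__(self, heading):
        super().__init__()
        # Normalize the heading: upper case, single spaces.
        self.heading = " ".join(heading.upper().split())
        self.recent_text = ""      # rolling window of text seen outside the table
        self.heading_seen = False
        self.depth = 0             # nesting depth inside the captured table
        self.done = False
        self.rows = []
        self.row = None
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            # Start capturing at the first table after the heading,
            # and keep counting nested tables while capturing.
            if self.depth > 0 or self.heading_seen:
                self.depth += 1
        elif self.depth > 0:
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th"):
                self.cell = []

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        if tag in ("td", "th") and self.cell is not None:
            text = " ".join("".join(self.cell).split())
            if self.row is not None:
                self.row.append(text)
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if any(self.row):          # skip rows made only of empty cells
                self.rows.append(self.row)
            self.row = None
        elif tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self.done = True       # stop after the first matching table

    def handle_data(self, data):
        if self.done:
            return
        if self.depth > 0:
            if self.cell is not None:
                self.cell.append(data)
            return
        # Outside the table: the heading may be split across <b>/<font> tags,
        # so match against a short window of normalized recent text.
        window = self.recent_text + " " + data.upper()
        self.recent_text = " ".join(window.split())[-200:]
        if self.heading in self.recent_text:
            self.heading_seen = True


def extract_table(html, heading="SUMMARY COMPENSATION TABLE"):
    """Return the table rows as lists of cell strings, or [] if nothing matched."""
    parser = TableAfterHeading(heading)
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample, only to show the call.
sample = """
<p><font size="2"><b>Summary&nbsp;Compensation Table</b></font></p>
<table>
  <tr><th>Name</th><th>Year</th><th>Salary</th></tr>
  <tr><td>Jane Doe</td><td>2018</td><td>500,000</td></tr>
</table>
<table><tr><td>unrelated</td></tr></table>
"""
rows = extract_table(sample)
for row in rows:
    print(row)
```

Matching on the text rather than on a div or font tag is what would let the same function run over many files. Known gaps in this sketch: a table of contents that mentions the heading earlier in the document would match first, and a heading placed inside the same table as the data would be missed.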
Example of page #1
https://www.sec.gov/Archives/edgar/data/...def14a.htm
Example of page #2
https://www.sec.gov/Archives/edgar/data/...def14a.htm
Do you have any thoughts or ideas on whether this is even doable?