Python Forum
Random Forest to Identify Page: Feature Selection

Random Forest to Identify Page: Feature Selection - JaneTan - Oct-14-2021

Hi,

I am new to machine learning. I know of a project that used a Random Forest to identify the type of each page in financial reports, for example whether a page is the Cash Flow Statement or the Income Statement.

The features for the model:
1) Bag of Words (BOW) for all pages in all the financial reports
2) word_check_flow: 1 if the page has the word "flow"; 0 otherwise
3) word_check_income: 1 if the page has {"income" & "expense"} or {"revenue", "sales", "loss"}; 0 otherwise
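
To make the three features concrete, here is a rough sketch of how I understand them. It is my own reconstruction, not the project's code: the names tokenize and page_features and the sample pages are made up, I use a plain Counter as the BOW, I assume whole-word matching (so "flows" would not count as "flow"), and I assume the second word set means any one of "revenue", "sales", "loss".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the page text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def page_features(text):
    """Return (bag_of_words, word_check_flow, word_check_income) for one page."""
    bow = Counter(tokenize(text))  # word -> count on this page
    words = set(bow)

    # 1 if the page contains the word "flow", else 0
    word_check_flow = int("flow" in words)

    # 1 if the page has both "income" and "expense",
    # or (assumption) at least one of "revenue", "sales", "loss"
    word_check_income = int(
        {"income", "expense"} <= words
        or bool({"revenue", "sales", "loss"} & words)
    )
    return bow, word_check_flow, word_check_income


# Hypothetical sample pages, for illustration only
pages = {
    "cash_flow_page": "Net cash flow from operating activities",
    "income_page": "Total income and expense for the year",
    "other_page": "Notes to the financial statements",
}

features = {name: page_features(text) for name, text in pages.items()}
for name, (bow, flow, income) in features.items():
    print(name, "flow:", flow, "income:", income, dict(bow))
```

Written this way, both flags are computed from the same word counts that the BOW already holds.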

I am puzzled as to why word_check_flow and word_check_income are needed as features when the BOW already gives the count of each word on the page.

Thank you