Python Forum
Address Extraction
#8
From what you have shown, the word PROVIDER
is followed by 3 or 4 (maybe 5) lines, and those are the lines you need.
If you can rely on that layout, wherever it appears on the page,
and if you have the coordinates of "PROVIDER",
you can work out where the other lines are.
The difference is that you have been using Adobe Acrobat (a standalone piece of software),
while Tesseract (through its pytesseract wrapper) and pdfplumber are used as modules imported into a Python program.
If you use pytesseract, the coordinates can be found with either image_to_data or image_to_boxes.
I would go for image_to_data first. It gives you the x/y position and size of whole words, whereas image_to_boxes reports individual characters.
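
To make that concrete, here is a rough sketch of the idea, not a finished solution. It assumes the word boxes come from pytesseract's image_to_data (the "to_data" option mentioned above) with output_type=Output.DICT, and that the address lines sit directly below PROVIDER with roughly the same left edge. The function names, the x_tol tolerance and the file name scan.png are all made up for illustration; adjust them to your scans.

```python
from collections import defaultdict


def group_words_into_lines(data):
    """Group word-level OCR rows into text lines with a top-left position."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip the empty rows emitted for pages/blocks/paragraphs
        key = (data["page_num"][i], data["block_num"][i],
               data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["top"][i], text))
    result = []
    for key, words in lines.items():
        words.sort()  # order the words left to right
        result.append({
            "page": key[0],
            "left": min(w[0] for w in words),
            "top": min(w[1] for w in words),
            "text": " ".join(w[2] for w in words),
        })
    return result


def lines_below_keyword(data, keyword="PROVIDER", max_lines=5, x_tol=50):
    """Return up to max_lines lines below the first line containing keyword.

    Assumption: the wanted lines start within x_tol pixels of the keyword
    line's left edge. x_tol is a guess, not a value from the thread.
    """
    lines = group_words_into_lines(data)
    anchor = next((ln for ln in lines if keyword in ln["text"]), None)
    if anchor is None:
        return []
    below = [ln for ln in lines
             if ln["page"] == anchor["page"]
             and ln["top"] > anchor["top"]
             and abs(ln["left"] - anchor["left"]) <= x_tol]
    below.sort(key=lambda ln: ln["top"])  # top to bottom
    return [ln["text"] for ln in below[:max_lines]]


def ocr_words(image_path):
    """Run Tesseract on an image and return the word boxes as a dict of lists."""
    # Imported here so the functions above can be tried without Tesseract.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT)


# Usage (needs Tesseract installed; "scan.png" is a placeholder name):
# data = ocr_words("scan.png")
# print(lines_below_keyword(data, keyword="PROVIDER", max_lines=5))
```

Filtering on the left edge is only one way to do it. If the address block is indented or sits in a column next to PROVIDER, you would compare positions differently, so check the boxes on a few real pages first.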
Paul


Messages In This Thread
Address Extraction - by standenman - Apr-06-2024, 03:47 PM
RE: Address Extraction - by DPaul - Apr-07-2024, 09:36 AM
RE: Address Extraction - by standenman - Apr-07-2024, 12:43 PM
RE: Address Extraction - by DPaul - Apr-07-2024, 05:20 PM
RE: Address Extraction - by Pedroski55 - Apr-08-2024, 04:45 PM
RE: Address Extraction - by DPaul - Apr-08-2024, 05:32 PM
RE: Address Extraction - by standenman - Apr-10-2024, 04:00 PM
RE: Address Extraction - by DPaul - Apr-10-2024, 05:22 PM
