Address Extraction

Pedroski55

Apr-08-2024, 04:45 PM

Is the PDF just a scan, i.e. an image-only PDF with no actual text layer?

If you are only dealing with images, convert each page to JPG, crop it with PIL Image, then OCR the cropped image.

I'm assuming that the Provider part is always roughly where it is in your example PDF. There is enough white background around the text to give some leeway for addresses of different lengths.

Once you have a reliable set of coordinates, just crop every PDF page with those coords. Have to say, the scan could be better!
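
Something along these lines might do as a starting point. It is only a rough sketch: it assumes the pages have already been saved as JPG files, that Pillow and pytesseract (plus the Tesseract binary) are installed, and the names PROVIDER_BOX, box_to_pixels, ocr_provider and ocr_folder are made up for illustration. The box values are placeholders, not measured from your scan.

```python
from pathlib import Path

# Hypothetical crop box as fractions of the page size: (left, top, right, bottom).
# Placeholder numbers; measure the real ones on a sample page.
PROVIDER_BOX = (0.05, 0.10, 0.55, 0.25)


def box_to_pixels(frac_box, width, height):
    """Scale a fractional box to the pixel box that PIL's Image.crop expects."""
    left, top, right, bottom = frac_box
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def ocr_provider(image_path, frac_box=PROVIDER_BOX):
    """Crop one page image to the box and return the OCR text."""
    # Third-party imports kept local so the helper above works without them.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        region = page.crop(box_to_pixels(frac_box, *page.size))
        return pytesseract.image_to_string(region).strip()


def ocr_folder(folder):
    """Apply the same crop to every JPG in a folder, keyed by file name."""
    return {p.name: ocr_provider(p) for p in sorted(Path(folder).glob("*.jpg"))}
```

Storing the box as fractions rather than pixels is just one option; it means the same coordinates still fit if the pages are scanned or converted at a different resolution.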

If you actually have text PDFs, try fitz (the import name of PyMuPDF).
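
For the text-PDF case, a minimal sketch with fitz could look like this. It assumes PyMuPDF is installed; the function names and the clip_box argument (x0, y0, x1, y1 in PDF points) are hypothetical, and you would have to work out the box for your own layout.

```python
def address_lines(raw_text):
    """Split extracted text into stripped, non-empty lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def has_text_layer(pdf_path):
    """Return True if any page yields text, i.e. the PDF is not a pure scan."""
    import fitz  # PyMuPDF, imported locally so address_lines works without it

    with fitz.open(pdf_path) as doc:
        return any(page.get_text().strip() for page in doc)


def provider_text(pdf_path, clip_box, page_number=0):
    """Return the text lines inside clip_box on one page."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # clip limits extraction to the given rectangle (assumed coordinates)
        text = page.get_text("text", clip=fitz.Rect(*clip_box))
    return address_lines(text)
```

If has_text_layer comes back False, the file is an image-only PDF and the crop-and-OCR route is the one to use.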


