Python Forum
Convert a PDF files to HTML files



Convert a PDF files to HTML files - Underground - Oct-20-2020

Hi everyone,

I am a new user, and I started learning Python to do one thing in particular.

I would like to convert PDF files to HTML files automatically, but I don't know how to do this.

I tried some libraries, but to be honest I have a lot of difficulty understanding the examples and reproducing them.

So if someone could help me convert these PDF files to HTML documents, that would be very kind.

Thank you for your help.


RE: Convert a PDF files to HTML files - Larz60+ - Oct-20-2020

PDF is a great format for viewing data, but one of the most difficult to extract data from.
There are some packages that make the conversion easier (sometimes, depending on the PDF's format).
One such package for PDF to HTML is pdftotree: https://pypi.org/project/pdftotree/

There are others (you will have to research to find a suitable match): https://pypi.org/search/?q=%22PDF+to+HTML%22&o=


RE: Convert a PDF files to HTML files - Underground - Oct-20-2020

Thank you for your answer.

I am going to check out this package.
I hope there is a very detailed example, because I am a beginner in Python programming.


RE: Convert a PDF files to HTML files - Underground - Oct-25-2020

I'm sorry, but I can't understand how it works.

I am trying to use this package, but I don't know what I need to install. I am using PyCharm with Python 3.8.


RE: Convert a PDF files to HTML files - Larz60+ - Oct-25-2020

Quote: I am trying to use this package but I don't know what I need to install
Install it from a terminal with:
pip install pdftotree

Documentation and examples: https://github.com/HazyResearch/pdftotree
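
For a beginner, a rough sketch of how the package might be used once it is installed could look like the following. It is not taken from the pdftotree documentation: the helper names html_path_for and convert_folder are made up for illustration, and it assumes, based on my reading of the project README, that pdftotree.parse(pdf_file) returns the HTML as a string when no html_path is given. Check the README for the version you install before relying on it.

```python
import importlib.util
import sys
from pathlib import Path


def html_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map e.g. docs/report.pdf to <out_dir>/report.html (hypothetical helper)."""
    return out_dir / (pdf_path.stem + ".html")


def convert_folder(pdf_dir: str, out_dir: str) -> list:
    """Convert every *.pdf file in pdf_dir to an .html file in out_dir."""
    # Fail early with a useful message if pdftotree is missing from the
    # interpreter running this script (in PyCharm: the project interpreter).
    if importlib.util.find_spec("pdftotree") is None:
        raise RuntimeError(
            f'pdftotree not found; try: "{sys.executable}" -m pip install pdftotree'
        )
    import pdftotree  # imported here so the check above runs first

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # Assumption: parse() returns the HTML string when html_path is omitted.
        html = pdftotree.parse(str(pdf))
        target = html_path_for(pdf, out)
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```

Calling convert_folder("pdfs", "html"), where both folder names are placeholders, would then write one .html file per PDF and return the list of paths written. The install hint uses sys.executable so that pip targets the same interpreter that runs the script, which is a common stumbling block when PyCharm uses its own virtual environment.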