Python Forum
Comparing PDFs
Hi there,
I want to start off by saying I have zero coding experience; I'm mainly asking to find out whether this is possible.

I want to create an app that compares one or more child PDFs against a master PDF, highlights any differences in each child PDF, and then saves the processed child PDFs with the differences visibly highlighted.
I've seen tools that compare PDFs and output the differences as text, but highlighting the changes directly in each PDF would suit my needs better.
This is what I've got so far. It opens a basic GUI that lets you select a master and one child PDF, but nothing seems to happen when I hit the Compare button.

import tkinter as tk
from tkinter import filedialog, messagebox
import fitz


class PDFCompare:
    def __init__(self, master):
        self.master = master
        master.title("PDF Compare")
        
        self.master_file = None
        self.child_file = None
        self.result_file = None
        
        self.master_label = tk.Label(master, text="Master PDF:")
        self.master_label.grid(row=0, column=0, sticky="w")
        self.master_button = tk.Button(master, text="Select", command=self.select_master_pdf)
        self.master_button.grid(row=0, column=1, sticky="w")

        self.child_label = tk.Label(master, text="Child PDF:")
        self.child_label.grid(row=1, column=0, sticky="w")
        self.child_button = tk.Button(master, text="Select", command=self.select_child_pdf)
        self.child_button.grid(row=1, column=1, sticky="w")

        self.compare_button = tk.Button(master, text="Compare", command=self.compare_pdfs)
        self.compare_button.grid(row=2, column=0, sticky="w")

    def select_master_pdf(self):
        self.master_file = filedialog.askopenfilename(title="Select Master PDF", filetypes=[("PDF Files", "*.pdf")])

    def select_child_pdf(self):
        self.child_file = filedialog.askopenfilename(title="Select Child PDF", filetypes=[("PDF Files", "*.pdf")])

    def compare_pdfs(self):
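        # askopenfilename() returns an empty string when the dialog is cancelled,
        # so test for falsiness rather than only for None.
        if not self.master_file or not self.child_file: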
            messagebox.showerror("Error", "Please select both master and child PDFs.")
            return

        try:
            master_doc = fitz.open(self.master_file)
            child_doc = fitz.open(self.child_file)
        except Exception:
            messagebox.showerror("Error", "Failed to open PDF files.")
            return

        result_doc = fitz.open()

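        for parent_page in master_doc:
            # Page.number is already 0-based, so no "- 1" is needed here.
            child_page = child_doc[parent_page.number]
            # NOTE: PyMuPDF's Page has no compare() method as far as I can find,
            # so this line raises AttributeError and the callback stops here.
            result = parent_page.compare(child_page)
            if result:
                diff_rects = result[0].rects
                for rect in diff_rects:
                    # NOTE: add_highlight_annot() is a Page method, not a Document method.
                    highlight = result_doc.add_highlight_annot(rect)
                    highlight.update()
            else:
                # NOTE: insert_pdf() expects a Document, not a Page.
                result_doc.insert_pdf(parent_page)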

        if not result_doc:
            messagebox.showwarning("Warning", "No differences found.")
            return

        output_file = filedialog.asksaveasfilename(title="Save Output PDF", filetypes=[("PDF Files", "*.pdf")])
        if not output_file.endswith(".pdf"):
            output_file += ".pdf"

        try:
            result_doc.save(output_file)
            messagebox.showinfo("Success", "Comparison complete. Results saved to {}".format(output_file))
        except Exception:
            messagebox.showerror("Error", "Failed to save output file.")


root = tk.Tk()
app = PDFCompare(root)
root.mainloop()
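One likely reason the Compare button appears to do nothing: Tkinter catches any exception raised inside a button callback and only prints the traceback to stderr, so without a visible console the failure goes unnoticed. Below is a minimal sketch of how such errors could be surfaced instead. The helper names `describe_callback_error` and `install_error_dialog` are hypothetical; the only Tkinter hook relied on is the root window's `report_callback_exception`.

```python
import traceback


def describe_callback_error(exc_type, exc_value, exc_tb):
    """Format an exception triple as a single traceback string."""
    return "".join(traceback.format_exception(exc_type, exc_value, exc_tb))


def install_error_dialog(root, show=print):
    """Route exceptions raised in Tkinter callbacks on `root` to `show`.

    `root` is expected to be the tk.Tk() instance; `show` is any callable
    that accepts one string (print by default).
    """
    def report(exc_type, exc_value, exc_tb):
        show(describe_callback_error(exc_type, exc_value, exc_tb))

    # Tkinter calls root.report_callback_exception(type, value, traceback)
    # whenever a widget callback raises.
    root.report_callback_exception = report
```

In the script above this could be called right after `root = tk.Tk()`, for example as `install_error_dialog(root, show=lambda text: messagebox.showerror("Error", text))`, so that a failure in `compare_pdfs` shows up as a dialog.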
Some additional info:
All the PDFs I need to compare are one page long; however, I could have 20 versions of the same page, one being the master and the other 19 being "child" PDFs.
An example of the parent PDF:
[Image: HOd83vV.th.png]
And an example of the child PDF:
[Image: HOd8FyB.th.png]
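One possible way to get the highlighting described above is to compare the pages word by word and highlight, in the child PDF, every word that was inserted or replaced relative to the master. The sketch below assumes single-page PDFs with extractable text (not scanned images) and assumes that the word order returned by PyMuPDF is consistent between master and child. The function names and the `_diff.pdf` output suffix are hypothetical. Text that was deleted from the child has no location there, so this sketch cannot highlight it.

```python
import difflib
from pathlib import Path


def changed_word_indexes(master_words, child_words):
    """Return indexes of child words that were inserted or replaced.

    Both arguments are lists of plain strings in the same reading order.
    """
    matcher = difflib.SequenceMatcher(None, master_words, child_words, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed


def highlight_differences(master_path, child_path, output_path):
    """Highlight differing words on page 1 of the child and save a copy."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    master_doc = fitz.open(master_path)
    child_doc = fitz.open(child_path)
    try:
        # Each "words" entry is (x0, y0, x1, y1, text, block_no, line_no, word_no).
        master_words = [w[4] for w in master_doc[0].get_text("words")]
        child_page = child_doc[0]
        child_words = child_page.get_text("words")
        changed = changed_word_indexes(master_words, [w[4] for w in child_words])
        for index in changed:
            child_page.add_highlight_annot(fitz.Rect(child_words[index][:4]))
        child_doc.save(output_path)
    finally:
        master_doc.close()
        child_doc.close()
    return len(changed)


def compare_many(master_path, child_paths, output_dir):
    """Process each child against the same master; return {output file: count}."""
    output_dir = Path(output_dir)
    results = {}
    for child_path in child_paths:
        out = output_dir / (Path(child_path).stem + "_diff.pdf")
        results[str(out)] = highlight_differences(master_path, child_path, str(out))
    return results
```

For the 1 master plus 19 children case, `compare_many` could be called with the list returned by `filedialog.askopenfilenames(...)` and a folder chosen with `filedialog.askdirectory(...)`. Comparing by words ignores pure layout or font changes, which may or may not be what is wanted.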


Messages In This Thread
Comparing PDFs - by CaseCRS - Mar-30-2023, 12:27 PM
RE: Comparing PDFs - by rob101 - Mar-31-2023, 01:23 PM
RE: Comparing PDFs - by CaseCRS - Mar-31-2023, 03:28 PM
RE: Comparing PDFs - by Gribouillis - Mar-31-2023, 01:45 PM
RE: Comparing PDFs - by rob101 - Mar-31-2023, 03:49 PM
RE: Comparing PDFs - by DPaul - Apr-01-2023, 05:46 AM
