
Python PDFMerger Too Slow

I am using PdfMerger from PyPDF2. My program reads all PDFs in a folder and merges them into a single file. I tested it with 15 PDF files of 500 KB each and it worked like a charm: the whole process finished within a second. However, when I tried a larger number of files, the process took much longer than I anticipated. I tried merging 1000 files of 500 KB each. Reading and appending all these PDFs took 3 seconds in total, but writing the merged PDF took around 67 seconds. I also tried two levels of merging (500 files into one, the other 500 into another, then merging the final two), but it took around the same time. Is there any way to speed up the writing step?

I am adding my code below.

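            from PyPDF2 import PdfMerger

            merger = PdfMerger()
            # dirs is assumed to hold the file paths found in the folder
            for pdf in dirs:
                if pdf.endswith('pdf'):
                    merger.append(pdf)

            # filename is the path of the merged output file
            merger.write(filename)
            merger.close()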

This is more of a long comment than an answer.

I just tried this with the latest version of PyPDF2:

from PyPDF2 import PdfReader, PdfWriter
import time

reader = PdfReader("a-two-page-doc.pdf")
writer = PdfWriter()

for i in range(1000):
    writer.append(reader)


t0 = time.time()
with open("out-2000-pages.pdf", "wb") as fp:
    writer.write(fp)
t1 = time.time()

print(f"{t1-t0:.2f}s")

That took about 0.67s on my machine.
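
To compare like with like, it may help to time the append and write phases separately on the real input files, since the question reports 3 seconds for appending and around 67 seconds for writing. The following is only a sketch: `timed` and `merge_with_timings` are hypothetical helper names, and it assumes a PyPDF2 version that provides `PdfMerger`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def merge_with_timings(pdf_paths, out_path):
    """Merge pdf_paths into out_path and return seconds spent per phase."""
    # Imported lazily so the timing helper stays usable without PyPDF2 installed.
    from PyPDF2 import PdfMerger

    timings = {}
    merger = PdfMerger()
    with timed("append", timings):
        for path in pdf_paths:
            merger.append(path)
    with timed("write", timings):
        merger.write(out_path)
    merger.close()
    return timings
```

Calling `merge_with_timings(list_of_paths, "merged.pdf")` returns a dict with the keys `"append"` and `"write"`, which makes it easier to report where the time actually goes.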

Which version of PyPDF2 did you use? Which version of Python? Is there maybe something about the specific PDF? How big is the single PDF? Did you enable some compression features?

Without a lot more details, nobody will be able to help you.
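
The version questions above can be answered with a few lines from the standard library. This is a minimal sketch; `environment_report` is a hypothetical helper name, and it assumes the package was installed under the distribution name `PyPDF2`.

```python
import platform
from importlib import metadata


def environment_report(package="PyPDF2"):
    """Return the Python version and the installed version of the given package."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return f"Python {platform.python_version()}, {package} {pkg_version}"


print(environment_report())
```

Pasting the printed line into the question, together with the size and origin of the input PDFs, gives others a chance to reproduce the slowdown.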
