Python PDFMerger Too Slow

I am using PdfMerger from PyPDF2. My program reads all PDFs in a folder and merges them into a single file. I tested it with 15 PDF files of 500 KB each and it worked like a charm: the whole process finished within a second. However, when I tried a larger number of files, the process took much longer than I anticipated. Merging 1000 files of 500 KB each, reading and appending all the PDFs took 3 seconds in total, but writing the merged PDF took about 67 seconds. I also tried two levels of merging (500 files into one, the other 500 into another, then merging those two), but it took about the same time. Is there any way to speed up the writing step?

I am adding my code below.

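    from PyPDF2 import PdfMerger

    # dirs and filename are defined elsewhere in the program (not shown).
    merger = PdfMerger()
    for pdf in dirs:
        if pdf.endswith('.pdf'):  # skip anything that is not a PDF
            merger.append(pdf)

    merger.write(filename)
    merger.close()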
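To show where the time goes, the append and write phases can be timed separately. The following is only a minimal sketch of how the timings quoted in the question might be collected. The helper names `timed` and `merge_with_timings` are hypothetical, and it assumes a PyPDF2 version that provides `PdfMerger`, as in the code above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def merge_with_timings(pdf_paths, output_path):
    """Merge the given PDFs and return the seconds spent appending and writing.

    Hypothetical helper: PyPDF2 is imported lazily so that the timing
    helper above can be used without it.
    """
    from PyPDF2 import PdfMerger

    timings = {}
    merger = PdfMerger()
    with timed("append", timings):
        for path in pdf_paths:
            merger.append(path)
    with timed("write", timings):
        merger.write(output_path)
    merger.close()
    return timings
```

Calling `merge_with_timings(paths, "merged.pdf")` returns a dict with `"append"` and `"write"` entries in seconds, which makes it easier to report exactly which phase is slow.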

This is more of a long comment than an answer.

I just tried this with the latest version of PyPDF2:

from PyPDF2 import PdfReader, PdfWriter
import time

reader = PdfReader("a-two-page-doc.pdf")
writer = PdfWriter()

# Append the same two-page document 1000 times (2000 pages in total).
for i in range(1000):
    writer.append(reader)

# Time only the write step.
t0 = time.time()
with open("out-2000-pages.pdf", "wb") as fp:
    writer.write(fp)
t1 = time.time()

print(f"{t1-t0:.2f}s")

That took about 0.67s on my machine.

Which version of PyPDF2 did you use? Which version of Python? Is there maybe something special about the specific PDFs? How big is a single PDF? Did you enable any compression features?
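The version questions can be answered with a few lines of standard-library code. This is only a sketch; `describe_environment` is a hypothetical helper name, and it assumes the package was installed under the distribution name `PyPDF2`.

```python
import platform
from importlib import metadata


def describe_environment(package="PyPDF2"):
    """Return a one-line summary of the installed package and Python versions."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed under this name.
        pkg_version = "not installed"
    return f"{package} {pkg_version}, Python {platform.python_version()}"


print(describe_environment())
```

Pasting the printed line into the question gives readers the two version numbers asked for above.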

Without a lot more detail, nobody will be able to help you.
