PyPDF2 compression

I am struggling to compress my merged PDFs using the PyPDF2 module. This is my attempt, based on http://www.blog.pythonlibrary.org/2012/07/11/pypdf2-the-new-fork-of-pypdf/:

import PyPDF2
path = open('path/to/hello.pdf', 'rb')
path2 = open('path/to/another.pdf', 'rb')
merger = PyPDF2.PdfFileMerger()
merger.append(fileobj=path2)
merger.append(fileobj=path)
pdf.filters.compress(merger)
merger.write(open("test_out2.pdf", 'wb'))

The error I receive is:

TypeError: must be string or read-only buffer, not file

I have also tried compressing the PDF after the merging is complete. I judge the compression to have failed by comparing the result against the file size I got from PDFSAM with compression enabled. Any thoughts? Thanks.

PyPDF2 doesn't have a reliable compression method. That said, there's a compress_content_streams() method with the following description:

Compresses the size of this page by joining all content streams and applying a FlateDecode filter.

However, it is possible that this function will perform no action if content stream compression becomes "automatic" for some reason.

Again, this won't make any difference in most cases, but you can try this code:

from PyPDF2 import PdfReader, PdfWriter


writer = PdfWriter()

for pdf in ["path/to/hello.pdf", "path/to/another.pdf"]:
    reader = PdfReader(pdf)
    for page in reader.pages:
        page.compress_content_streams()
        writer.add_page(page)

with open("test_out2.pdf", "wb") as f:
    writer.write(f)
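To get a feel for what a FlateDecode filter does at the byte level, here is a minimal sketch using only the standard library. It assumes that FlateDecode data is ordinary zlib/deflate data; the `raw_stream` bytes are a made-up stand-in for a page's content stream, not output from PyPDF2.

```python
import zlib

# Made-up, highly repetitive content stream (PDF text-drawing operators).
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

# Deflate the stream; level 9 is the slowest and usually the smallest setting.
compressed = zlib.compress(raw_stream, 9)
print(len(raw_stream), "->", len(compressed), "bytes")

# Running the same compression over already-compressed data
# typically gains little or nothing.
recompressed = zlib.compress(compressed, 9)
print(len(compressed), "->", len(recompressed), "bytes")

# The round trip is lossless.
assert zlib.decompress(compressed) == raw_stream
```

The second step hints at why the method often makes no visible difference: if the streams in a PDF are already Flate-compressed, there is little redundancy left to remove.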

Your error says that the argument must be a string or read-only buffer, not a file.

So it's better to write the merged output to an in-memory bytes buffer first.

import PyPDF2
from io import BytesIO

tmp = BytesIO()
path = open('path/to/hello.pdf', 'rb')
path2 = open('path/to/another.pdf', 'rb')
merger = PyPDF2.PdfFileMerger()
merger.append(fileobj=path2)
merger.append(fileobj=path)
merger.write(tmp)  # write the merged PDF into memory first
# compress() takes bytes rather than a file object; its return value is not used here
PyPDF2.filters.compress(tmp.getvalue())
with open("test_out2.pdf", 'wb') as out:
    merger.write(out)
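The TypeError from the question can be reproduced without PyPDF2. The sketch below assumes that the old `PyPDF2.filters.compress` simply hands its argument to `zlib.compress`; the exact wording of the message differs between Python 2 and Python 3. `fake_file` is a hypothetical stand-in for an open PDF file.

```python
import io
import zlib

# Stand-in for an open binary file; BytesIO is a file object, not bytes.
fake_file = io.BytesIO(b"%PDF-1.4 example bytes " * 50)

raised = False
try:
    zlib.compress(fake_file)  # passing the file object itself
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Passing the bytes held by the object works.
data = fake_file.getvalue()
packed = zlib.compress(data)
print(len(data), "->", len(packed), "bytes")
```

Note that `packed` is a plain zlib blob, not a PDF that a viewer can open, so compressing the whole file this way is not a substitute for compressing the streams inside it.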

The initial approach isn't that wrong. Just add the pages to your writer and compress them before writing to a file:

...

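# Copy every page from the reader into the writer.
for i in range(reader.numPages):
    page = reader.getPage(i)
    writer.addPage(page)
# Compress each page held by the writer, not just the last one read.
for i in range(writer.getNumPages()):
    page = writer.getPage(i)
    page.compressContentStreams()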

...

pypdf offers several ways to reduce the file size: https://pypdf.readthedocs.io/en/latest/user/file-size.html

compress_content_streams is one of them. Its only disadvantage is that it can take a long time (depending on the PDF; think of it as ZIP-for-PDF):

from pypdf import PdfReader, PdfWriter

reader = PdfReader("example.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.compress_content_streams()  # This is CPU intensive!
    writer.add_page(page)

with open("out.pdf", "wb") as f:
    writer.write(f)
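To check whether a run like the one above actually helped, the input and output sizes can be compared. This is a small sketch; `size_reduction` is a hypothetical helper, and the demo uses throwaway files in place of example.pdf and out.pdf.

```python
import os
import tempfile


def size_reduction(original_path, compressed_path):
    """Return the size reduction as a percentage of the original file size."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    if before == 0:
        raise ValueError("original file is empty")
    return 100.0 * (before - after) / before


# Demo with two throwaway files standing in for the input and output PDFs.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.bin")
    smaller = os.path.join(tmp_dir, "smaller.bin")
    with open(original, "wb") as f:
        f.write(b"x" * 1000)
    with open(smaller, "wb") as f:
        f.write(b"x" * 750)
    demo_reduction = size_reduction(original, smaller)
    print(f"{demo_reduction:.1f}% smaller")  # prints "25.0% smaller"
```

A negative result means the output grew, which can happen when the input was already well compressed.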
