Merging PDF files with Python 3
I am writing a small script that needs to merge many one-page PDF files. I want the script to run with Python 3 and to have as few dependencies as possible.
For the PDF merging part, I tried using PyPdf. However, its Python 3 support seems to be buggy: it can't handle the Inkscape-generated PDF files I need. I have the current git version of PyPdf installed, and the following test script doesn't work:
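import PyPDF2

output_pdf = PyPDF2.PdfFileWriter()

# Keep the source file open until writing is done: PyPDF2 reads
# page content lazily from the input stream.
with open("testI.pdf", "rb") as input_file:
    input_pdf = PyPDF2.PdfFileReader(input_file)
    output_pdf.addPage(input_pdf.getPage(0))
    with open("test.pdf", "wb") as output_file:
        output_pdf.write(output_file)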
It throws the following stack trace:
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    output.addPage(input.getPage(0))
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
    self._flatten()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
    self._flatten(page.getObject(), inherit)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
    obj = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
    return FloatObject(name.decode("ascii"))
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context
The same script, however, works flawlessly with Python 2.7.
What am I doing wrong here? Is it a bug in the library? Can I work around it without touching the PyPDF library?
So I found the answer. The decimal.Decimal class in Python 3.3 shows some weird behaviour: it rejects the context argument that PyPdf passes to decimal.Decimal.__new__ inside FloatObject.__new__, which is where the "optional argument must be a context" error comes from. This is the corresponding StackOverflow question: Instantiate Decimal class. I added a workaround to the PyPDF2 library and submitted a pull request.
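As a minimal sketch of the kind of workaround involved (the actual pull request is not reproduced here), a Decimal subclass can avoid forwarding a context to decimal.Decimal.__new__ unless the caller actually supplied one. This assumes FloatObject.__new__ defaults context to None and that forwarding that None is what Python 3.3 rejects; the FloatObject class below is a hypothetical stand-in, not PyPdf's real code.

```python
import decimal


class FloatObject(decimal.Decimal):
    """Hypothetical stand-in for PyPdf's FloatObject (not the library's code)."""

    def __new__(cls, value="0", context=None):
        # Assumption: the Python 3.3 failure comes from forwarding context=None.
        # Only pass a context through when the caller supplied a real one.
        if context is None:
            return decimal.Decimal.__new__(cls, str(value))
        return decimal.Decimal.__new__(cls, str(value), context)


if __name__ == "__main__":
    # A PDF number token, decoded from bytes as in the traceback above.
    token = b"12.75".decode("ascii")
    number = FloatObject(token)
    print(type(number).__name__, number)  # FloatObject 12.75
    print(FloatObject(token, decimal.Context()))  # 12.75
```

Note that a context passed to the Decimal constructor does not round the stored value; it only governs error signalling during the conversion, so skipping it when it is None leaves the parsed number unchanged.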
Just to make sure you are aware: there are existing tools that do exactly this, for example Ghostscript:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
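If shelling out is acceptable, the gs command above can be driven from a Python 3 script using only the standard library. This is a sketch that assumes a gs binary is on the PATH; the helper names build_gs_merge_command and merge_with_ghostscript are hypothetical. Passing the arguments as a list (rather than a shell string) avoids quoting problems with file names.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_gs_merge_command(output_path: str, input_paths: Sequence[str]) -> List[str]:
    """Build the argument list for the gs merge command (hypothetical helper)."""
    return [
        "gs",
        "-dBATCH",    # exit after processing instead of entering interactive mode
        "-dNOPAUSE",  # do not pause for a prompt after each page
        "-q",         # suppress startup messages
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        *input_paths,
    ]


def merge_with_ghostscript(output_path: str, input_paths: Sequence[str]) -> None:
    """Merge input_paths into output_path by running Ghostscript (assumed installed)."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript executable 'gs' not found on PATH")
    # check=True raises CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_gs_merge_command(output_path, input_paths), check=True)
```

For example, merge_with_ghostscript("finished.pdf", ["file1.pdf", "file2.pdf"]) runs the same command as the one-liner above; for many one-page files, build the input list with something like sorted(glob.glob("*.pdf")).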