(PyPDF2) Attempt to merge PDFs produces error
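I have been trying to add a watermark, as shown in "Add text to existing PDF using Python", but I keep getting an error about the PDF data coming from reportlab. Is there something wrong with the input PDF?
Setup: Python 3.3 (Anaconda distribution), Windows 7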
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
from six import BytesIO
from reportlab.lib.units import inch
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.pagesizes import letter
# Render watermark layer
stream = BytesIO()
c = Canvas(stream, pagesize=letter)
c.drawString(1 * inch, 8 * inch, "Hello World! " * 3)
c.showPage()
c.save()
stream.seek(0)
overlay = PdfFileReader(stream)
source = PdfFileReader("test.pdf")
writer = PdfFileWriter()
# Merge source and watermark pages
page0 = source.getPage(0)
page0.mergePage(overlay.getPage(0))
writer.insertPage(page0, 0)
# Write result to file
with open('merged.pdf', 'wb') as fp:
    writer.write(fp)
I get the following error:
Traceback (most recent call last):
  File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 73, in <module>
    pageSelectionPDF("./merged_pdfs/FB1_report.pdf", [44,52])
  File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 64, in pageSelectionPDF
    page0.mergePage(overlay.getPage(0))
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1996, in mergePage
    self._mergePage(page2)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2042, in _mergePage
    page2Content = PageObject._pushPopGS(page2Content, self.pdf)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1956, in _pushPopGS
    stream = ContentStream(contents, pdf)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2428, in __init__
    stream = BytesIO(b_(stream.getData()))
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\generic.py", line 831, in getData
    decoded._data = filters.decodeStreamData(self)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 317, in decodeStreamData
    data = ASCII85Decode.decode(data)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
After switching to Python 2.7 (again the Anaconda distribution) it seems to work fine, but Python 3 is definitely a problem.
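This is a Python 3 problem in the PyPDF2 library. If you want to use Python 3, you need to patch the ASCII85Decode class in filters.py. I had the same problem and solved it by borrowing the ascii85decode code from ascii85.py in pdfminer3k (a port of pdfminer to Python 3) and pasting it into the decode def in filters.py. The issue is that under Python 3 the decoder needs to work with and return bytes, which the old Python 2 code does not do. There is a request on GitHub to merge the change. I thought I would answer here just in case.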
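The failure in the traceback can be reproduced without PyPDF2. The sketch below assumes the stream data reaches the decoder as `bytes`, which is what the "not int" part of the error message suggests: iterating over `bytes` in Python 3 yields integers, so testing them against a `str` raises `TypeError`. The helper names `strip_whitespace_py2_style` and `strip_whitespace` are hypothetical and only serve the illustration.

```python
data = b"87cURD ]i,\n~>"  # arbitrary sample bytes, assumed shape of the stream data

# In Python 3, iterating over bytes yields ints, not 1-character strings.
first = next(iter(data))
print(type(first))


def strip_whitespace_py2_style(data):
    # Same expression as line 256 of filters.py in the traceback.
    return [y for y in data if not (y in ' \n\r\t')]


try:
    strip_whitespace_py2_style(data)
    error = None
except TypeError as exc:
    error = exc
print(error)


def strip_whitespace(data):
    # Bytes-safe variant: compare ints against a bytes literal instead of a str.
    return bytes(y for y in data if y not in b' \n\r\t')


print(strip_whitespace(data))
```

Comparing against a `bytes` literal works because membership tests on `bytes` accept integers.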
Replace the code in the ASCII85Decode decode def in PyPDF2's filters.py with the following code from pdfminer3k:
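import struct  # needed for struct.pack; can also go at the top of filters.py

# Work on bytes so that iteration yields ints under Python 3
if isinstance(data, str):
    data = data.encode('ascii')
n = b = 0
out = bytearray()
for c in data:
    if ord('!') <= c and c <= ord('u'):
        # Accumulate five base-85 digits into one 32-bit value
        n += 1
        b = b*85+(c-33)
        if n == 5:
            out += struct.pack(b'>L',b)
            n = b = 0
    elif c == ord('z'):
        # 'z' is shorthand for four zero bytes
        assert n == 0
        out += b'\0\0\0\0'
    elif c == ord('~'):
        # End-of-data marker: pad and flush a partial final group
        if n:
            for _ in range(5-n):
                b = b*85+84
            out += struct.pack(b'>L',b)[:n-1]
        break
return bytes(out)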
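To check the replacement body outside of PyPDF2, it can be wrapped in a standalone function and round-tripped against the standard library. This is a minimal sketch: the name `ascii85_decode` is hypothetical (it is not part of PyPDF2 or pdfminer3k), and it assumes Python 3.4 or later for `base64.a85encode`. The `~>` end-of-data marker is appended by hand because `a85encode` does not add it by default.

```python
import base64
import struct


def ascii85_decode(data):
    """Decode ASCII85 data ending in '~>' and return bytes (hypothetical helper)."""
    if isinstance(data, str):
        data = data.encode('ascii')
    n = b = 0
    out = bytearray()
    for c in data:  # c is an int when iterating over bytes in Python 3
        if ord('!') <= c <= ord('u'):
            n += 1
            b = b * 85 + (c - 33)
            if n == 5:
                out += struct.pack('>L', b)
                n = b = 0
        elif c == ord('z'):
            assert n == 0
            out += b'\0\0\0\0'
        elif c == ord('~'):
            if n:
                # Pad the partial group with the highest digit, then truncate
                for _ in range(5 - n):
                    b = b * 85 + 84
                out += struct.pack('>L', b)[:n - 1]
            break
    return bytes(out)


payload = b"Hello World"  # 11 bytes, so the final group is partial
encoded = base64.a85encode(payload) + b"~>"
decoded = ascii85_decode(encoded)
print(decoded == payload)
```

Characters outside the `!` to `u` range other than `z` and `~` are skipped, so whitespace inside the encoded stream is ignored without a separate filtering pass.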