
(PyPDF2) Attempt to merge PDFs produces error

I've been trying to add a watermark as shown in Add text to Existing PDF using Python, but I keep getting an error related to the PDF data generated by ReportLab. Is it a problem with the input PDF?

Setup: Python 3.3 (Anaconda distribution), Windows 7

from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
from six import BytesIO
from reportlab.lib.units import inch
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.pagesizes import letter

# Render watermark layer
stream = BytesIO()
c = Canvas(stream, pagesize=letter)
c.drawString(1 * inch, 8 * inch, "Hello World! " * 3)
c.showPage()
c.save()

stream.seek(0)
overlay = PdfFileReader(stream)
source = PdfFileReader("test.pdf")
writer = PdfFileWriter()

# Merge source and watermark pages
page0 = source.getPage(0)
page0.mergePage(overlay.getPage(0))
writer.insertPage(page0, 0)

# Write result to file
with open('merged.pdf', 'wb') as fp:
    writer.write(fp)

I get the following error:

Traceback (most recent call last):
  File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 73, in <module>
    pageSelectionPDF("./merged_pdfs/FB1_report.pdf", [44,52])
  File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 64, in pageSelectionPDF
    page0.mergePage(overlay.getPage(0))
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1996, in mergePage
    self._mergePage(page2)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2042, in _mergePage
    page2Content = PageObject._pushPopGS(page2Content, self.pdf)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1956, in _pushPopGS
    stream = ContentStream(contents, pdf)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2428, in __init__
    stream = BytesIO(b_(stream.getData()))
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\generic.py", line 831, in getData
    decoded._data = filters.decodeStreamData(self)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 317, in decodeStreamData
    data = ASCII85Decode.decode(data)
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
  File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int

Switching to Python 2.7 (again the Anaconda distribution), it seems to work fine, so Python 3 is definitely the problem.

This is a bug in the PyPDF2 library under Python 3. If you want to use Python 3, you need to patch the ASCII85Decode class in PyPDF2's filters.py. I had the same problem: borrowing the ASCII85 decoding code from ascii85.py in pdfminer3k (a Python 3 port of pdfminer) and pasting it into the decode method in filters.py fixes the issue. The old code was written for Python 2, where iterating over the stream data yields one-character strings. In Python 3 the data is bytes, and iterating over bytes yields integers, so the whitespace check y in ' \n\r\t' raises the TypeError shown above. The decoder also has to return bytes in Python 3, which the old Python 2 code doesn't do. There's a pull request on GitHub to get this change merged; I thought I'd answer here just in case.
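The TypeError from the traceback can be reproduced in a few lines of plain Python 3, without PyPDF2. This is a minimal sketch: the sample bytes are arbitrary ASCII85-style text chosen for illustration, and the filter expression is the one shown at line 256 of filters.py in the traceback.

```python
# Arbitrary ASCII85-style bytes (illustrative sample), as a decoder would receive them
data = b"9jqo^BlbD-\nBleB1~>"

# In Python 3, iterating over bytes yields ints, not 1-character strings.
first_items = list(data[:3])  # [57, 106, 113]

# The Python 2 style whitespace filter from PyPDF2's filters.py:
try:
    [y for y in data if not (y in ' \n\r\t')]
    error_message = None
except TypeError as exc:
    # 'in <string>' requires string as left operand, not int
    error_message = str(exc)

# A bytes-aware equivalent: test each int against a bytes literal instead.
cleaned = bytes(y for y in data if y not in b' \n\r\t')

print(first_items, error_message, cleaned, sep="\n")
```

The bytes-aware filter works because membership tests on a bytes object accept an int, which is exactly what iteration over bytes produces.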

Replace the body of the decode method of ASCII85Decode in PyPDF2's filters.py with this code from pdfminer3k (filters.py must also import struct):

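# Body of ASCII85Decode.decode(data) in PyPDF2/filters.py.
# Requires "import struct" at the top of filters.py.
if isinstance(data, str):
    data = data.encode('ascii')
n = b = 0          # n: digits in the current group, b: accumulated value
out = bytearray()
for c in data:     # bytes yield ints in Python 3; whitespace falls through and is ignored
    if ord('!') <= c and c <= ord('u'):
        # Regular base-85 digit ('!' = 0 ... 'u' = 84)
        n += 1
        b = b*85+(c-33)
        if n == 5:
            # A full group of 5 digits decodes to 4 bytes
            out += struct.pack(b'>L',b)
            n = b = 0
    elif c == ord('z'):
        # 'z' is shorthand for four zero bytes
        assert n == 0
        out += b'\0\0\0\0'
    elif c == ord('~'):
        # End-of-data marker "~>": flush a partial final group
        if n:
            for _ in range(5-n):
                b = b*85+84
            out += struct.pack(b'>L',b)[:n-1]
        break
return bytes(out)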
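To try the replacement decoder outside PyPDF2, the same loop can be wrapped in a standalone function and checked against Python's own Ascii85 encoder. This is a sketch under a few assumptions: the function name ascii85_decode is hypothetical, and base64.a85encode only exists from Python 3.4 onward, so the round-trip check will not run on the Python 3.3 setup from the question.

```python
import base64
import struct


def ascii85_decode(data):
    """Hypothetical standalone wrapper around the pdfminer3k loop above.

    Decodes an ASCII85 stream body (no '<~' prefix) into bytes.
    """
    if isinstance(data, str):
        data = data.encode('ascii')
    n = b = 0
    out = bytearray()
    for c in data:  # ints in Python 3; whitespace is silently skipped
        if ord('!') <= c <= ord('u'):
            n += 1
            b = b * 85 + (c - 33)
            if n == 5:
                out += struct.pack('>L', b)
                n = b = 0
        elif c == ord('z'):
            assert n == 0
            out += b'\0\0\0\0'
        elif c == ord('~'):
            if n:
                # Pad the partial group with 'u' digits, keep n - 1 bytes
                for _ in range(5 - n):
                    b = b * 85 + 84
                out += struct.pack('>L', b)[:n - 1]
            break
    return bytes(out)


# Round trip: encode with the standard library, decode with the patched loop.
sample = b"Hello World! " * 3
encoded = base64.a85encode(sample) + b"~>"  # add the ~> end-of-data marker
assert ascii85_decode(encoded) == sample
```

Like the pdfminer3k code, this loop only flushes a trailing partial group when it reaches the ~ of the ~> marker, which is why the check appends ~> to the encoder's output.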
