PyPDF2: Stream has ended unexpectedly

I have a Python script that uses PyPDF2 to reverse the page order of a PDF:

from  PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
rpage = []
name = input("What's the file called?")

filename = name.split('.', 1)

input1 = PdfFileReader(open(name,'rb'), strict = False)

pages = list(range(1,input1.getNumPages() + 1))

for i in range(0, (input1.getNumPages())):
    rpage.append(pages[input1.getNumPages() - i -1])
for i in rpage:
    output.addPage(input1.getPage(i-1))

outputpath = filename[0] + '-reversed.pdf'

outputStream = open(outputpath, "wb")
output.write(outputStream)

It works as intended until it tries to write the output stream, at which point it raises this error:

PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
  File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\reverse pdf.py", line 22, in <module>
    output.write(outputStream)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 611, in readFromStream
    data["__streamdata__"] = stream.read(length)
TypeError: integer argument expected, got 'NullObject'

The code does create a PDF file, but it is 0KB in size and therefore unreadable. I have also tested a sample script (found here) that merges three PDFs; it produces another empty file and results in this error:

PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1567, in _getObjectFromStream
    obj = readObject(streamData, self)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 98, in readObject
    return NumberObject.readFromStream(stream)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 269, in readFromStream
    num = utils.readUntilRegex(stream, NumberObject.NumberPattern)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\utils.py", line 134, in readUntilRegex
    raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\untitled1.py", line 27, in <module>
    merger.write(output)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\merger.py", line 230, in write
    self.output.write(fileobj)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 609, in readFromStream
    length = pdf.getObject(length)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1593, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1576, in _getObjectFromStream
    raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly

The same error also appears when this script is used to split a PDF into its constituent pages:

from PyPDF2 import PdfFileWriter, PdfFileReader 
infile = PdfFileReader(open('test.pdf', 'rb'))

for i in range(infile.getNumPages()):
    p = infile.getPage(i)
    outfile = PdfFileWriter()
    outfile.addPage(p)
    with open('page-%02d.pdf' % i, 'wb') as f:
        outfile.write(f)

The above code produces (n-1) readable PDFs, but the nth PDF is an empty file. Any idea how I can fix this?

Your script counts through the pages in several different places, and the purpose of each is not clear to me. I believe the way you count backwards is the source of your error.

I took your script and first adapted it to Python 2.7 (since that's what I'm running), then simplified it to walk backward through your source file once, creating your reversed file.

from  PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
# rpage = [] removed because it's not needed anymore
name = raw_input("What's the file called? ") #Changed for the 2.7 environment

filename = name[:-4] #Simplified, since we know where the piece we want is.

input1 = PdfFileReader(open(name, "rb"))
#Simplified, because I couldn't figure out why it was complex.
#The "rb" mode belongs to open(); PdfFileReader's second parameter is strict.

for i in range(input1.getNumPages(),0,-1): 
     #getNumPages counts like a human and gives the total number of pages
     #This counts backwards, so no need to count forward and use that to
     #reverse the numbers.
     output.addPage(input1.getPage(i-1))
     #getPage counts like a computer and needs to finish with page 0.

outputpath = filename + '-reversed.pdf'

outputStream = open(outputpath, "wb")
output.write(outputStream)
outputStream.close() #Closes the file and stream once you're done.
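
The index arithmetic in the loop above can be checked without any PDF library. A minimal sketch, where `reversed_indices` is a hypothetical helper name (not part of PyPDF2) and `num_pages` stands in for the value returned by `getNumPages()`:

```python
def reversed_indices(num_pages):
    """Zero-based page indices from the last page to the first."""
    # range(num_pages, 0, -1) counts num_pages, ..., 1 (one-based),
    # so subtracting 1 gives the zero-based index that getPage expects.
    return [i - 1 for i in range(num_pages, 0, -1)]


print(reversed_indices(4))  # [3, 2, 1, 0]

# The same order can be written with the built-in reversed():
print(reversed_indices(4) == list(reversed(range(4))))  # True
```

Under Python 3, `reversed(range(input1.getNumPages()))` would let the loop pass `i` straight to `getPage` without the `i-1` adjustment.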

If all you want is to reverse the pages for printing, and you don't care about preserving internal links and annotations, pdfrw might be better suited to the task than PyPDF2:

from  pdfrw import PdfWriter, PdfReader

iname = input("What's the file called? ")
oname = iname.rsplit('.', 1)[0] + '-reversed.pdf'

output = PdfWriter()
output.addpages(reversed(PdfReader(iname).pages))
output.write(oname)

Disclaimer: I am the primary pdfrw author.

I would recommend using the 'merge' functionality of PyPDF2 instead of 'addPage'.

The following code snippet shows how you can append and merge files/pages:

from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()

input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")


# add the first 3 pages of first file to output
merger.append(fileobj = input1, pages = (0,3))

# insert the first page of second file into the output beginning after the second page
merger.merge(position = 2, fileobj = input2, pages = (0,1))

# Write to an output PDF document
output = open("document-output.pdf", "wb")
merger.write(output)

Remove the 'pages' argument from the 'append' and 'merge' calls to merge whole files instead of specific pages.
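
As documented for PyPDF2, the `pages` tuple takes the form `(start, stop[, step])` with zero-based indices and an exclusive stop, much like the arguments to `range`. A small sketch of which page indices a given tuple selects; `selected_pages` is a hypothetical helper for illustration only, not a PyPDF2 function:

```python
def selected_pages(pages):
    """Zero-based page indices picked by a (start, stop[, step]) tuple."""
    # The tuple is unpacked exactly like range() arguments,
    # so the stop index itself is not included.
    return list(range(*pages))


print(selected_pages((0, 3)))  # [0, 1, 2] -> the first three pages
print(selected_pages((0, 1)))  # [0]       -> only the first page
```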

Try uninstalling and reinstalling the PyPDF2 library. It worked for me!
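
Separately from the suggestions above, the 0KB file described in the question is consistent with the output file being opened (and therefore created) before `write` fails. A minimal sketch of one way to avoid leaving an empty file behind: render into memory first and touch the disk only on success. `write_pdf_safely` is a hypothetical helper, and `writer` is assumed to be any object with a `write(stream)` method, such as a `PdfFileWriter` or `PdfFileMerger`:

```python
import io


def write_pdf_safely(writer, path):
    """Write via an in-memory buffer so a failed write leaves no file."""
    buffer = io.BytesIO()
    # If this raises (for example on a malformed input PDF),
    # nothing has been created on disk yet.
    writer.write(buffer)
    # Only reached when the writer finished without an exception.
    with open(path, "wb") as f:
        f.write(buffer.getvalue())
```

This does not repair a malformed input PDF; it only keeps a failed run from producing an empty output file. The cost is that the whole output is held in memory before it is written.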
