简体   繁体   中英

pyPDF2 "Stream has ended unexpectedly"

This is my first python code. The writer passes an error. This seems to occur randomly during the process of looping through the pdf's.

try: except: pass will not work because it will just skip the file with the issue and not produce an output for it.

strict=False does not seem to work for the writer.

The error:

PdfReadWarning: Multiple definitions in dictionary at byte 0x6eb54 for key /PageMode [generic.py:587]
PdfReadWarning: Multiple definitions in dictionary at byte 0x75740 for key /PageMode [generic.py:587]
PdfReadWarning: Multiple definitions in dictionary at byte 0xabc13 for key /PageMode [generic.py:587]
Traceback (most recent call last):
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\kmincey.BCSBLOCAL\.vscode\extensions\ms-python.python-2022.4.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\kmincey.BCSBLOCAL\Desktop\Python_scripts\PDFsealer_V2.py", line 56, in <module>
    output_pdf.write(f)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 68, in readObject
    return readHexStringFromStream(stream)
  File "C:\Users\kmincey.BCSBLOCAL\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\generic.py", line 311, in readHexStringFromStream
    raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly

I have read several post regarding the issue of needing to put strict=False in the reader to pass warnings and not errors. https://stackoverflow.com/questions/42570432/pypdf2-stream-has-ended-unexpectedly , https://github.com/mstamy2/PyPDF2/issues/99 . This worked in most cases however, the writer now seems to be the problem.

Thanks in advance for any advice.

For loop snippet for reference:

for file in input_pdf:
    output_pdf = PdfFileWriter()
    sg.OneLineProgressMeter('My Meter', i, page_count, 'And now we Wait.....')
    PageObj = PyPDF2.PdfFileReader(open(file, "rb"), strict=False).getPage(0)
    PageObj.scaleTo(11*72, 17*72)
    PageObj.mergePage(Seal_pdf.getPage(0))
    output_pdf.addPage(PageObj)

    output_filename = f"{file}"
    f = open(output_filename, "wb+")
    output_pdf.write(f)
    i = i + 1
    f.close()

Due to the helpful input from @cards and @KJ, I was able to discover that the problem was my attempting to overwrite an in use file. The fact that the original was still tied up in memory would corrupt it once reaching the writer. Simply saving the file under a different name and writing some more code to clean up the directory was the solution I went with. Thanks for the assist.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM