
PyPDF2 hangs on processing

I'm processing multiple PDF files using PyPDF2, but my script hangs somewhere. The only output in my console is a few "startxref on same line as offset" messages, which, if I'm correct, are only warnings, so by rights the script should still reach the finally block and return an empty string.

Am I doing something wrong?

import sys

import PyPDF2


def decode_pdf(src_filename):
    out_str = ""
    try:
        # The with-block closes the file even if PyPDF2 raises
        with open(str(src_filename), "rb") as f:
            read_pdf = PyPDF2.PdfFileReader(f)
            number_of_pages = read_pdf.getNumPages()
            for i in range(number_of_pages):
                page = read_pdf.getPage(i)
                out_str = out_str + " " + page.extractText()
        # Join the extracted text of all pages into a single line
        out_str = ''.join(out_str.splitlines())
    except Exception:
        # Print the exception type, value and traceback object
        print("Exception on pdf")
        print(sys.exc_info())
        out_str = ""
    finally:
        return out_str
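A message such as "startxref on same line as offset" is a warning, not an exception: Python prints it and execution continues, so the except block is never entered, and the finally block runs only once the try block finishes or raises. A call that hangs does neither. The following is a minimal sketch, assuming PyPDF2 reports that message through the standard warnings module, of using warnings.catch_warnings() with simplefilter("error") to escalate warnings into exceptions that except can catch; noisy_parse is a hypothetical stand-in for the PyPDF2 calls.

```python
import warnings


def noisy_parse():
    # Hypothetical stand-in for a parser call that warns and then carries on
    warnings.warn("startxref on same line as offset")
    return "parsed"


def parse_with_warnings_as_errors():
    with warnings.catch_warnings():
        # Inside this block every warning is raised as an exception
        warnings.simplefilter("error")
        try:
            return noisy_parse()
        except UserWarning as exc:
            return "caught: %s" % exc


print(noisy_parse())                    # warning goes to stderr, then prints "parsed"
print(parse_with_warnings_as_errors())  # prints "caught: startxref on same line as offset"
```

Escalating warnings only helps identify files whose warning comes before the stall; it cannot interrupt a call that is already stuck.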

I was facing this issue too and couldn't solve it with PyPDF2. I solved it with pdfminer, using the example from here.

The relevant code is copied below:

from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage


def convert(fname, pages=None):
    # Page numbers are zero-based; an empty set means "all pages"
    pagenums = set(pages) if pages else set()

    # Text buffer that collects the converter's output
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    # Feed each selected page through the interpreter; the file is closed afterwards
    with open(fname, 'rb') as infile:
        for page in PDFPage.get_pages(infile, pagenums):
            interpreter.process_page(page)
    converter.close()
    text = output.getvalue()
    output.close()
    return text

Call the function convert() as below:

convert('myfile.pdf', pages=[5,7])
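Switching libraries avoids this particular PyPDF2 problem, but no try/except can interrupt a call that never returns. The following is a minimal sketch, not part of the original answer, of a per-file time limit: each extraction runs in a separate Python interpreter through subprocess.run(..., timeout=...), which kills the child when the limit expires. run_isolated, extract_all and EXTRACT_SCRIPT are hypothetical names, and the child script assumes pdfminer.six (which provides pdfminer.high_level.extract_text) is installed.

```python
import subprocess
import sys

# Hypothetical child script: extracts the text of the PDF named in argv[1]
# and writes it as UTF-8. Assumes pdfminer.six is installed.
EXTRACT_SCRIPT = (
    "import sys\n"
    "from pdfminer.high_level import extract_text\n"
    "sys.stdout.buffer.write(extract_text(sys.argv[1]).encode('utf-8'))\n"
)


def run_isolated(script, args, timeout):
    """Run script in a fresh interpreter; return its stdout, or "" on timeout or error."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            encoding="utf-8",
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the stalled child
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout


def extract_all(filenames, timeout=30):
    """Map each PDF filename to its text, using "" for files that fail or stall."""
    return {name: run_isolated(EXTRACT_SCRIPT, [name], timeout) for name in filenames}
```

For example, extract_all(["a.pdf", "b.pdf"], timeout=60) returns a dict of texts and moves on after 60 seconds if a file stalls. Starting a new interpreter per file is slower than an in-process call, but it is the simplest way to guarantee the stuck work is actually terminated.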
