
PDF to text in Python 3.6 with pdfminer: No module named 'pdfminer'

I am trying to use pdfminer.six with Python 3.6.3 to convert multiple PDFs in a directory into multiple .txt files.

When I run the code below, I get the following error: ModuleNotFoundError: No module named 'pdfminer'. Alternatively, when I run pdf2txt.py filename.pdf, I get: env: python\r: No such file or directory
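The \r in the second message usually means the script's shebang line ends in a Windows carriage return (CRLF line endings), so env looks for an interpreter literally named "python\r". A minimal sketch for checking and fixing that, assuming you can write to the installed pdf2txt.py; the helper names has_crlf_shebang and convert_to_lf are hypothetical, not part of pdfminer:

```python
import sys
from pathlib import Path


def has_crlf_shebang(script: Path) -> bool:
    """Return True if the script starts with a shebang line ending in CRLF."""
    with script.open("rb") as f:
        first_line = f.readline()  # binary readline keeps the b"\r\n" ending
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")


def convert_to_lf(script: Path) -> None:
    """Rewrite the file in place, replacing every CRLF with a plain LF."""
    data = script.read_bytes()
    script.write_bytes(data.replace(b"\r\n", b"\n"))


# Hypothetical command-line usage: python fix_crlf.py /path/to/pdf2txt.py
if __name__ == "__main__" and len(sys.argv) > 1:
    target = Path(sys.argv[1])
    if has_crlf_shebang(target):
        convert_to_lf(target)
        print("converted", target, "to LF line endings")
    else:
        print(target, "has no CRLF shebang")
```

Running the script through the interpreter explicitly (python pdf2txt.py filename.pdf) also sidesteps the problem, because env then never reads the shebang line.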

I have done some research on this problem. I uninstalled the original pdfminer with pip, so currently only pdfminer.six is installed. I am also running Python 3.6.3 inside a virtualenv.
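A common cause of ModuleNotFoundError when pip does list pdfminer.six (which installs the top-level package pdfminer) is that the script runs under a different interpreter than the one pip installed into, for example because the virtualenv is not activated in that shell or IDE. A small diagnostic sketch; the function names in_virtualenv, module_available and report are hypothetical helpers:

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv or a classic virtualenv."""
    # venv makes sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_available(name: str) -> bool:
    """True if the top-level module `name` can be imported by *this* interpreter."""
    return importlib.util.find_spec(name) is not None


def report(module: str = "pdfminer") -> str:
    """Summarize which interpreter is running and whether `module` is importable."""
    lines = [
        "interpreter: " + sys.executable,
        "version:     " + sys.version.split()[0],
        "virtualenv:  " + str(in_virtualenv()),
        module + " importable: " + str(module_available(module)),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

If "interpreter" does not point into the virtualenv, or "pdfminer importable" is False, the package was installed into a different environment than the one running the script.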

Below is the code I ran:

from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
import os

#converts one pdf, returns its text content as a string
def convert(fname, pages=None):
    # an empty set tells PDFPage.get_pages to process every page
    pagenums = set(pages) if pages else set()

    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    # file() does not exist in Python 3; open() in a with-block also closes the file
    with open(fname, 'rb') as infile:
        for page in PDFPage.get_pages(infile, pagenums):
            interpreter.process_page(page)
    converter.close()
    text = output.getvalue()
    output.close()  # the original "output.close" lacked parentheses and never ran
    return text

#converts all pdfs in directory pdfDir, saves all resulting txt files to txtDir
def convertMultiple(pdfDir, txtDir):
    if pdfDir == "":
        pdfDir = os.getcwd()  # if no pdfDir passed in, use the current directory
    for pdf in os.listdir(pdfDir):  # iterate through files in the pdf directory
        if pdf.endswith(".pdf"):
            pdfFilename = os.path.join(pdfDir, pdf)
            text = convert(pdfFilename)  # get string of text content of pdf
            textFilename = os.path.join(txtDir, pdf + ".txt")
            # the with-block closes (and flushes) the text file
            with open(textFilename, "w", encoding="utf-8") as textFile:
                textFile.write(text)

pdfDir = "../../data/raw/"
txtDir = "../../data/interim/"
convertMultiple(pdfDir, txtDir)
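The same batch conversion can be sketched more compactly with pathlib and the extract_text function from pdfminer.high_level, assuming a pdfminer.six release recent enough to include that high-level API. The helper names pdf_files, txt_path_for, default_extractor and convert_multiple are hypothetical; the output naming (report.pdf becomes report.pdf.txt) follows the question's code:

```python
from pathlib import Path
from typing import Callable, List


def pdf_files(pdf_dir: Path) -> List[Path]:
    """Sorted list of *.pdf files (extension matched case-insensitively) in pdf_dir."""
    return sorted(p for p in pdf_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


def txt_path_for(pdf: Path, txt_dir: Path) -> Path:
    """Map raw/report.pdf to <txt_dir>/report.pdf.txt, as the question's code does."""
    return txt_dir / (pdf.name + ".txt")


def default_extractor(pdf: Path) -> str:
    # Imported lazily so the helpers above work even without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text(str(pdf))


def convert_multiple(pdf_dir: Path, txt_dir: Path,
                     extract: Callable[[Path], str] = default_extractor) -> List[Path]:
    """Convert every PDF in pdf_dir and write one UTF-8 .txt file per PDF into txt_dir."""
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in pdf_files(pdf_dir):
        out = txt_path_for(pdf, txt_dir)
        out.write_text(extract(pdf), encoding="utf-8")
        written.append(out)
    return written
```

With the question's directories this would be called as convert_multiple(Path("../../data/raw"), Path("../../data/interim")). The extract parameter exists only so the directory handling can be exercised without real PDFs.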

Install the pdfminer3k package for Python 3.x.

Download the pdfminer3k tar.gz, unpack it, and run: python setup.py install
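Whichever package is used, it has to be installed into the same interpreter that runs the script; python setup.py install installs into whatever python resolves to in that shell, so inside a virtualenv it should be run with the virtualenv active. The sketch below swaps in pip for the setup.py step and assumes the package is available from PyPI; pip_install_command and install are hypothetical helper names:

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" with sys.executable installs into the same environment
    # (e.g. the active virtualenv) that will later import the package.
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str) -> None:
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(package), check=True)


# Hypothetical usage (needs network access), e.g. for the package named above:
# install("pdfminer3k")
```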
