
PDF to text with Python 3.6 and pdfminer: No module named 'pdfminer'
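I am trying to convert multiple PDFs in a directory into multiple .txt files using pdfminer.six with Python 3.6.3.
When I run the code below, I get the following error: ModuleNotFoundError: No module named 'pdfminer'. Alternatively, when I run pdf2txt.py filename.pdf, it gives: env: python\r: No such file or directory
I have done some research on this issue. I have removed the original pdfminer with pip, and pdfminer.six is currently the only pdfminer package pip has installed. I am also running Python 3.6.3 in a virtualenv.
Below is the code I ran: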
from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
import os
import sys, getopt

# Converts a PDF and returns its text content as a string
def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    infile = open(fname, 'rb')  # file() does not exist in Python 3
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close()
    return text

# Converts all PDFs in directory pdfDir, saves all resulting txt files to txtDir
def convertMultiple(pdfDir, txtDir):
    if pdfDir == "": pdfDir = os.getcwd()  # if no pdfDir passed in
    for pdf in os.listdir(pdfDir):  # iterate through PDFs in the PDF directory
        fileExtension = pdf.split(".")[-1]
        if fileExtension == "pdf":
            pdfFilename = os.path.join(pdfDir, pdf)
            text = convert(pdfFilename)  # get string of text content of PDF
            textFilename = os.path.join(txtDir, pdf + ".txt")
            with open(textFilename, "w") as textFile:  # make text file
                textFile.write(text)  # write text to text file

pdfDir = "../../data/raw/"
txtDir = "../../data/interim/"
convertMultiple(pdfDir, txtDir)
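A note on the `env: python\r` message quoted in the question: the trailing `\r` usually means the script's first line (`#!/usr/bin/env python`) ends with a Windows-style carriage return, so `env` looks for a program literally named `python\r`. The sketch below is one way to check for that and normalise the line endings. The helper names `shebang_has_crlf` and `convert_crlf_to_lf` are made up for illustration, and the location of the installed `pdf2txt.py` is something you would have to look up yourself.

```python
def shebang_has_crlf(script_path):
    """Return True if the file starts with a shebang line that ends in CRLF."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")


def convert_crlf_to_lf(script_path):
    """Rewrite the file in place, replacing every CRLF with a plain LF."""
    with open(script_path, "rb") as handle:
        data = handle.read()
    with open(script_path, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))
```

Calling `shebang_has_crlf` on the path of the `pdf2txt.py` inside the virtualenv tells you whether this is the cause; if it returns True, `convert_crlf_to_lf` on the same path should let the shebang resolve. Running the script as `python pdf2txt.py filename.pdf` sidesteps the shebang altogether.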
Install the pdfminer3k package, which is intended for Python 3.x.
Download the pdfminer3k tar.gz, unpack it, and run python setup.py install
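Before or after installing anything, it may help to confirm which interpreter is actually running and which of the modules imported in the question it can see, since a package installed outside the active virtualenv produces the same ModuleNotFoundError. This is a minimal diagnostic sketch using only the standard library; `describe_module` is a hypothetical helper name, not part of any pdfminer package.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where the running interpreter would import `name` from."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # For a dotted name, find_spec imports the parent package first and
        # raises if that parent is missing.
        spec = None
    if spec is None:
        return "{!r} is NOT importable from {}".format(name, sys.executable)
    return "{!r} found at {}".format(name, spec.origin)


print("interpreter:", sys.executable)
# The modules imported by the script in the question.
for module_name in ("pdfminer", "pdfminer.pdfinterp", "pdfminer.converter",
                    "pdfminer.layout", "pdfminer.pdfpage"):
    print(describe_module(module_name))
```

Run it with the same `python` command used for the conversion script. If the interpreter path is not inside the virtualenv, or some of the modules are reported as not importable, the installed package and the running interpreter do not match.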