Convert all PDF files into text in a directory
I just downloaded PDFMiner to convert PDF files to text. I convert files by running this command in my terminal:
python pdf2txt.py -o myOutput.txt simple1.pdf
It works fine. Now I want to embed that functionality in my simple Python script. I would like to convert all PDF files in a directory:
# Let's say I have a list of filenames
files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
# And convert all PDF files to text
# by repeatedly executing pdf2txt.py
for x in range(0, len(files)):
    # And run something like
    python pdf2txt.py -o output.txt files[x]
I also tried using os.system, but a blinking window appeared (my terminal). I just want to convert all the files in my list to text.
Use the subprocess module.
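import os
import subprocess

files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
for f in files:
    # name the output after the input, e.g. file1.pdf -> file1.txt
    out_name = os.path.splitext(f)[0] + '.txt'
    # pass the arguments as a list so file names with spaces survive
    cmd = ['python', 'pdf2txt.py', '-o', out_name, f]
    run = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = run.communicate()
    # display errors if they occur
    if err:
        print(err.decode())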
Read the subprocess documentation for more information.
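Since the question asks about every PDF in a directory rather than a fixed list, here is a minimal sketch of the same idea using subprocess.run and a glob. The helper names build_command and convert_all are made up for illustration, and the sketch assumes pdf2txt.py sits in the current working directory and that Python 3.7 or later is used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, script="pdf2txt.py"):
    """Return the argument list that converts one PDF to a .txt file beside it."""
    pdf_path = Path(pdf_path)
    out_path = pdf_path.with_suffix(".txt")
    # sys.executable is the interpreter running this script, so the
    # child process uses the same Python installation.
    return [sys.executable, script, "-o", str(out_path), str(pdf_path)]


def convert_all(directory, script="pdf2txt.py"):
    """Run the script once per *.pdf in `directory`; return names that failed."""
    failed = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        result = subprocess.run(
            build_command(pdf_path, script),
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,
        )
        if result.returncode != 0:
            print(result.stderr, file=sys.stderr)
            failed.append(pdf_path.name)
    return failed

# Example call (hypothetical directory name):
# failed = convert_all("my_pdfs")
```

Checking the return code rather than stderr avoids treating warnings as failures, since a tool may write to stderr and still succeed.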
There is an API to help you perform such tasks. Read the documentation.
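The answer does not say which API it means. One possibility, if the pdfminer.six fork is installed, is its extract_text helper in pdfminer.high_level; the original PDFMiner release does not ship that helper. The following is only a sketch under that assumption. The function name convert_directory and its extract parameter are hypothetical, and extract exists so the loop can be tried with a stand-in extractor.

```python
from pathlib import Path


def convert_directory(directory, extract=None):
    """Write a .txt file next to every *.pdf in `directory`; return the paths written."""
    if extract is None:
        # Assumption: pdfminer.six is installed. Imported lazily so the
        # function can also be used with a custom extractor.
        from pdfminer.high_level import extract_text as extract

    written = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        text = extract(str(pdf_path))
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written

# Example call (hypothetical directory name):
# convert_directory("my_pdfs")
```

Calling the library in-process avoids starting a new interpreter for each file, and no terminal window is spawned.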