Convert all PDF files in a directory to text

Question

I just downloaded PDFMiner to convert PDF files to text. I convert a file by running this command in my terminal:

python pdf2txt.py -o myOutput.txt simple1.pdf

It works fine. Now I want to embed that functionality in my simple Python script, so that I can convert all PDF files in a directory:

# Let's say I have a list of filenames
files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]

# And convert all PDF files to text
# by repeatedly executing pdf2txt.py
for x in range(0, len(files)):
    # And run something like:
    #   python pdf2txt.py -o output.txt files[x]
    pass

I also tried using os.system, but a blinking window (my terminal) appeared. I just want to convert all the files in my list to text.

Answer 1

Use the subprocess module.

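import os
import subprocess

files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
for f in files:
    # Name the output after the input: file1.pdf -> file1.txt
    # (splitext also handles names that contain extra dots)
    out_name = os.path.splitext(f)[0] + '.txt'
    # Passing the arguments as a list avoids shell quoting problems
    cmd = ['python', 'pdf2txt.py', '-o', out_name, f]
    run = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = run.communicate()

    # display errors if they occur
    if err:
        print(err.decode(errors='replace'))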

Read the subprocess documentation for more information.
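
The question asks about every PDF in a directory rather than a fixed list, so here is a minimal sketch of that variant. It assumes Python 3.7 or later and that pdf2txt.py can be found at the path given; the helper names build_commands and convert_all are made up for illustration and are not part of PDFMiner.

```python
import glob
import os
import subprocess
import sys


def build_commands(directory, script='pdf2txt.py'):
    """Return one pdf2txt.py argument list per PDF found in `directory`."""
    commands = []
    for pdf_path in sorted(glob.glob(os.path.join(directory, '*.pdf'))):
        # file1.pdf -> file1.txt, written next to the source file
        txt_path = os.path.splitext(pdf_path)[0] + '.txt'
        # sys.executable reuses the interpreter running this script
        commands.append([sys.executable, script, '-o', txt_path, pdf_path])
    return commands


def convert_all(directory, script='pdf2txt.py'):
    """Run every command; return (pdf_path, stderr) for each failed run."""
    failures = []
    for cmd in build_commands(directory, script):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd[-1], result.stderr))
    return failures
```

Calling convert_all('some_directory') would then return an empty list when every conversion exits with status 0. Checking the return code rather than stderr alone avoids treating harmless warnings as failures.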

Answer 2

There is an API to help you perform such tasks. Read the documentation.
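
The answer does not say which calls to use, so the following is only a sketch under an assumption: that the pdfminer.six fork is installed, which provides pdfminer.high_level.extract_text. The original PDFMiner release from the time of this question may not offer that function. The name convert_directory and its extract parameter are hypothetical, introduced here so the extractor can be swapped out.

```python
from pathlib import Path


def convert_directory(directory, extract=None):
    """Write a .txt file next to every .pdf in `directory`; return the paths written."""
    if extract is None:
        # Assumption: pdfminer.six is installed. The import is done lazily
        # so that any callable taking a path and returning text can be used.
        from pdfminer.high_level import extract_text as extract
    written = []
    for pdf_path in sorted(Path(directory).glob('*.pdf')):
        txt_path = pdf_path.with_suffix('.txt')
        txt_path.write_text(extract(str(pdf_path)), encoding='utf-8')
        written.append(txt_path)
    return written
```

Because everything runs inside the current Python process, no extra terminal window is started for each file.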


Answer 1
1 2013-05-11 09:47:03

Answer 2
0 2013-05-11 12:43:45
