Convert all PDF files into text in a directory

I just downloaded PDFMiner to convert PDF files to text. I convert a file by running this command in my terminal:

python pdf2txt.py -o myOutput.txt simple1.pdf

It works fine. Now I want to embed that step in a simple Python script so that I can convert all the PDF files in a directory:

# Let's say I have a list with the filenames in it
files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]

# And convert all PDF files to text
# by repeatedly executing pdf2txt.py
for x in range(0, len(files)):
    # And run something like:
    #   python pdf2txt.py -o output.txt files[x]
    pass

I also tried using os.system, but a blinking window (my terminal) appeared. I just want to convert all the files in my list to text.

Use the subprocess module.

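import os
import subprocess

files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
for f in files:
    # Name the output after the input, e.g. file1.pdf -> file1.txt.
    # splitext only strips the last extension, so dots in the name are kept.
    out_name = os.path.splitext(f)[0] + '.txt'
    # Passing a list avoids shell quoting problems with unusual filenames
    cmd = ['python', 'pdf2txt.py', '-o', out_name, f]
    run = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           universal_newlines=True)
    out, err = run.communicate()

    # display errors if they occur
    if err:
        print(err)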

Read the subprocess documentation for more information.
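Since the question asks about every PDF in a directory rather than a fixed list, here is a minimal sketch of how the file list could be built automatically. It assumes Python 3.5 or later and that pdf2txt.py sits in the current working directory, as in the question; the function names find_pdfs, output_path_for and convert_directory are made up for this illustration and are not part of PDFMiner.

```python
import subprocess
import sys
from pathlib import Path


def output_path_for(pdf_path):
    """Return the .txt path next to the given PDF (file1.pdf -> file1.txt)."""
    return Path(pdf_path).with_suffix('.txt')


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == '.pdf')


def convert_directory(directory, script='pdf2txt.py'):
    """Run pdf2txt.py once per PDF and return the PDFs that failed.

    `script` is assumed to be the path to PDFMiner's pdf2txt.py.
    """
    failed = []
    for pdf in find_pdfs(directory):
        # Same command line as in the question, built as a list
        cmd = [sys.executable, script, '-o', str(output_path_for(pdf)), str(pdf)]
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        if result.returncode != 0:
            print(result.stderr)
            failed.append(pdf)
    return failed
```

Calling convert_directory('.') would then convert every PDF in the current directory and write each .txt file next to its source. Checking the return code instead of stderr is a deliberate choice here, because some tools print warnings to stderr even when they succeed.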

There is an API to help you perform such tasks. Read the documentation.
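The answer above does not say which API it means. As one possible reading, the sketch below assumes the pdfminer.six fork is installed and uses its extract_text function from pdfminer.high_level, so no separate process or terminal window is started. The helper names text_path_for and convert_with_api are hypothetical and exist only for this example.

```python
from pathlib import Path


def text_path_for(pdf_path):
    """Return the .txt path next to the given PDF (file1.pdf -> file1.txt)."""
    return Path(pdf_path).with_suffix('.txt')


def convert_with_api(directory):
    """Extract text from every *.pdf in `directory` and write .txt files.

    Assumes pdfminer.six is installed (pip install pdfminer.six).
    """
    # Imported here so that this module still loads without pdfminer.six
    from pdfminer.high_level import extract_text

    written = []
    for pdf in sorted(Path(directory).glob('*.pdf')):
        text = extract_text(str(pdf))
        target = text_path_for(pdf)
        target.write_text(text, encoding='utf-8')
        written.append(target)
    return written
```

Running convert_with_api('.') would write one UTF-8 text file per PDF in the current directory and return the list of files it wrote.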