Passing arguments to the pdf2txt function
I'm trying to use PDFMiner to extract text from a PDF file. I wanted to use the script pdf2txt.py to run the sample from
http://www.unixuser.org/~euske/python/pdfminer/index.html
with this single line:
pdf2txt.py samples/simple1.pdf
Since I'm working on Windows with IDLE, I ran the following script within IDLE:
import pdf2txt
pdf2txt.main(['C:\Users\Desktop\Dictionary Construction\simple1.pdf'])
Each time it gave me:
usage: C:\\Usersernor\\Desktop\\Dictionary Construction\\simple1.pdf [-d] [-p pagenos] [-m maxpages] [-P password] [-o output] [-C] [-n] [-A] [-V] [-M char_margin] [-L line_margin] [-W word_margin] [-F boxes_flow] [-Y layout_mode] [-O output_dir] [-R rotation] [-t text|html|xml|tag] [-c codec] [-s scale] file ...
I know it's an error message telling me that the argument was not parsed. The first few lines of pdf2txt.py are as follows:
def main(argv):
    import getopt
    def usage():
        print ('usage: %s [-d] [-p pagenos] [-m maxpages] [-P password] [-o output]'
               ' [-C] [-n] [-A] [-V] [-M char_margin] [-L line_margin] [-W word_margin]'
               ' [-F boxes_flow] [-Y layout_mode] [-O output_dir] [-R rotation]'
               ' [-t text|html|xml|tag] [-c codec] [-s scale]'
               ' file ...' % argv[0])
        return 100
    try:
        (opts, args) = getopt.getopt(argv[1:], 'dp:m:P:o:CnAVM:L:W:F:Y:O:R:t:c:s:')
    except getopt.GetoptError:
How can I format my argument to make it work? I know it's a dumb question, but it drives me nuts.
Please help me!
Thanks,
Jason
Updates
Following Luis's advice, I changed the command to
pdf2txt.main(['simple1.html','mypdf.pdf'])
Now it produces the output in the shell window; however, I cannot find the output file 'simple1.html'. I tried the following commands:
pdf2txt.main(['-o C:\Users\Desktop\Dictionary Construction\simple1.html','mypdf.pdf'])
pdf2txt.main(['C:\Users\Desktop\Dictionary Construction\simple1.html','mypdf.pdf'])
Neither of them worked or produced a file in the folder I designated.
You should call it as:
pdf2txt.py samples/simple1.txt samples/simple1.pdf
This assumes you want, say, samples/simple1.txt to be the output.
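To see why the argument list matters, here is a minimal sketch that mimics only the getopt call from the pdf2txt.py excerpt in the question. The helper name parse_like_pdf2txt and the file names are made up for illustration; the option string is copied from the excerpt. It suggests three things: the first list element is consumed as the program name (argv[0]), an option and its value are best passed as separate list elements, and Windows paths are safer as raw strings, since in a normal string literal a sequence such as \b is read as a backspace character.

```python
import getopt

# Option string copied from the pdf2txt.py excerpt in the question.
OPTSTRING = 'dp:m:P:o:CnAVM:L:W:F:Y:O:R:t:c:s:'


def parse_like_pdf2txt(argv):
    """Hypothetical helper: split argv the way the excerpt's getopt call does."""
    # argv[0] is treated as the program name and skipped, as in the excerpt.
    return getopt.getopt(argv[1:], OPTSTRING)


# Only one element: it is taken as the program name, so no input file is left.
print(parse_like_pdf2txt([r'C:\some\folder\simple1.pdf']))

# Placeholder program name first, then '-o' and its value as separate elements.
print(parse_like_pdf2txt(['pdf2txt.py', '-o', r'C:\some\folder\simple1.html', 'mypdf.pdf']))

# '-o' fused with its value by a space: the value keeps the leading space.
print(parse_like_pdf2txt(['pdf2txt.py', '-o out.html', 'mypdf.pdf']))
```

If main handles -o the way its usage message suggests, the call from IDLE would then look like pdf2txt.main(['pdf2txt.py', '-o', r'C:\Users\Desktop\Dictionary Construction\simple1.html', 'mypdf.pdf']), where 'pdf2txt.py' is only a placeholder for the program name.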