Error reading text from a PDF using the Perl API PDF::API2

Question

This is the code to read the text of a PDF using Perl:

    #!/usr/bin/perl

    use PDF::API2;

    $pdf = PDF::API2->new;
    $pdf = PDF::API2->open('01443325.pdf');
    $page = $pdf->page;
    $pagenum=10;
    $pdf->stringify;

    $page = $pdf->openpage($pagenum);

    print $page;

I don't get any output when I run this code. How do I fix the error?

Answer 1

When you run $pdf->stringify above, it returns the content of the file as a string, but then you don't do anything with it. If you were to print it, though, it would not give you the text representation you are after, as it is simply the original PDF bytes in a string.

Likewise, setting $pagenum to 10 has no consequences for the rest of the program as the variable is not linked to either the $pdf or $page object in any way.

I think the easiest option is not to try to do this with PDF::API2, but to see whether you can first run something like pdftotext from xpdf or poppler and then read in the output.
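As a rough illustration of the pdftotext route, here is a minimal sketch. It assumes the pdftotext binary from xpdf or poppler is installed and on your PATH; the helper names pdftotext_command and extract_text are made up for this example, and the -layout option is just one reasonable choice that you can drop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the pdftotext argument list for a page range (hypothetical helper).
# -f / -l select the first and last page, -layout tries to keep the
# physical layout, and the trailing '-' sends the text to stdout.
sub pdftotext_command {
    my ($file, $first, $last) = @_;
    return ('pdftotext', '-f', $first, '-l', $last, '-layout', $file, '-');
}

# Run pdftotext and return everything it prints (hypothetical helper).
sub extract_text {
    my ($file, $first, $last) = @_;
    my @cmd = pdftotext_command($file, $first, $last);

    # List-form pipe open: no shell is involved, so odd file names are safe.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                 # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "$cmd[0] exited with status $?";
    return $text;
}

# Only do work when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    my $page = shift(@ARGV) // 10;   # default to page 10, as in the question
    print extract_text($file, $page, $page);
}
```

Saved as, say, extract.pl, this could be run as "perl extract.pl 01443325.pdf 10" to print the text of page 10.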

If not, then there are some suggestions on the Perl Monks page http://www.perlmonks.org/?node_id=810721, and many more on Google under "perl extract text from pdf". There's even a previous SO question: "How can I extract text from a PDF file in Perl?".

Good luck!

Answer 1
3 2010-11-03 21:09:33