
Error while reading text out of a PDF using the Perl module PDF::API2

This is the code I am using to read the text of a PDF with Perl:

#!/usr/bin/perl

use PDF::API2;

    $pdf = PDF::API2->new;
    $pdf = PDF::API2->open('01443325.pdf');
    $page = $pdf->page;
    $pagenum=10;
    $pdf->stringify;

    $page = $pdf->openpage($pagenum);

    print $page;

I don't get any output when I run this code. How can I fix this?

When you run $pdf->stringify above, it returns the content of the file as a string, but then you don't do anything with it. If you were to print it, though, it would not give you the text representation you are after as it is simply the original PDF bytes in a string.

Likewise, setting $pagenum to 10 does not extract anything by itself: the variable is only passed to openpage, which returns a page object, and printing that object does not print the text on the page.

I think the easiest option is to not try to do this with PDF::API2, but to look at whether you can run something like pdftotext from xpdf or poppler first and then read in the output.
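As a rough sketch of that approach, assuming pdftotext (from xpdf or poppler) is installed and on your PATH, something like the following could read one page's text into a Perl string. The helper names pdftotext_command and page_text are made up for this example; the -f, -l and -layout options and the "-" output argument are standard pdftotext usage, but check your installed version's man page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the pdftotext argument list for one page.
# -f/-l select the first and last page; "-" sends the text to stdout.
sub pdftotext_command {
    my ($file, $page) = @_;
    return ('pdftotext', '-f', $page, '-l', $page, '-layout', $file, '-');
}

# Hypothetical helper: run pdftotext and return the text of a single page.
sub page_text {
    my ($file, $page) = @_;
    my @cmd = pdftotext_command($file, $page);

    # List-form pipe open bypasses the shell, so unusual file names are safe.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                       # slurp the whole output at once
    my $text = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status $?)";

    return defined $text ? $text : '';
}

# Usage: perl extract.pl 01443325.pdf 10
my ($file, $page) = @ARGV;
print page_text($file, defined $page ? $page : 1) if defined $file;
```

Run with the file name and page number as arguments, this prints the text of that page. Nothing here depends on PDF::API2.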

If not, then there are some suggestions on the Perl Monks page http://www.perlmonks.org/?node_id=810721, and many more on Google under "perl extract text from pdf". There's even a previous SO question, "How can I extract text from a PDF file in Perl?".

Good luck!
