Error when extracting text from a PDF file (Java + PDFBox)

Question

I want to extract text from a PDF file using PDFBox. First, I added the following dependency:

<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.4</version>
    </dependency>
</dependencies>

Here is my code to extract text from the PDF:

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class Main {

    public static void main(String[] args) {
        PDFTextStripper pdfStripper = null;
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;
        File file = new File("C:/Users/Ann/Desktop/example.pdf");
        try {
            PDFParser parser = new PDFParser(new FileInputStream(file)); // the compiler reports the error on this line
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(5);
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

}

But I get this compile error: Error:(22, 46) java: incompatible types: java.io.FileInputStream cannot be converted to org.apache.pdfbox.io.RandomAccessRead.

Please help me solve this problem.

Answer 1

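As the error message says, the PDFParser constructor in PDFBox 2.0.x expects a RandomAccessRead, not an InputStream. The simplest fix is to skip PDFParser and let PDDocument.load do the parsing: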

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        File file = new File("D:/example.pdf");
        // try-with-resources closes the document even if extraction throws
        try (PDDocument document = PDDocument.load(file)) {
            PDFTextStripper pdfTextStripper = new PDFTextStripper();
            pdfTextStripper.setStartPage(1); // page numbers are 1-based
            pdfTextStripper.setEndPage(5);
            String text = pdfTextStripper.getText(document);
            System.out.println(text);
        }
    }
}

Answer 2

Try using the following:

PDFParser parser = new PDFParser(new org.apache.pdfbox.io.RandomAccessFile(file, "r"));
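
For context, here is a rough sketch of how that one-line change might fit into the rest of the question's code. It assumes PDFBox 2.0.x, as in the question's pom (the loading API differs in 3.x). The class name ParserTextExtractor and the helper extractText are made up for this illustration. It also uses parser.getPDDocument() rather than new PDDocument(cosDoc) as in the question.

```java
import org.apache.pdfbox.io.RandomAccessFile;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

// Hypothetical class name, used only for this sketch.
class ParserTextExtractor {

    // Hypothetical helper: returns the text of pages startPage..endPage (1-based, inclusive).
    static String extractText(File file, int startPage, int endPage) throws IOException {
        // org.apache.pdfbox.io.RandomAccessFile implements RandomAccessRead,
        // which is the type the PDFParser constructor asks for in 2.0.x.
        try (RandomAccessFile source = new RandomAccessFile(file, "r")) {
            PDFParser parser = new PDFParser(source);
            parser.parse();
            // Closing the PDDocument also releases the parsed COSDocument.
            try (PDDocument pdDoc = parser.getPDDocument()) {
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setStartPage(startPage);
                stripper.setEndPage(endPage);
                return stripper.getText(pdDoc);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; adjust it to your own file.
        File file = new File("C:/Users/Ann/Desktop/example.pdf");
        System.out.println(extractText(file, 1, 5));
    }
}
```

If you do not need the parser object itself, PDDocument.load(file) from Answer 1 does the same work in one call.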
