I want to extract text from a PDF file using PDFBox. First, I added the following dependency:
<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.4</version>
    </dependency>
</dependencies>
Here is my code to extract text from the PDF:
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class Main {
    public static void main(String[] args) {
        PDFTextStripper pdfStripper = null;
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;
        File file = new File("C:/Users/Ann/Desktop/example.pdf");
        try {
            PDFParser parser = new PDFParser(new FileInputStream(file)); // the error is reported on this line
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(5);
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
But I get the following compile error: Error:(22, 46) java: incompatible types: java.io.FileInputStream cannot be converted to org.apache.pdfbox.io.RandomAccessRead.
Please help me solve this problem.
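In PDFBox 2.0.x the PDFParser constructor takes a RandomAccessRead, not an InputStream, which is why the compiler rejects the FileInputStream. The simplest fix is to skip PDFParser and load the document with PDDocument.load. Try the following code: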
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        File file = new File("D:/example.pdf");
        // try-with-resources closes the document even if text extraction throws
        try (PDDocument document = PDDocument.load(file)) {
            PDFTextStripper pdfTextStripper = new PDFTextStripper();
            // Page numbers are 1-based and inclusive
            pdfTextStripper.setStartPage(1);
            pdfTextStripper.setEndPage(5);
            String text = pdfTextStripper.getText(document);
            System.out.println(text);
        }
    }
}
Try using the following:
PDFParser parser = new PDFParser(new org.apache.pdfbox.io.RandomAccessFile(file, "r"));
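If you want to keep the PDFParser approach, the one-line change above could be fitted into the rest of the program roughly as follows. This is only a sketch, assuming PDFBox 2.0.x on the classpath. The class name ParserExample and the helper extractText are made up for illustration, and the default path is the one from the question.

```java
import org.apache.pdfbox.io.RandomAccessFile;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

// Hypothetical class name, used only for this sketch.
public class ParserExample {

    // Hypothetical helper: extracts text from pages startPage..endPage (1-based, inclusive).
    static String extractText(File file, int startPage, int endPage) throws IOException {
        // PDFBox's own RandomAccessFile implements RandomAccessRead,
        // which is the type the PDFParser constructor expects in 2.0.x.
        try (RandomAccessFile source = new RandomAccessFile(file, "r")) {
            PDFParser parser = new PDFParser(source);
            parser.parse();
            // getPDDocument() wraps the parsed COSDocument in a PDDocument.
            try (PDDocument document = parser.getPDDocument()) {
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setStartPage(startPage);
                stripper.setEndPage(endPage);
                return stripper.getText(document);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; pass a different one as the first argument.
        String path = args.length > 0 ? args[0] : "C:/Users/Ann/Desktop/example.pdf";
        System.out.println(extractText(new File(path), 1, 5));
    }
}
```

Both variants should produce the same text. PDDocument.load(file) is shorter and is usually the easier choice unless you need the parser object itself.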