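PDF parser text contains
I want to validate a PDF document using TestNG and PDFBox.
Is it possible to check whether the PDF contains a given text, with something like the following?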
PDFParser parser = new PDFParser(stream);
parser.getDocument().contains("ABC");
Try the following code:
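import java.io.BufferedInputStream;
import java.net.URL;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // package name in PDFBox 1.8.x
import org.testng.Assert;
public void readPdf() throws Exception {
    URL testUrl = new URL("http://www.axmag.com/download/pdfurl-guide.pdf");
    BufferedInputStream testFile = new BufferedInputStream(testUrl.openStream());
    PDFParser testPdf = new PDFParser(testFile); // InputStream constructor, as in PDFBox 1.8.x
    testPdf.parse();
    PDDocument document = testPdf.getPDDocument();
    String testText = new PDFTextStripper().getText(document); // text of all pages
    document.close(); // release the document once the text has been extracted
    Assert.assertTrue(testText.contains("Open the setting.xml, you can see it is like this"));
}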
Download the library: https://pdfbox.apache.org/index.html
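One caveat with a plain contains() check: text extracted from a PDF normally has line breaks wherever the page layout wraps, so a phrase that spans two lines may not match a literal written with single spaces. A minimal sketch of a whitespace-tolerant check follows. The class PdfTextAssertions and both method names are hypothetical helpers, not part of PDFBox or TestNG, and the code uses only the Java standard library.

```java
// Hypothetical helper (not part of PDFBox or TestNG): compares text while
// ignoring differences in spaces, tabs and line breaks.
final class PdfTextAssertions {
    private PdfTextAssertions() {}

    /** Collapses every run of whitespace into a single space and trims the ends. */
    static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Returns true if expected occurs in extractedText once both are whitespace-normalized. */
    static boolean containsIgnoringLayout(String extractedText, String expected) {
        return normalizeWhitespace(extractedText).contains(normalizeWhitespace(expected));
    }

    public static void main(String[] args) {
        // Simulated extractor output: the phrase is broken across two lines.
        String extracted = "Open the setting.xml,\nyou can see   it is like this";
        String expected = "Open the setting.xml, you can see it is like this";
        System.out.println(extracted.contains(expected));
        System.out.println(containsIgnoringLayout(extracted, expected));
    }
}
```

In the test above, this would be used as Assert.assertTrue(PdfTextAssertions.containsIgnoringLayout(testText, "...")) in place of the direct contains() call, assuming the helper class is on the test classpath.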