PDF parser text contains
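I want to validate a PDF document using TestNG and PDFBox.
Is it possible to check whether the PDF contains a given piece of text, with something like the following?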
PDFParser parser = new PDFParser(stream);
parser.getDocument().contains("ABC");
Try the following code:
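// Uses the PDFBox 1.x API, where PDFParser is constructed from an InputStream.
@Test
public void readPdf() throws Exception {
    URL testUrl = new URL("http://www.axmag.com/download/pdfurl-guide.pdf");
    try (BufferedInputStream testFile = new BufferedInputStream(testUrl.openStream())) {
        PDFParser testPdf = new PDFParser(testFile);
        testPdf.parse();
        PDDocument document = testPdf.getPDDocument();
        try {
            String testText = new PDFTextStripper().getText(document);
            Assert.assertTrue(testText.contains("Open the setting.xml, you can see it is like this"));
        } finally {
            document.close(); // release the resources held by the parsed document
        }
    }
}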
Download the library: https://pdfbox.apache.org/index.html
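One thing to watch for with a plain `contains` check: the text returned by `PDFTextStripper` includes line breaks wherever a line ends on the page, so a phrase that wraps across two lines in the PDF may not match as a single string. A minimal sketch of a whitespace-tolerant comparison follows. `PdfTextCheck`, `normalize`, and `containsNormalized` are hypothetical names introduced here for illustration; they are not part of PDFBox or TestNG, and the helper uses only the Java standard library.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of PDFBox or TestNG. */
final class PdfTextCheck {
    // Matches any run of whitespace: spaces, tabs, and line breaks.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private PdfTextCheck() {
        // Static utility class, not meant to be instantiated.
    }

    /** Collapses every run of whitespace into a single space and trims both ends. */
    static String normalize(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    /** Returns true if the phrase occurs in the text once whitespace differences are ignored. */
    static boolean containsNormalized(String extractedText, String phrase) {
        return normalize(extractedText).contains(normalize(phrase));
    }
}
```

In a test, pass the string returned by `PDFTextStripper.getText(...)` as the first argument and the expected phrase as the second, then wrap the call in `Assert.assertTrue(...)`. This only smooths over whitespace; it does not handle hyphenation or text that the PDF stores in a different reading order.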