Java iText: catching a NullPointerException during PDF text extraction
When extracting text from a PDF with iText 5.3.4 using this code:
try {
    reader = new PdfReader(thepdffilename);
} catch (IOException e) {
    openok=false;
}
if (openok==true){
    int numberOfPages = reader.getNumberOfPages();
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    for (int page = 1; page <= numberOfPages; page++){
        try {
            SimpleTextExtractionStrategy strategy = parser.processContent(page, new SimpleTextExtractionStrategy());
            content = content + strategy.getResultantText();
        } catch (Throwable t) {
            crap=true;
            break;
        }
    }
    reader.close();
}
However, the Google Play "Crashes & ANRs" reports occasionally show a NullPointerException inside iText:
java.lang.NullPointerException in com.itextpdf.text.pdf.PdfReader$PageRefs.readPages
  at com.itextpdf.text.pdf.PdfReader$PageRefs.readPages(PdfReader.java:3382)
  at com.itextpdf.text.pdf.PdfReader$PageRefs.<init>(PdfReader.java:3350)
  at com.itextpdf.text.pdf.PdfReader$PageRefs.<init>(PdfReader.java:3328)
  at com.itextpdf.text.pdf.PdfReader.readPages(PdfReader.java:1003)
  at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:530)
  at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:170)
  at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:159)
The 5.3.4 source code around line 3382 is:
http://grepcode.com/file/repo1.maven.org/maven2/com.itextpdf/itextpdf/5.3.4/com/itextpdf/text/pdf/PdfReader.java?av=f
3374 void readPages() throws IOException {
3375 if (refsn != null)
3376 return;
3377 refsp = null;
3378 refsn = new ArrayList<PRIndirectReference>();
3379 pageInh = new ArrayList<PdfDictionary>();
3380 iteratePages((PRIndirectReference)reader.catalog.get(PdfName.PAGES));
3381 pageInh = null;
3382 reader.rootPages.put(PdfName.COUNT, new PdfNumber(refsn.size()));
3383 }
3384
3385 void reReadPages() throws IOException {
3386 refsn = null;
3387 readPages();
3388 }
So something goes wrong when text is extracted from certain PDF files, and the reason will probably never be sorted out, as I do not have the PDFs in question.
What I need is a way to catch the NPE so that my app does not crash.
I've tried
} catch (Exception e) {
and, as a last resort to catch any exception at all,
} catch (Throwable t) {
Does anyone have an idea how I can get this particular iText error caught?
Thanks
If I understand you correctly, you have made your attempts to catch that NPE inside your loop through the document pages:
for (int page = 1; page <= numberOfPages; page++){
    try {
        SimpleTextExtractionStrategy strategy =
            parser.processContent(page, new SimpleTextExtractionStrategy());
        content = content + strategy.getResultantText();
    } catch (Throwable t) {
        crap=true;
        break;
    }
}
If you look closely at your exception, though...
java.lang.NullPointerException in com.itextpdf.text.pdf.PdfReader$PageRefs.readPages at
com.itextpdf.text.pdf.PdfReader$PageRefs.readPages(PdfReader.java:3382) at
[...]
com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:159)
you see that the exception already occurs during the PdfReader construction (PdfReader.<init>). Thus, you have to catch the NPE where you construct your PdfReader:
try {
    reader = new PdfReader(thepdffilename);
} catch (IOException e) {
    openok=false;
} catch (NullPointerException npe) { // !!
    openok=false;                    // !!
}
Or, if you want to take no chances:
try {
    reader = new PdfReader(thepdffilename);
} catch (Throwable t) { // !!
    openok=false;
}
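The placement issue can be reproduced without iText. In the sketch below, `FakePdfReader` is a hypothetical stand-in (not an iText class) whose constructor throws a `NullPointerException`, roughly as `PdfReader.<init>` does in the trace above. A try/catch that wraps only the page loop never gets a chance to handle it, while one around the construction does.

```java
import java.io.IOException;

public class ConstructorNpeDemo {

    // Hypothetical stand-in for PdfReader: construction fails with an NPE,
    // like PdfReader$PageRefs.readPages in the reported stack trace.
    static class FakePdfReader {
        FakePdfReader(String fileName) throws IOException {
            throw new NullPointerException("simulated failure reading pages of " + fileName);
        }

        int getNumberOfPages() {
            return 1;
        }
    }

    // The question's placement: only the page loop is guarded, so the
    // NPE thrown by the constructor propagates to the caller.
    public static void openWithLoopCatchOnly(String fileName) throws IOException {
        FakePdfReader reader = new FakePdfReader(fileName);
        try {
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                // text extraction would happen here
            }
        } catch (Throwable t) {
            // never reached for a failure inside the constructor
        }
    }

    // The answer's placement: the construction itself is guarded.
    public static boolean tryOpen(String fileName) {
        boolean openok = true;
        try {
            new FakePdfReader(fileName);
        } catch (IOException e) {
            openok = false;
        } catch (NullPointerException npe) { // the extra clause from the answer
            openok = false;
        }
        return openok;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tryOpen("broken.pdf")); // prints false
        try {
            openWithLoopCatchOnly("broken.pdf");
        } catch (NullPointerException npe) {
            System.out.println("NPE escaped the loop's try/catch");
        }
    }
}
```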
If there are other places in your code where a PdfReader is constructed, you may want to harden them, too.
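One way to harden every construction site consistently is to route them all through a single helper. The sketch below is hypothetical (`SafeOpen`, `openOrNull` and `IoSupplier` are made-up names, not part of iText); it returns `null` instead of propagating an `IOException` or any `RuntimeException`, which covers the `NullPointerException` from the trace. With iText on the classpath, a call site could look like `PdfReader reader = SafeOpen.openOrNull(() -> new PdfReader(thepdffilename));`. The lambdas assume Java 8 language support; an anonymous class implementing `IoSupplier` works the same way on older toolchains.

```java
import java.io.IOException;

public final class SafeOpen {

    private SafeOpen() {
    }

    // A supplier that may throw IOException, as the PdfReader constructors do.
    @FunctionalInterface
    public interface IoSupplier<T> {
        T get() throws IOException;
    }

    // Runs the opener; returns null if it fails with an IOException or any
    // unchecked exception (NullPointerException included). Errors such as
    // OutOfMemoryError are deliberately not caught.
    public static <T> T openOrNull(IoSupplier<T> opener) {
        try {
            return opener.get();
        } catch (IOException | RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String ok = openOrNull(() -> "opened");
        String broken = openOrNull(() -> {
            throw new NullPointerException("simulated iText failure");
        });
        System.out.println(ok + " / " + broken); // prints "opened / null"
    }
}
```

Returning `null` matches the question's `openok` style; logging the caught exception before returning may help identify which PDFs trigger the failure.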
@BrunoLowagie This NPE had better be transformed into a tagged exception, hadn't it?
It's ugly, but if you really want to catch it, try catching RuntimeException.
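`NullPointerException` is a subclass of `RuntimeException`, so a `catch (RuntimeException e)` clause handles it along with other unchecked exceptions. Unlike `catch (Throwable t)`, it does not swallow `Error`s such as `OutOfMemoryError`. The small sketch below (plain Java, no iText; `classify` is a made-up name) shows which failures such a clause does and does not intercept.

```java
public class RuntimeCatchDemo {

    // Runs the action and reports whether a RuntimeException handler caught anything.
    public static String classify(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            // NullPointerException, IllegalStateException, ... all land here
            return "RuntimeException handler: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new NullPointerException(); }));
        // prints "RuntimeException handler: NullPointerException"

        try {
            classify(() -> { throw new AssertionError("an Error, not an Exception"); });
        } catch (AssertionError e) {
            // Errors pass straight through a catch (RuntimeException) clause
            System.out.println("Error was not caught by catch (RuntimeException)");
        }
    }
}
```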