Java iText: catching a NullPointerException during PDF text extraction

Question

When extracting text from a PDF with iText 5.3.4 using this code:

try {
    reader = new PdfReader(thepdffilename);
} catch (IOException e) {
    openok=false;
}

if (openok==true){
    int numberOfPages = reader.getNumberOfPages();
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    for (int page = 1; page <= numberOfPages; page++){
        try {
            SimpleTextExtractionStrategy strategy = parser.processContent(page, new SimpleTextExtractionStrategy());
            content = content + strategy.getResultantText();
        } catch (Throwable t) { 
            crap=true;
            break;
        }
    }
    reader.close();
}

However, the Google Play "Crashes & ANRs" reports occasionally show a NullPointerException inside iText:

java.lang.NullPointerException in com.itextpdf.text.pdf.PdfReader$PageRefs.readPages
    at com.itextpdf.text.pdf.PdfReader$PageRefs.readPages(PdfReader.java:3382)
    at com.itextpdf.text.pdf.PdfReader$PageRefs.<init>(PdfReader.java:3350)
    at com.itextpdf.text.pdf.PdfReader$PageRefs.<init>(PdfReader.java:3328)
    at com.itextpdf.text.pdf.PdfReader.readPages(PdfReader.java:1003)
    at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:530)
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:170)
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:159)

The 5.3.4 source code at line 3382 is:

http://grepcode.com/file/repo1.maven.org/maven2/com.itextpdf/itextpdf/5.3.4/com/itextpdf/text/pdf/PdfReader.java?av=f

3374    void readPages() throws IOException {
3375        if (refsn != null)
3376            return;
3377        refsp = null;
3378        refsn = new ArrayList<PRIndirectReference>();
3379        pageInh = new ArrayList<PdfDictionary>();
3380        iteratePages((PRIndirectReference)reader.catalog.get(PdfName.PAGES));
3381        pageInh = null;
3382        reader.rootPages.put(PdfName.COUNT, new PdfNumber(refsn.size()));
3383    }
3384
3385    void reReadPages() throws IOException {
3386        refsn = null;
3387        readPages();
3388    }

So something goes wrong when text is extracted from certain PDF files, and the cause will probably never be tracked down because I do not have the PDFs in question.

What I need is a way to catch the NullPointerException so that my app does not crash.

I've tried

} catch (Exception e) {

and, as a last resort, to try to catch any exception at all:

} catch (Throwable t) {

Does anyone have an idea how I can get this particular iText error to be caught?

Thanks

Answer 1

If I understand you correctly, your attempts to catch that NPE were made in the loop over the document pages:

for (int page = 1; page <= numberOfPages; page++){
    try {
        SimpleTextExtractionStrategy strategy =
            parser.processContent(page, new SimpleTextExtractionStrategy());              
        content = content + strategy.getResultantText();
    } catch (Throwable t) { 
        crap=true;
        break;
    }
}

If you look closely at your exception, though...

java.lang.NullPointerException in com.itextpdf.text.pdf.PdfReader$PageRefs.readPages
    at com.itextpdf.text.pdf.PdfReader$PageRefs.readPages(PdfReader.java:3382)
    [...]
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:159)

you see that the exception already occurs during PdfReader construction (PdfReader.<init>), before the page loop is ever reached. Thus, you have to catch the NPE where you construct your PdfReader:

try {
    reader = new PdfReader(thepdffilename);
} catch (IOException e) {
    openok=false;
} catch (NullPointerException npe) { // !!
    openok=false;                    // !!
}

Or, if you want to take no chances:

try {
    reader = new PdfReader(thepdffilename);
} catch (Throwable t) {              // !!
    openok=false;
}

If a PdfReader is constructed at other places in your code, you may want to harden those, too.

@BrunoLowagie This NPE had better be transformed into a tagged exception, hadn't it?
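One way to follow the advice above about hardening every place where a PdfReader is constructed is to route all constructions through a single helper. The sketch below is a minimal illustration that does not use iText at all: `FragileReader` is a hypothetical stand-in whose constructor throws a NullPointerException, much as `PdfReader.<init>` does in the stack trace, and `SafeOpen.tryOpen` and `thrownInConstructor` are assumed helper names, not part of any library. In real code the factory passed to `tryOpen` would presumably be something like `() -> new PdfReader(thepdffilename)`.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

final class SafeOpen {
    private SafeOpen() {}

    // Hypothetical stand-in for PdfReader: the constructor itself fails on bad input.
    static final class FragileReader {
        private final String name;

        FragileReader(String name) {
            // Throws NullPointerException when name is null, so the failure
            // happens inside the constructor, before any object exists.
            this.name = name.trim();
        }

        String name() {
            return name;
        }
    }

    // Runs the factory and returns an empty Optional if construction fails with
    // any Exception, checked (e.g. IOException) or unchecked (e.g. NullPointerException).
    static <T> Optional<T> tryOpen(Callable<T> factory) {
        try {
            return Optional.ofNullable(factory.call());
        } catch (Exception e) {
            return Optional.empty();
        }
    }

    // Returns true if the throwable passed through a constructor frame;
    // constructors appear as "<init>" in stack traces.
    static boolean thrownInConstructor(Throwable t) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if ("<init>".equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Valid input: construction succeeds.
        System.out.println(tryOpen(() -> new FragileReader(" a.pdf ")).isPresent()); // true
        // Null input: the NPE from the constructor is contained by the helper.
        System.out.println(tryOpen(() -> new FragileReader(null)).isPresent());      // false
    }
}
```

The helper catches Exception rather than Throwable so that errors such as OutOfMemoryError still propagate; whether that trade-off suits an app that must never crash is a judgment call.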

Answer 2

It's ugly, but if you really want to catch it, try catching RuntimeException.
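This works because NullPointerException is a direct subclass of RuntimeException, so a `catch (RuntimeException e)` clause handles it. As Answer 1 points out, the try block still has to enclose the PdfReader constructor for this to help. The following is a small illustrative sketch; `RuntimeCatchDemo` and `classify` are made-up names, and the failing action is a stand-in for the iText call.

```java
final class RuntimeCatchDemo {
    private RuntimeCatchDemo() {}

    // Runs the action and reports whether a RuntimeException was caught.
    static String classify(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (RuntimeException e) {
            // NullPointerException lands here because it extends RuntimeException.
            return "runtime: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { }));
        System.out.println(classify(() -> {
            Object o = null;
            o.hashCode(); // deliberately dereferences null
        }));
    }
}
```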


Answer 1: 3 (accepted), 2013-02-28 11:57:40

Answer 2: 0, 2013-02-28 11:30:32
