PDFBox org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSDictionary

Using PDFBox 2.0.25, I process a document to get its signature dictionaries (example PDF):

try {
    doc = PDDocument.load(inputFile);   // logs the warning quoted further down
    doc.getSignatureDictionaries();     // throws the ClassCastException quoted further down
} catch (Exception e) {
    e.printStackTrace();
}

The document was generated from a scan. Its producer is:

Foxit PhantomPDF Printer Version 6.1.0.0923

The line doc = PDDocument.load(inputFile); logs this warning:

Object (140:0) at offset 4039608 does not end with 'endobj' but with '0'

Then the line doc.getSignatureDictionaries(); throws this error:

java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSDictionary
        at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.getFields(PDAcroForm.java:378)
        at org.apache.pdfbox.pdmodel.interactive.form.PDFieldTree$FieldIterator.<init>(PDFieldTree.java:79)
        at org.apache.pdfbox.pdmodel.interactive.form.PDFieldTree$FieldIterator.<init>(PDFieldTree.java:68)
        at org.apache.pdfbox.pdmodel.interactive.form.PDFieldTree.iterator(PDFieldTree.java:62)
        at org.apache.pdfbox.pdmodel.PDDocument.getSignatureFields(PDDocument.java:932)
        at org.apache.pdfbox.pdmodel.PDDocument.getSignatureDictionaries(PDDocument.java:952)

Why is this happening? Can a file like this be handled?

*Update: I have tried switching the Maven dependency from Apache PDFBox 2.0.25 to Apache PDFBox 2.0.26 and still get the same error.

The underlying issue is that there is an error in an object stream in the PDF.

According to the PDF specification ISO 32000 (both part 1 and part 2), section 7.5.7, "Object Streams":

An object in an object stream shall not consist solely of an object reference.

But the example document shared by @blinkbink does have such objects in an object stream, in particular 113 0 R for object 140, 141 0 R for object 157, and 179 0 R for object 191.

As these object references are forbidden in object streams, many PDF processors parse them as the only other type of object that starts with an integer: a number object. For example, object 140 is parsed as the number 113, not as a reference to object 113 (which happens to be a form field object).
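To make the misreading concrete, here is a minimal sketch in plain Java. It is a toy model only, not PDFBox's actual parser: the class and method names are made up for illustration, and the only input taken from the question is the content "113 0 R" reported for object 140.

```java
import java.util.Arrays;
import java.util.List;

public class ObjectStreamReferenceDemo {

    /** Splits the raw text of one object into whitespace-separated tokens. */
    static List<String> tokenize(String raw) {
        return Arrays.asList(raw.trim().split("\\s+"));
    }

    /** True if the text has the shape of an indirect reference: "<objNum> <genNum> R". */
    static boolean looksLikeReference(String raw) {
        return raw.trim().matches("\\d+\\s+\\d+\\s+R");
    }

    /**
     * Toy model of a parser that does not expect a bare reference here:
     * it reads the leading integer as a number object and ignores the rest.
     */
    static long parseAsNumber(String raw) {
        return Long.parseLong(tokenize(raw).get(0));
    }

    public static void main(String[] args) {
        String object140 = "113 0 R"; // content reported for object 140
        System.out.println(looksLikeReference(object140)); // prints "true": the producer's intent
        System.out.println(parseAsNumber(object140));      // prints "113": what the toy parser yields
    }
}
```

In this model the tokens "0" and "R" are simply left over, which is why the result is the number 113 rather than whatever object 113 contains.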

As a consequence, in the example document these PDF processors find number objects in an array that should hold only form field objects. If a processor's form field reading is not programmed defensively, you get something like the ClassCastException observed here.
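The difference between the two styles of reading can be sketched in plain Java. This is an illustration under stated assumptions, not PDFBox code: a Map stands in for a field dictionary, a Long stands in for the misparsed number, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DefensiveFieldReading {

    /** Non-defensive: casts every entry, like code that trusts the array contents. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> readFieldsUnchecked(List<Object> fieldsArray) {
        List<Map<String, Object>> fields = new ArrayList<>();
        for (Object entry : fieldsArray) {
            fields.add((Map<String, Object>) entry); // ClassCastException on a number
        }
        return fields;
    }

    /** Defensive: keeps dictionary-like entries and skips anything else. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> readFieldsDefensively(List<Object> fieldsArray) {
        List<Map<String, Object>> fields = new ArrayList<>();
        for (Object entry : fieldsArray) {
            if (entry instanceof Map) {
                fields.add((Map<String, Object>) entry);
            }
        }
        return fields;
    }

    /** Builds a sample array: one stand-in field dictionary plus the misparsed number 113. */
    static List<Object> sampleFieldsArray() {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("FT", "Sig"); // stand-in for a signature field dictionary
        List<Object> array = new ArrayList<>();
        array.add(field);
        array.add(113L);
        return array;
    }

    public static void main(String[] args) {
        List<Object> array = sampleFieldsArray();
        System.out.println(readFieldsDefensively(array).size()); // prints "1"
        try {
            readFieldsUnchecked(array);
        } catch (ClassCastException e) {
            System.out.println("unchecked read failed: " + e.getMessage());
        }
    }
}
```

Note that skipping the bad entry only avoids the exception. The form field that the producer meant to reference is still missing from the result, so a defensive reader does not repair the file.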

Thus, while PDFBox used to lack defensive programming here, the main issue lies with the PDF producer that created the PDF at hand. An issue should be filed with them.
