

Convert a PDF with forms to a PDF with text only (preserve data) using iText

  • I have multiple PDFs (a.pdf, b.pdf, c[0-9].pdf, d[0-9].pdf, ez.pdf) that get populated with multiple records using AcroForms and PDFBox.
  • The resulting files (aflat.pdf, bflat.pdf, c[0-9]flat.pdf, d[0-9]flat.pdf, ezflat.pdf) should have their forms (the AcroForm dictionaries and whatever else Adobe uses) removed, but keep the filled-in field values as plain text on the page (setReadOnly is not what I want!).
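Since the same fill-then-flatten step has to run over every file in that list, one way to organise it is a small batch driver that derives each output name (a.pdf -> aflat.pdf, c3.pdf -> c3flat.pdf) and hands the actual PDF work to a callback. This is a minimal sketch under assumptions: `BatchFlatten`, `Flattener`, `flatName` and `flattenAll` are hypothetical names, and the callback is where the PdfStamper code with `setFormFlattening(true)` (or any other flattening code) would go.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical batch driver for the naming scheme in the question:
 * a.pdf -> aflat.pdf, c3.pdf -> c3flat.pdf, ez.pdf -> ezflat.pdf.
 * The PDF-specific work is injected through the Flattener callback.
 */
public final class BatchFlatten {

    private BatchFlatten() {
    }

    /** The flatten step for one file (hypothetical interface, e.g. wrapping PdfStamper). */
    @FunctionalInterface
    public interface Flattener {
        void flatten(Path input, Path output) throws IOException;
    }

    /** Inserts "flat" before the ".pdf" extension, keeping the directory. */
    public static Path flatName(Path input) {
        String name = input.getFileName().toString();
        if (!name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            throw new IllegalArgumentException("not a .pdf file: " + input);
        }
        String stem = name.substring(0, name.length() - ".pdf".length());
        return input.resolveSibling(stem + "flat.pdf");
    }

    /** Runs the flattener on every input, returning input -> output in input order. */
    public static Map<Path, Path> flattenAll(List<Path> inputs, Flattener flattener)
            throws IOException {
        Map<Path, Path> done = new LinkedHashMap<>();
        for (Path in : inputs) {
            Path out = flatName(in);
            flattener.flatten(in, out);
            done.put(in, out);
        }
        return done;
    }
}
```

In real use the callback would be something like `(in, out) -> flattenWithStamper(in, out)`, where `flattenWithStamper` is your own (hypothetical) method containing the PdfStamper code. Keeping the iText call behind an interface also lets the naming and iteration be checked without any PDFs.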

PdfStamper can only remove fields without saving their content, but I've found some references to PdfContentByte as a way to save the content. Alas, the documentation is too brief for me to understand how I should do this.

As a last resort I could use FieldPosition to write directly on the PDF. Has anyone ever encountered such a problem? How do I solve it?
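If you do go the FieldPosition route, each value has to be drawn inside its field's rectangle by your own code. Below is a minimal sketch of just the placement arithmetic; `FieldTextPlacement`, `baseline` and `fitsWidth` are hypothetical names. Assuming a recent iText 5.x, the inputs would come from `AcroFields.getFieldPositions(name)` (each `FieldPosition` has a `page` and a `position` rectangle with `getLeft()`, `getBottom()`, `getRight()`, `getTop()`), the font metrics from `BaseFont.getFontDescriptor(BaseFont.ASCENT, size)`, `BaseFont.getFontDescriptor(BaseFont.DESCENT, size)` and `BaseFont.getWidthPoint(text, size)`, and the text would be drawn on `stamper.getOverContent(page)`, for instance with `ColumnText.showTextAligned`.

```java
/**
 * Sketch of the arithmetic behind the "write it yourself" approach: where to
 * draw a field's value inside the rectangle reported for that field.
 * Pure Java, no iText types. Coordinates are PDF user-space points with the
 * origin at the bottom-left of the page.
 */
public final class FieldTextPlacement {

    private FieldTextPlacement() {
    }

    /**
     * Baseline start point for left-aligned text, vertically centred in the
     * field rectangle.
     *
     * @param llx     left edge of the field rectangle
     * @param lly     bottom edge of the field rectangle
     * @param ury     top edge of the field rectangle
     * @param ascent  font ascent at the chosen size (positive)
     * @param descent font descent at the chosen size (negative, as font
     *                metrics usually report it)
     * @param padding horizontal inset from the left edge
     * @return {x, y} of the baseline start point
     */
    public static float[] baseline(float llx, float lly, float ury,
                                   float ascent, float descent, float padding) {
        float fieldHeight = ury - lly;
        float textHeight = ascent - descent; // descent is negative
        float x = llx + padding;
        // The glyphs span [y + descent, y + ascent]; centre that span in the field.
        float y = lly + (fieldHeight - textHeight) / 2f - descent;
        return new float[] {x, y};
    }

    /** True if a single line of the given width fits between both paddings. */
    public static boolean fitsWidth(float llx, float urx, float textWidth, float padding) {
        return textWidth <= (urx - llx) - 2f * padding;
    }
}
```

For a field from y = 700 to 720 with ascent 9 and descent -3, `baseline(100, 700, 720, 9, -3, 2)` gives x = 102 and y = 707, so the glyphs span 704 to 716 around the field's centre line at 710. This only covers a single left-aligned line; multi-line, comb, right-aligned or auto-sized fields would need more work, which is part of why flattening the existing field appearances is usually the simpler route.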

UPDATE: Saving a single page of b.pdf yields a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document solved the issue.

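    // Assumed versions: PDFBox 1.8.x (getAllPages()) for filling, iText 5.x for flattening.
    populateB();
    // Wrong approach: importing only the page into a new document corrupts the
    // fields (the AcroForm lives in the document catalog and is not copied with the page):
    //   try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
    //       doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
    //       doc.save(stream);
    //   }
    // Right approach: save the whole filled document.
    try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
        pdfDocuments.get(0).save(stream);
    }
    // Flatten with iText: the field values become page content and the fields are removed.
    try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
        PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
        stamper.setFormFlattening(true);
        stamper.close();
    }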

Use PdfStamper.setFormFlattening(true) to get rid of the fields and write their values as content.

Always save the whole document when working with AcroForms.

    populateB();
    try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
        // Importing only the page into a new PDDocument corrupts the fields:
        //   doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
        // Save the whole document instead.
        pdfDocuments.get(0).save(stream);
    }
    try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
        PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
        stamper.setFormFlattening(true);
        stamper.close();
    }
