How to use iText to fill out a (dynamic XFA) PDF from data in a text file

I have a local PDF form with a fixed template that never changes. I've identified it as a dynamic XFA (XML) form, since querying its fields returned no keys. I'm trying to use iText to fill in the form with data contained in a .txt file. As I understand it, I need to get the data out of the text file and into a suitably structured .xml file, so that iText can fill the original PDF from that XML.

The form has the following layout, as an example:

[Image: example form layout]

The sample code I'm using in Eclipse compiles and runs successfully, but it needs the data in data.xml to populate the empty form and output the filled-in version. The problem is that in my actual project I don't have a data.xml file to populate the form with. The raw field data is in a .txt file, with each line holding the value for a different field in the PDF.

EXAMPLE: Referencing the image above, my .txt file looks like this for the fields up to and including the one labelled "FOUR":

  • John
  • 15
  • Black
  • Honda
  • Toyota
  • Ford
  • BMW

I'm confused about 2 things:

1. How do I extract the original PDF's XML structure, so that I know the format to adhere to when populating it with data from the .txt file?

2. How do I get the values from the text file and insert them into that XML structure properly?

The following code works but requires data.xml in order to fill in "incomplete.pdf". It uses the call xfa.fillXfaForm(new FileInputStream(XML)); to input the data, but I'm stuck on how to work out the structure of that XML file and how to populate it in the first place.

Any help is appreciated, thank you very much.

Code:

package sandbox;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;


import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.XfaForm;


public class FillXFA {

    public static final String SRC = "C:/Workspace/PDF/incomplete.pdf";
    public static final String XML = "C:/Workspace/PDF/data.xml";
    public static final String DEST = "C:/Workspace/PDF/completed.pdf";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new FillXFA().manipulatePdf(SRC, DEST);
    }

    // Dumps the complete XFA XML stream of the form at src into the file dest.
    public void readXfa(String src, String dest)
        throws IOException, ParserConfigurationException, SAXException,
            TransformerFactoryConfigurationError, TransformerException {
        PdfReader reader = new PdfReader(src);
        // try-with-resources closes the output file even if the transform fails
        try (FileOutputStream os = new FileOutputStream(dest)) {
            XfaForm xfa = new XfaForm(reader);
            Document doc = xfa.getDomDocument();
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            tf.setOutputProperty(OutputKeys.INDENT, "yes");
            tf.transform(new DOMSource(doc), new StreamResult(os));
        } finally {
            reader.close();
        }
    }

    public void manipulatePdf(String src, String dest)
        throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader,
                new FileOutputStream(dest));
        AcroFields form = stamper.getAcroFields();
        XfaForm xfa = form.getXfa();
        xfa.fillXfaForm(new FileInputStream(XML));
        stamper.close();
        reader.close();
    }
}

In XFA, the link between form fields and form data is made through a concept called data binding. Fields can have an XPath-like expression that selects their value from the XML data structure. This means the XML data must be structured to suit a specific XFA form, but that structure is not necessarily unique.

A simple example: suppose you have an XFA form with just 1 text field, and this text field has a data binding to any XML element with tag name "Name". In this case your data.xml can simply be:

<Name>Hurmle</Name>

But this, and any number of other XML structures, will also work:

<StackOverflow>
    <accounts>
        <account>
            <Name>Hurmle</Name>
        </account>
    </accounts>
</StackOverflow>
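
As a rough analogy only (this is plain XPath from the JDK, not iText and not the XFA binding engine), the sketch below shows why both documents can serve the same field: a lookup by tag name finds the value at any depth. The class and method names are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class NameLookup {

    /** Returns the text of the first element named "Name", wherever it sits. */
    static String findName(String xml) {
        try {
            // "//Name" selects Name elements at any depth in the document.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//Name", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String flat = "<Name>Hurmle</Name>";
        String nested = "<StackOverflow><accounts><account>"
                + "<Name>Hurmle</Name>"
                + "</account></accounts></StackOverflow>";
        // Both lookups yield the same value despite the different nesting.
        System.out.println(findName(flat));
        System.out.println(findName(nested));
    }
}
```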

The readXfa method in your code sample will extract the complete XML stream from the XFA form. It consists of several parts. The most relevant are:

  • template: Describes the logical form structure, including all the fields and their data binding.
  • xfa:datasets: Holds information about the data. Consists of 2 parts:
    • dataDescription: A schema for the form data, optional. The data description grammar is defined in the XFA specification.
    • xfa:data: The form data.
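
Once the XML has been dumped with readXfa, the xfa:data part can be inspected with the JDK's DOM parser. The following is a minimal sketch, assuming the dump was read into a string; the class name, the method name and the sample element names (form1, Name, Age) are invented for illustration. It lists the path of every leaf element below xfa:data, which is the structure a data.xml would have to reproduce.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class XfaDataPaths {

    // Namespace of the xfa:datasets and xfa:data elements.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Returns the path of every leaf element below xfa:data, e.g. "form1/Name". */
    static List<String> listDataPaths(String xfaXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the namespace lookup below
            doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xfaXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XFA XML", e);
        }
        List<String> paths = new ArrayList<>();
        NodeList dataNodes = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        for (int i = 0; i < dataNodes.getLength(); i++) {
            collect((Element) dataNodes.item(i), "", paths);
        }
        return paths;
    }

    // Walks the element tree depth-first and records elements without child elements.
    private static void collect(Element parent, String prefix, List<String> out) {
        boolean hasChildElement = false;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                hasChildElement = true;
                String name = n.getLocalName();
                collect((Element) n, prefix.isEmpty() ? name : prefix + "/" + name, out);
            }
        }
        if (!hasChildElement && !prefix.isEmpty()) {
            out.add(prefix);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; a real dump also contains the template and other packets.
        String sample = "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
                + "<xfa:datasets xmlns:xfa=\"" + XFA_DATA_NS + "\"><xfa:data>"
                + "<form1><Name>John</Name><Age>15</Age></form1>"
                + "</xfa:data></xfa:datasets></xdp:xdp>";
        System.out.println(listDataPaths(sample));
    }
}
```

An empty result means xfa:data is missing or empty in the dump, in which case the template or the dataDescription has to be consulted instead.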

One way to determine which XML structure will work is to look at the data binding of all the fields (cf. template). That tells you where each field expects to get its data. For a non-trivial form, this can be complex and/or a lot of work.
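
For a form with many fields, that inspection can be partly automated. The sketch below is only an illustration using the JDK's DOM parser: it assumes the template marks explicit bindings with a bind child element carrying a ref attribute, as described in the XFA specification, and the class name, method name and sample template are made up. It maps each field name to its explicit binding reference, or to an empty string when the field declares none.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class XfaFieldBindings {

    /** Maps each field name to the ref of its bind child ("" if it has none). */
    static Map<String, String> fieldBindings(String xfaXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the template namespace varies by XFA version
            doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xfaXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XFA XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        // "*" matches any namespace, so this works across template versions.
        NodeList fields = doc.getElementsByTagNameNS("*", "field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String ref = "";
            for (Node n = field.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "bind".equals(n.getLocalName())) {
                    ref = ((Element) n).getAttribute("ref"); // "" when absent
                }
            }
            // Fields sharing a name in different subforms overwrite each other here.
            result.put(field.getAttribute("name"), ref);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up template fragment; field names and the ref value are assumptions.
        String sample = "<template xmlns=\"http://www.xfa.org/schema/xfa-template/3.0/\">"
                + "<subform name=\"form1\">"
                + "<field name=\"ONE\"><bind match=\"dataRef\" ref=\"$.Name\"/></field>"
                + "<field name=\"TWO\"/>"
                + "</subform></template>";
        System.out.println(fieldBindings(sample));
    }
}
```

Fields that come back with an empty reference have no explicit bind ref; how those are matched to data is governed by the binding rules in the XFA specification, so they still need checking by hand.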

If it is available in the XFA form, you can use the dataDescription. It gives you the structure of the data, plus information such as the minimum and maximum occurrence of elements.

Finally, you can look at the data that is already in the form (cf. xfa:data). Keep in mind that this XML structure is not necessarily complete: empty elements can be omitted. For example, if a form has 2 fields, the values could be specified as:

<SomeRoot>
    <Field1>Value1</Field1>
    <Field2></Field2>
</SomeRoot>

But also:

<SomeRoot>
    <Field1>Value1</Field1>
</SomeRoot>

The first form makes it easier to figure out the needed structure. If xfa:data is missing or incomplete, you can try filling out all the form fields manually in an XFA-capable PDF viewer. When saving, the viewer will populate xfa:data according to the data description and the data binding.

For reference: XFA specification
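
Once the structure is known, turning the lines of the .txt file into matching XML is mostly mechanical. Below is a minimal sketch using only the JDK, under the assumption that the form expects a flat structure with one element per field below a single root. The root name form1 and the field names are placeholders, not taken from your form, and the class and method names are made up.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
class TxtToXfaData {

    /** Builds <root><name_i>value_i</name_i>...</root> from two parallel lists. */
    static String buildDataXml(String rootName, List<String> fieldNames, List<String> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("Need exactly one value per field name");
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (int i = 0; i < fieldNames.size(); i++) {
                Element field = doc.createElement(fieldNames.get(i));
                field.setTextContent(values.get(i)); // escaped on serialization
                root.appendChild(field);
            }
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the data XML", e);
        }
    }

    public static void main(String[] args) {
        // In a real project the values would come from the text file, e.g. via
        // java.nio.file.Files.readAllLines(...). The field names are assumptions.
        List<String> names = Arrays.asList("Name", "Age", "Colour");
        List<String> values = Arrays.asList("John", "15", "Black");
        System.out.println(buildDataXml("form1", names, values));
    }
}
```

The resulting string can be written to data.xml, or wrapped in a ByteArrayInputStream and passed to xfa.fillXfaForm in place of the FileInputStream. Forms with nested or repeating elements need a correspondingly nested document rather than this flat one.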
