
How to use iText to fill out a (dynamic XFA) PDF from data in a text file

I have a local PDF form whose template never changes. I've identified it as a dynamic XFA (XML) form, since no keysets were returned. I'm trying to use iText to fill in the form with data contained in a .txt file. As I understand it, I need to get the data from the text file into a suitably structured .xml file, so that iText can fill the original PDF using that XML.

The form has the following layout as an example:

[Image: example]

The sample code I'm using in Eclipse compiles/runs successfully but it requires the data in the file data.xml in order to populate the empty form with field data and output the filled-in version. The thing is, for my actual project I don't have a data.xml file to use in order to populate the form properly. The raw field data is in a .txt file with each line containing data for a different field in the PDF.

EXAMPLE: Referencing the image above, my .txt file looks like this for the fields up to and including the field labelled "FOUR":

  • John
  • 15
  • Black
  • Honda
  • Toyota
  • Ford
  • BMW

I'm confused about 2 things:

1. How do I extract the original PDF's xml structure so that I know the format to adhere to when populating it with data from the .txt file?

2. How do I get the values from the text file and insert them into the .xml structure properly?

The following code works but requires data.xml in order to fill in "incomplete.pdf". It uses the code xfa.fillXfaForm(new FileInputStream(XML)); to input the data, but I'm stuck on how to identify the structure for "XML" and how to fill it in in the first place.

Any help is appreciated, thank you very much.

Code:

package sandbox;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.XfaForm;

public class FillXFA {

    public static final String SRC = "C:/Workspace/PDF/incomplete.pdf";
    public static final String XML = "C:/Workspace/PDF/data.xml";
    public static final String DEST = "C:/Workspace/PDF/completed.pdf";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new FillXFA().manipulatePdf(SRC, DEST);
    }

    // Dumps the complete XFA XML of the PDF at src to the file at dest.
    public void readXfa(String src, String dest)
            throws IOException, ParserConfigurationException, SAXException,
                TransformerFactoryConfigurationError, TransformerException {
        PdfReader reader = new PdfReader(src);
        // try-with-resources closes the output file even if the transform fails
        try (FileOutputStream os = new FileOutputStream(dest)) {
            XfaForm xfa = new XfaForm(reader);
            Document doc = xfa.getDomDocument();
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            tf.setOutputProperty(OutputKeys.INDENT, "yes");
            tf.transform(new DOMSource(doc), new StreamResult(os));
        } finally {
            reader.close();
        }
    }

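    // Fills the XFA form at src with the data in the XML file and writes the result to dest.
    public void manipulatePdf(String src, String dest)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader,
                new FileOutputStream(dest));
        // try-with-resources closes the XML data stream once the form is filled
        try (FileInputStream data = new FileInputStream(XML)) {
            AcroFields form = stamper.getAcroFields();
            XfaForm xfa = form.getXfa();
            xfa.fillXfaForm(data);
        }
        stamper.close();
        reader.close();
    }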
}

In XFA, the link between form fields and form data is made using a concept called data binding. Fields can have an XPath-like expression to select their value from the XML data structure. This implies that the XML data needs to be suitably structured to work for a specific XFA form, but this structure is not necessarily unique.

A simple example: suppose you have an XFA form with just one text field. This text field has a data binding to any XML element with tag name "Name". In this case your data.xml can simply be:

<Name>Hurmle</Name>

But the following, like an infinite number of other XML structures, will also work:

<StackOverflow>
    <accounts>
        <account>
            <Name>Hurmle</Name>
        </account>
    </accounts>
</StackOverflow>

The readXfa method in your code sample will work to extract the complete XML stream from the XFA form. It consists of different parts. The most relevant are:

  • template: Describes the logical form structure, including all the fields and their data binding.
  • xfa:datasets: Holds information about the data. Consists of 2 parts.
    • dataDescription: A schema for the form data, optional. The data description grammar is defined in the XFA specification.
    • xfa:data: The form data.
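
To make the xfa:data part concrete, here is a minimal sketch that finds the data root inside the XML dumped by readXfa, using only the JDK's DOM parser. The class name XfaDataLocator and the sample string are made up for illustration; the namespace URI is the one the XFA specification uses for xfa:datasets and xfa:data. Your real dump will be much larger and may have no xfa:data at all, in which case the method returns null.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of iText.
class XfaDataLocator {

    // Namespace of xfa:datasets and xfa:data.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Returns the first element child of xfa:data, or null if there is none.
    static Element findDataRoot(String xfaXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xfaXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XFA XML", e);
        }
        NodeList dataNodes = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        if (dataNodes.getLength() == 0) {
            return null;
        }
        // Skip whitespace text nodes and return the first real element.
        for (Node n = dataNodes.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Invented sample; a real dump also contains template, config, etc.
        String sample =
              "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
            + "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
            + "<xfa:data><SomeRoot><Field1>Value1</Field1></SomeRoot></xfa:data>"
            + "</xfa:datasets></xdp:xdp>";
        Element root = findDataRoot(sample);
        System.out.println(root == null ? "no xfa:data content" : root.getTagName());
    }
}
```

The element returned here is the structure that your data.xml has to reproduce.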

One way to determine which XML structure will work is to look at the data binding of all the fields (cf. template). That tells you where each field expects to get its data. For a non-trivial form, this can be complex and/or a lot of work.

If available in the XFA form, you can use the dataDescription. It will give you the structure for the data and information like minimum and maximum occurrence for elements.

Finally, you can look at the data that's already in the form (cf. xfa:data). Keep in mind that this XML structure is not necessarily complete: empty elements can be omitted. For example, if a form has 2 fields, the values could be specified as:
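
As a rough sketch of that inspection, the following lists every field element in a template together with the ref attribute of its bind child, again with the JDK's DOM parser only. The class name XfaBindingLister and the sample template are invented for illustration. The element and attribute names (field, bind, ref) are the ones from the XFA template grammar, but a real template is far more nested, and a field without an explicit ref is normally matched by its name, so treat an empty ref as "look at the field name and its position in the subform hierarchy".

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of iText.
class XfaBindingLister {

    // Maps each field name to its bind "ref" ("" when the field has no explicit ref).
    static Map<String, String> listBindings(String templateXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(templateXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the template XML", e);
        }
        Map<String, String> bindings = new LinkedHashMap<>();
        // "*" matches any namespace, so the template version does not matter.
        NodeList fields = doc.getElementsByTagNameNS("*", "field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String ref = "";
            for (Node n = field.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "bind".equals(n.getLocalName())) {
                    ref = ((Element) n).getAttribute("ref"); // "" if absent
                }
            }
            bindings.put(field.getAttribute("name"), ref);
        }
        return bindings;
    }

    public static void main(String[] args) {
        // Invented sample template with one explicit and one implicit binding.
        String sample =
              "<template xmlns=\"http://www.xfa.org/schema/xfa-template/2.8/\">"
            + "<subform name=\"form1\">"
            + "<field name=\"Name\"><bind match=\"dataRef\" ref=\"$.Name\"/></field>"
            + "<field name=\"Age\"/>"
            + "</subform></template>";
        listBindings(sample).forEach((name, ref) ->
            System.out.println(name + " -> " + (ref.isEmpty() ? "(bound by name)" : ref)));
    }
}
```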

<SomeRoot>
    <Field1>Value1</Field1>
    <Field2></Field2>
</SomeRoot>

But also:

<SomeRoot>
    <Field1>Value1</Field1>
</SomeRoot>

The first case makes it easier to figure out the needed structure. If xfa:data is missing or incomplete, you can try to fill out all the form fields manually with an XFA-capable PDF viewer. When saving, the viewer will populate xfa:data according to the data description and the data binding.
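
Once you know the structure, turning the lines of the text file into that XML is mostly mechanical. Below is a minimal sketch, assuming a flat structure like the SomeRoot example above and one value per line in the same order as the fields. The class name DataXmlBuilder and the element names (SomeRoot, Field1, ...) are placeholders: replace them with the names your form actually binds to. A nested structure would need a correspondingly nested element hierarchy.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of iText.
class DataXmlBuilder {

    // Builds <rootName><name1>value1</name1>...</rootName>; names and values are paired by index.
    static String build(String rootName, List<String> fieldNames, List<String> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("Expected " + fieldNames.size()
                    + " values but got " + values.size());
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (int i = 0; i < fieldNames.size(); i++) {
                Element field = doc.createElement(fieldNames.get(i));
                // Empty values still produce an (empty) element; the DOM escapes special characters.
                field.setTextContent(values.get(i));
                root.appendChild(field);
            }
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the data XML", e);
        }
    }

    public static void main(String[] args) {
        // In a real run the values would come from the text file, e.g. via
        // java.nio.file.Files.readAllLines(...); here they are hard-coded.
        List<String> values = Arrays.asList("John", "15", "Black");
        // Placeholder element names; use the ones your form expects.
        List<String> names = Arrays.asList("Field1", "Field2", "Field3");
        System.out.println(build("SomeRoot", names, values));
    }
}
```

The resulting string can be written to data.xml, or wrapped in a ByteArrayInputStream and passed straight to xfa.fillXfaForm(...) in place of the FileInputStream used in manipulatePdf.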

For reference: XFA specification
