Using pdfbox to get form field values

I'm using pdfbox for the first time. Right now I'm reading the material on the pdfbox website.

In summary, I have a PDF like this:

[image]

except that my file has many different components (textField, RadioButton, CheckBox). For this PDF I have to read these values: Mauro, Rossi, MyCompany. So far I have written the following code:

PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

for(PDField pdField : pdAcroForm.getFields()){
    System.out.println(pdField.getValue());
}

Is this the correct way to read the values inside the form components? Any suggestions? Where can I learn more about pdfbox?

The code you have should work. If you actually want to do something with the values, you'll likely need some other methods. For example, you can look up a specific field by name with pdAcroForm.getField(<fieldName>):

PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");

Note that PDField is just a base class. You can cast a field to its subclass to get more specific information from it. For example:

PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if(fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
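
The cast above fails with a ClassCastException if the field is not a checkbox, and getField returns null when no field has that name, so a guarded lookup can be worth the extra lines. The sketch below shows only the pattern: Field, CheckBox and TextField are hypothetical stand-ins written for this example, not the PDFBox classes, and the map stands in for the form.

```java
import java.util.HashMap;
import java.util.Map;

public class SafeFieldCast {
    // Hypothetical stand-ins for PDField and its subclasses (not PDFBox types).
    static class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    static class CheckBox extends Field {
        final boolean checked;
        CheckBox(String name, boolean checked) { super(name); this.checked = checked; }
        boolean isChecked() { return checked; }
    }

    static class TextField extends Field {
        final String value;
        TextField(String name, String value) { super(name); this.value = value; }
    }

    // Returns "checked" or "unchecked" for a checkbox, or the reason the lookup failed.
    static String describeCheckBox(Map<String, Field> form, String name) {
        Field field = form.get(name);          // null when no such field exists
        if (field == null) {
            return "missing";
        }
        if (!(field instanceof CheckBox)) {    // guard the downcast
            return "not a checkbox";
        }
        return ((CheckBox) field).isChecked() ? "checked" : "unchecked";
    }

    public static void main(String[] args) {
        Map<String, Field> form = new HashMap<>();
        form.put("fullTimeSalary", new CheckBox("fullTimeSalary", true));
        form.put("firstName", new TextField("firstName", "Mauro"));

        System.out.println(describeCheckBox(form, "fullTimeSalary"));
        System.out.println(describeCheckBox(form, "firstName"));
        System.out.println(describeCheckBox(form, "middleName"));
    }
}
```

With real PDFBox code the same two checks (null, then instanceof) would go in front of the cast to the checkbox class.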

As you suggest, you'll find more information on the Apache PDFBox website.

A field returned by getFields() can be a top-level field that only groups child fields and holds no value of its own. So you need to descend through its children until you reach a field with no children, and then you can get the value. The code snippet below walks through all the fields recursively and prints the field names and values.

{
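    //from your original code (PDFBox 1.8.x API); PDNonTerminalField below is
    //PDFBox 2.x, where the equivalent call is PDDocument.load(myFile)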
    PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();


    //get all fields in form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");

    //inspect field values
    for (PDField field : fields)
    {
            processField(field, "|--", field.getPartialName());
    }

    ...
}


private void processField(PDField field, String sLevel, String sParent) throws IOException
{
        String partialName = field.getPartialName();

        if (field instanceof PDNonTerminalField)
        {
                if (!sParent.equals(field.getPartialName()))
                {
                        if (partialName != null)
                        {
                                sParent = sParent + "." + partialName;
                        }
                }
                System.out.println(sLevel + sParent);

                for (PDField child : ((PDNonTerminalField)field).getChildren())
                {
                        processField(child, "|  " + sLevel, sParent);
                }
        }
        else
        {
                //field has no children, so output its value
                String fieldValue = field.getValueAsString();
                StringBuilder outputString = new StringBuilder(sLevel);
                outputString.append(sParent);
                if (partialName != null)
                {
                        outputString.append(".").append(partialName);
                }
                outputString.append(" = ").append(fieldValue);
                outputString.append(",  type=").append(field.getClass().getName());
                System.out.println(outputString);
        }
}
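
If you want to work with the values rather than print them, the same recursive walk can collect them into a map keyed by the full dotted name. The sketch below is library-free so it can be run as-is: Node is a hypothetical stand-in for a form field (not a PDFBox class), and the field names company, person, firstName and lastName are made up for illustration; only the values come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldTreeWalk {
    // Hypothetical stand-in for a form field: it either has children
    // (non-terminal) or carries a value (terminal).
    static class Node {
        final String partialName;
        final String value;            // null for non-terminal nodes
        final List<Node> children;     // empty for terminal nodes

        Node(String partialName, String value, Node... children) {
            this.partialName = partialName;
            this.value = value;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        boolean isTerminal() { return children.isEmpty(); }
    }

    // Depth-first walk that records "parent.child" -> value for terminal nodes.
    static void collect(Node node, String parentName, Map<String, String> out) {
        String fullName = parentName.isEmpty()
                ? node.partialName
                : parentName + "." + node.partialName;
        if (node.isTerminal()) {
            out.put(fullName, node.value);
        } else {
            for (Node child : node.children) {
                collect(child, fullName, out);
            }
        }
    }

    // Walks every top-level node and returns the values in visiting order.
    static Map<String, String> collectAll(List<Node> topLevel) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Node node : topLevel) {
            collect(node, "", out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Node> form = Arrays.asList(
                new Node("company", "MyCompany"),
                new Node("person", null,
                        new Node("firstName", "Mauro"),
                        new Node("lastName", "Rossi")));
        // prints {company=MyCompany, person.firstName=Mauro, person.lastName=Rossi}
        System.out.println(collectAll(form));
    }
}
```

Passing an empty parent name for top-level nodes keeps a top-level field from being printed as name.name, which is what the printing version above does when a top-level field has no children.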
