How to get the content of PDF form text fields using PDFBox?
I'm using this to get the text of a PDF file using org.apache.pdfbox:
    File f = new File(fileName);
    if (!f.isFile()) {
        System.out.println("File " + fileName + " does not exist.");
        return null;
    }
    try {
        parser = new PDFParser(new FileInputStream(f));
    } catch (Exception e) {
        System.out.println("Unable to open PDF Parser.");
        return null;
    }
    try {
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdDoc = new PDDocument(cosDoc);
        parsedText = pdfStripper.getText(pdDoc);
    } catch (Exception e) {
        e.printStackTrace();
    }
It works great for the PDFs I've used it on so far. Now I have a PDF form that has editable text fields in it. My code does not return the text inside the fields. I would like to get that text. Is there a way to get it using PDFBox?
This is how you get the key/value pairs for AcroForms. (This particular program prints them to the console.)
    package pdf_form_filler;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
    import org.apache.pdfbox.pdmodel.interactive.form.*;
    import java.io.File;
    import java.util.*;

    public class pdf_form_filler {

        public static void listFields(PDDocument doc) throws Exception {
            PDDocumentCatalog catalog = doc.getDocumentCatalog();
            PDAcroForm form = catalog.getAcroForm();
            if (form == null) {
                // The document has no AcroForm, so there are no fields to list.
                return;
            }
            List<PDFieldTreeNode> fields = form.getFields();
            for (PDFieldTreeNode field : fields) {
                Object value = field.getValue();
                String name = field.getFullyQualifiedName();
                System.out.println(name + " = " + value);
            }
        }

        public static void main(String[] args) throws Exception {
            File file = new File("test.pdf");
            // try-with-resources closes the document once the fields are printed.
            try (PDDocument doc = PDDocument.load(file)) {
                listFields(doc);
            }
        }
    }
PDFieldTreeNode doesn't seem to be supported anymore. Try PDField instead. For those trying to use this same method nowadays:
    public static void listFields(PDDocument doc) throws Exception {
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        if (form == null) {
            // The document has no AcroForm, so there are no fields to list.
            return;
        }
        // getFields() returns the top-level (root) fields of the form.
        List<PDField> fields = form.getFields();
        for (PDField field : fields) {
            String value = field.getValueAsString();
            String name = field.getFullyQualifiedName();
            System.out.println(name + " = " + value);
        }
    }
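One caveat with the method above: getFields() only returns the root fields, so values nested under a parent field are not printed. Below is a minimal sketch, assuming PDFBox 2.x, that walks the whole field tree with getFieldTree() and collects the values into a map. The class name FormFieldDump and the method name readFields are made up for illustration, and the file name test.pdf is carried over from the earlier answer.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of PDFBox.
public class FormFieldDump {

    // Returns fully qualified field name -> value as string, in tree order.
    public static Map<String, String> readFields(PDDocument doc) {
        Map<String, String> values = new LinkedHashMap<>();
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            // No AcroForm in this document: nothing to read.
            return values;
        }
        // getFieldTree() iterates over every field, including nested ones.
        for (PDField field : form.getFieldTree()) {
            // Only terminal fields hold a value; non-terminal fields just group children.
            if (field instanceof PDTerminalField) {
                values.put(field.getFullyQualifiedName(), field.getValueAsString());
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "test.pdf");
        // PDDocument.load(File) is the PDFBox 2.x way to open a file.
        try (PDDocument doc = PDDocument.load(file)) {
            readFields(doc).forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
}
```

Returning a map instead of printing directly makes it easier to combine the field values with the text from PDFTextStripper. As far as I know, PDFBox 3.x replaces PDDocument.load(file) with Loader.loadPDF(file), so the main method would need that one change there.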