
Using PDFBox, how can I extract a PDF's tab order for each field?

I am trying to convert a PDF to HTML. The PDF I have has a tab order configured on several fields. Using PDFBox, how can I extract the tab order value set on each field so that I can then set the tab index in the HTML?

I've tried iterating over each field (PDField) via PDAcroForm.getFields(), but that gives me the fields in what looks like random order. Then I thought that maybe I could extract the tab order information from the field itself, but PDField does not hold any tab order information.

Any other ideas?

As already mentioned in a comment: beware that you are iterating over the fields in the abstract form definition of the PDF. Each of these fields may have appearances (widget annotations) on any number of pages. Thus, a field does not have a tab order value; its widget annotations do. These order values can be derived from the order in the annotation arrays of the document pages or determined by the annotations' positions on the page, depending on the page settings.

The page setting in question is the Tabs entry of the respective page object:

Key | Type | Value
Tabs | name | (Optional; PDF 1.5) A name specifying the tab order that shall be used for annotations on the page. The possible values shall be R (row order), C (column order), and S (structure order). See 12.5, "Annotations" for details.

(ISO 32000-1, Table 30 – Entries in a page object)
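To make the positional values R and C more concrete, here is a rough sketch in plain Java. It is not PDFBox code: the Box class and its fields are hypothetical stand-ins for a widget's rectangle, and the sorting rule is a simplification that assumes a left-to-right reading direction and treats annotations as being in the same row (or column) only when their top (or left) edges are exactly equal. A real viewer probably groups rows and columns more tolerantly. Remember that in PDF default user space the origin is at the bottom left, so a larger y value is higher on the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PositionalTabOrderSketch {

    // Hypothetical stand-in for a widget annotation's rectangle (not a PDFBox type).
    static final class Box {
        final String name;
        final double left; // x of the left edge
        final double top;  // y of the top edge (larger y = higher on the page)

        Box(String name, double left, double top) {
            this.name = name;
            this.left = left;
            this.top = top;
        }
    }

    // Simplified row order (R): topmost first, then left to right.
    static final Comparator<Box> ROW_ORDER =
            Comparator.comparingDouble((Box b) -> b.top).reversed()
                      .thenComparingDouble(b -> b.left);

    // Simplified column order (C): leftmost first, then top to bottom.
    static final Comparator<Box> COLUMN_ORDER =
            Comparator.comparingDouble((Box b) -> b.left)
                      .thenComparing(Comparator.comparingDouble((Box b) -> b.top).reversed());

    // Returns the annotation names sorted with the given comparator.
    static List<String> order(List<Box> boxes, Comparator<Box> comparator) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(comparator);
        List<String> names = new ArrayList<>();
        for (Box b : sorted) {
            names.add(b.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // A 2x2 grid of fields, deliberately listed out of order.
        List<Box> boxes = Arrays.asList(
                new Box("D", 300, 650),
                new Box("B", 300, 700),
                new Box("A", 50, 700),
                new Box("C", 50, 650));

        System.out.println(order(boxes, ROW_ORDER));    // [A, B, C, D]
        System.out.println(order(boxes, COLUMN_ORDER)); // [A, C, B, D]
    }
}
```

The resulting list index could then serve as the HTML tabindex for the corresponding input. Structure order (S) is not covered here, since it depends on the document's structure tree rather than on geometry.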

In ISO 32000-2 (now Table 31) the following has been added to the value description:

Beginning with PDF 2.0, additional values also include A (annotations array order) and W (widget order). Annotations array order refers to the order of the annotation enumerated in the Annots entry of the Page dictionary (see "Table 31 — Entries in a page object"). Widget order means using the same array ordering but making two passes, the first only picking the widget annotations and the second picking all other annotations.
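The two array-based values can be illustrated with a small plain-Java sketch. The Annot class below is a hypothetical stand-in, not a PDFBox type. With PDFBox, the input list would presumably come from PDPage.getAnnotations(), which returns the annotations in Annots array order, and the subtype from each annotation's Subtype entry. The sketch only handles A and W.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabOrderSketch {

    // Hypothetical stand-in for a page annotation (not a PDFBox type).
    static final class Annot {
        final String subtype; // e.g. "Widget", "Link", "Text"
        final String name;    // illustrative label

        Annot(String subtype, String name) {
            this.subtype = subtype;
            this.name = name;
        }
    }

    // Returns annotation names in tab order for a Tabs value of "A" or "W".
    // The input list is assumed to be in Annots array order.
    static List<String> tabOrder(List<Annot> annots, String tabs) {
        List<String> ordered = new ArrayList<>();
        if ("A".equals(tabs)) {
            // Annotations array order: keep the array order as is.
            for (Annot a : annots) {
                ordered.add(a.name);
            }
        } else if ("W".equals(tabs)) {
            // Widget order, first pass: widget annotations only.
            for (Annot a : annots) {
                if ("Widget".equals(a.subtype)) {
                    ordered.add(a.name);
                }
            }
            // Second pass: all other annotations.
            for (Annot a : annots) {
                if (!"Widget".equals(a.subtype)) {
                    ordered.add(a.name);
                }
            }
        } else {
            throw new IllegalArgumentException("This sketch only handles A and W: " + tabs);
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Annot> annots = Arrays.asList(
                new Annot("Link", "link1"),
                new Annot("Widget", "firstName"),
                new Annot("Text", "note1"),
                new Annot("Widget", "lastName"));

        System.out.println(tabOrder(annots, "A")); // [link1, firstName, note1, lastName]
        System.out.println(tabOrder(annots, "W")); // [firstName, lastName, link1, note1]
    }
}
```

For an HTML conversion, only the widget entries of the resulting list matter, and their position in the list can be used as the tab index.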

Interestingly, no default is defined for this optional entry, so the behavior when it is absent appears to be implementation-dependent.

For the example document accompanying your parallel PDFBox Jira issue, the page has a Tabs value of W, so the tab order on that page is the widgets-first annotation array order.
