How to tell whether a PDF is tagged

Is it possible to determine programmatically whether a PDF is "tagged" (for accessibility)? I'm using PHP, and would like (if possible) to simply read a PDF file and return true if it is tagged, false if not.

I've looked at FPDF and TCPDF, but it isn't clear to me whether either can extract this information.

In the official PDF 1.7 specification, ISO 32000-1 (in the copy available for free from the Adobe website), I read on page 574:

"A Tagged PDF document shall also contain a mark information dictionary (see Table 321) with a value of true for the Marked entry."

To me that means...

  1. ...you'll have to parse the PDF structure and
  2. ...look for the document catalog (the dictionary that the trailer's Root entry points to)
  3. ...where there should be a MarkInfo entry
  4. ...specifying a mark information dictionary
  5. ...which should contain a key named Marked with a boolean value of true for tagged PDF.
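
As a rough illustration of those steps, here is a minimal PHP sketch. It does not really parse the PDF; it only scans the raw bytes for a MarkInfo entry, so it is a heuristic that assumes the catalog is stored uncompressed. In PDF 1.5+ files that keep the catalog inside a compressed object stream, or in encrypted files, it will return false even if the document is tagged. The function names `pdfLooksTagged` and `isTaggedPdf` are made up for this example and do not come from FPDF, TCPDF or any other library.

```php
<?php
/**
 * Heuristic: does the raw PDF data contain a mark information
 * dictionary with /Marked true? Assumes the catalog is uncompressed.
 */
function pdfLooksTagged(string $pdfData): bool
{
    // Case 1: inline dictionary, e.g. /MarkInfo << /Marked true >>
    if (preg_match('/\/MarkInfo\s*<<(.*?)>>/s', $pdfData, $m)) {
        return preg_match('/\/Marked\s+true\b/', $m[1]) === 1;
    }

    // Case 2: indirect reference, e.g. /MarkInfo 12 0 R
    if (preg_match('/\/MarkInfo\s+(\d+)\s+(\d+)\s+R\b/', $pdfData, $m)) {
        // Find the referenced object "12 0 obj ... endobj" and inspect it.
        $pattern = '/\b' . $m[1] . '\s+' . $m[2] . '\s+obj\b(.*?)\bendobj\b/s';
        if (preg_match($pattern, $pdfData, $obj)) {
            return preg_match('/\/Marked\s+true\b/', $obj[1]) === 1;
        }
    }

    // No MarkInfo entry found (or it is hidden in a compressed stream).
    return false;
}

/** Convenience wrapper (hypothetical name): read a file and test it. */
function isTaggedPdf(string $path): bool
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return pdfLooksTagged($data);
}
```

Calling `isTaggedPdf('/path/to/file.pdf')` would then give the true/false answer asked for. For a reliable result on arbitrary files you would need a real PDF parser that can decode object streams and follow the trailer to the catalog, rather than this byte scan.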

Perhaps you can go further with this (check all PDF_get_xx functions). You will also need this as reference.

Based on this:

Characteristics of a properly tagged PDF:

    - The PDF file includes a logical reading order for its content
    - Images are given correct alternate descriptions
    - Tables are correctly tagged to represent the table structure
    - Form-fields are authored to promote their utility to screen-readers
    - Text is represented as Unicode to clear up composition irregularities such as soft
      and hard hyphens

you might get further.
