
Use PDFBox to create page numbers marked as ARTIFACT for correct accessibility

How can I add accessible page numbers tagged as artifacts to a PDF using PDFBox?

https://www.pdfa.org/wp-content/uploads/2019/06/TaggedPDFBestPracticeGuideSyntax.pdf

Section 3.7, "Artifacts": The process of laying out and paginating content for display can lead to the introduction of additional display items (e.g. page numbers on each page or table borders). These items are not part of what ISO 32000-1 defines as “real content”; they are considered artifacts of layout (see 14.8.2.2, “Real Content and Artifacts” in ISO 32000-1). A requirement for tagged PDF is to clearly distinguish “real” content from artifacts.

See the second answer, by Imal, to question 16581471 for how to add plain page numbers to a document using PDFBox. His code is copied below with my changes for accessibility. I added the lines

contentStream.beginMarkedContent(COSName.ARTIFACT)

and

contentStream.endMarkedContent()
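To make it clearer what these two calls do, here is a rough sketch of the operators that end up in the page's content stream: `/Artifact BMC` opens the marked-content sequence and `EMC` closes it, with the text operators in between. This is only an illustration built by hand with the standard library; PDFBox writes the real stream, and the class name, the method names, and the font resource name `/F1` are assumptions of mine, not PDFBox output copied verbatim.

```java
import java.util.Locale;

// Illustrative only: builds the operator text by hand; PDFBox writes the real stream.
class ArtifactFooterSketch {

    // Escape the characters that are special inside a PDF literal string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Returns the operators for one footer, one operator per line.
    static String footerOperators(int page, int total, float x, float y) {
        String text = "Page " + page + " of " + total;
        return String.join("\n",
            "/Artifact BMC",                                   // beginMarkedContent(COSName.ARTIFACT)
            "BT",                                              // beginText()
            "/F1 10 Tf",                                       // setFont(...); "/F1" is a hypothetical resource name
            String.format(Locale.ROOT, "%.1f %.1f Td", x, y),  // newLineAtOffset(x, y)
            "(" + escape(text) + ") Tj",                       // showText(text)
            "ET",                                              // endText()
            "EMC");                                            // endMarkedContent()
    }

    public static void main(String[] args) {
        System.out.println(footerOperators(1, 3, 512f, 20f));
    }
}
```

Anything drawn between `BMC` and `EMC` belongs to the artifact, so the begin and end calls have to bracket the whole text block.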

A snippet of Imal's excellent code with my additions is:

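    import java.io.File;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.common.PDRectangle;
    import org.apache.pdfbox.pdmodel.font.PDType1Font;

    // PDFBox 2.x: load() takes a File (or stream), not a path string.
    try (PDDocument document = PDDocument.load(new File("Input.pdf"))) {
        int pageCounter = 1;
        int numberOfPages = document.getNumberOfPages();
        for (PDPage page : document.getPages()) {
            // Append to the existing page content; compress the new stream.
            try (PDPageContentStream contentStream = new PDPageContentStream(
                    document, page, PDPageContentStream.AppendMode.APPEND, true, false)) {
                // Everything between begin/endMarkedContent is marked as /Artifact.
                contentStream.beginMarkedContent(COSName.ARTIFACT);
                contentStream.beginText();
                contentStream.setFont(PDType1Font.TIMES_ITALIC, 10);
                PDRectangle pageSize = page.getMediaBox();
                float x = pageSize.getLowerLeftX();
                float y = pageSize.getLowerLeftY();
                // Start the text 100 pt from the right edge, 20 pt above the bottom.
                contentStream.newLineAtOffset(x + pageSize.getWidth() - 100, y + 20);
                contentStream.showText("Page " + pageCounter + " of " + numberOfPages);
                contentStream.endText();
                contentStream.endMarkedContent();
            }
            ++pageCounter;
        }
        document.save("Output.pdf");
    }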
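The placement in the snippet is plain arithmetic on the media box, so it can be checked without PDFBox. The sketch below reproduces it with the standard library only; the class and method names are hypothetical, and the 612-point width is the usual US Letter size, used here as an example input.

```java
// Illustrative only: reproduces the snippet's placement arithmetic without PDFBox.
class FooterOrigin {

    // Returns {x, y} for the start of the footer text, given the media box.
    static float[] of(float lowerLeftX, float lowerLeftY, float width) {
        return new float[] { lowerLeftX + width - 100f, lowerLeftY + 20f };
    }

    public static void main(String[] args) {
        float[] letter = of(0f, 0f, 612f);   // US Letter is 612 x 792 points
        System.out.println(letter[0] + ", " + letter[1]);   // prints "512.0, 20.0"
    }
}
```

Note that the text starts at this point and runs to the right; it is not right-aligned, so a longer label takes up more of the fixed 100-point inset.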

As far as we can tell, the page numbers no longer appear in the Acrobat Pro accessibility tree, which is what we would expect for content marked as an artifact.
