
Is there a way to see if a PDF is tagged for screen readers with JavaScript?

Original post: 2021-10-18 15:16:41 · Tags: javascript, node.js, pdf, accessibility

I am interested in making a simple checker that receives a PDF as input and looks to see if that PDF is tagged for screen readers. This information isn't available in the metadata. Does anyone know, or can anyone point me in the right direction, whether this is possible with JavaScript, possibly with PDF.js?

Thank you!

1 answer

Some PDFs contain tagged objects that could be screen read even without the tags, some PDFs have tags and still cannot be screen read, and some are correctly Tagged PDF files that fully conform to all PDF/UA or PDF/A-2 requirements.

Thus, for screen reading, there is little point in looking for simplistic tags or tagging. The meaningful test is whether the file passes muster with a conformance checker.
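If a first-pass signal is still wanted before handing the file to a conformance checker, PDF.js can report what the document declares about itself. The sketch below is only an illustration: `describeTagging` is a hypothetical helper name, and it assumes `pdfDocument` is an already loaded PDF.js document (for example from `pdfjsLib.getDocument(...).promise`) in a PDF.js version recent enough to provide `getMarkInfo()` and `getStructTree()`. A positive result means the file claims to be tagged, not that it is usable with a screen reader.

```javascript
// Sketch: summarise what PDF.js reports about tagging for a loaded document.
// Assumption: `pdfDocument` behaves like a PDF.js PDFDocumentProxy.
async function describeTagging(pdfDocument) {
  // getMarkInfo() resolves to null when the catalog has no MarkInfo values.
  const markInfo = await pdfDocument.getMarkInfo();
  const declaredTagged = Boolean(markInfo && markInfo.Marked);

  // Count the pages whose structure tree has at least one child node.
  let pagesWithStructure = 0;
  for (let n = 1; n <= pdfDocument.numPages; n++) {
    const page = await pdfDocument.getPage(n);
    const tree = await page.getStructTree();
    if (tree && Array.isArray(tree.children) && tree.children.length > 0) {
      pagesWithStructure++;
    }
  }

  return {
    declaredTagged,
    pagesWithStructure,
    pageCount: pdfDocument.numPages,
  };
}
```

A file with `declaredTagged: true` and structure on every page is a candidate for a full PDF/UA check; anything else is probably untagged or only partly tagged.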

From iText:

If you have a document that has a picture of a fox and a dog, iText can't add any missing alt text for those images, because iText can't see that fox nor that dog.

PDF objects can be encrypted or encoded, so the structure is not always easy to detect with a simple scan; however, some data must not be encoded. If you are lucky, the unencoded metadata may include the string pdfua or PDF/UA, which shows only an attempt at conformance, not proof of it. Also beware of any tagged file that contains an article about PDF/UA production but is not itself a PDF/UA file :-)
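The string check described above could be sketched in Node.js roughly as follows. This is a heuristic under stated assumptions, not a conformance test: `scanPdfHints` and `scanPdfFile` are hypothetical helper names, and the patterns assume the hints are stored unencoded (the `pdfuaid:part` XMP property, the `/Marked true` flag and the `/StructTreeRoot` key). A catalog held in a compressed object stream, or an encrypted file, will show nothing here even if the document is tagged.

```javascript
// Heuristic only: scan the raw bytes of a PDF for unencoded tagging hints.
const fs = require("node:fs");

function scanPdfHints(bytes) {
  // latin1 maps every byte to one character, so binary content survives.
  const text = Buffer.from(bytes).toString("latin1");
  return {
    // Any mention of PDF/UA; an article about PDF/UA would match as well.
    mentionsPdfUa: /pdfua|PDF\/UA/i.test(text),
    // XMP property that PDF/UA files use to identify themselves (a claim, not proof).
    hasPdfUaIdentifier: /pdfuaid:part/.test(text),
    // Catalog entries used by Tagged PDF, visible only when stored unencoded.
    hasMarkedFlag: /\/Marked\s+true\b/.test(text),
    hasStructTreeRoot: /\/StructTreeRoot\b/.test(text),
  };
}

// Convenience wrapper for a file on disk (the path is supplied by the caller).
function scanPdfFile(path) {
  return scanPdfHints(fs.readFileSync(path));
}
```

Treat every `true` as a reason to run a real conformance checker, and every `false` as "not detected" rather than "not tagged".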