

How to tell whether a particular font includes a particular character in PDF::API2

I use PDF::API2 in my Perl application to embed OCR output behind the corresponding image. This makes the resulting PDF searchable, because the OCR text can then be extracted with pdftotext.

At the moment, as soon as the application sees a non-ASCII character in the OCR output, it switches from the PDF core fonts to a TTF font. However, this is really hacky, as the core fonts include most Western European characters. A TTF font is only necessary for Greek, Russian, Japanese, etc.
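As a less blunt test than "any non-ASCII character", here is a minimal sketch that checks whether every character can be encoded in Windows-1252 (cp1252), using Perl's core Encode module. It rests on an assumption: that the core font is used with WinAnsiEncoding, which is essentially cp1252, so cp1252-encodable text should be displayable with the Latin core fonts such as Helvetica or Times (Symbol and ZapfDingbats are not covered by this reasoning). It is a heuristic about the encoding, not a query of the font itself, and `chars_outside_cp1252` and `needs_ttf` are hypothetical helper names.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns the distinct characters of $text that cannot
# be represented in cp1252. Assumption: the core font uses WinAnsiEncoding,
# which is essentially cp1252, so these approximate what a core font cannot show.
sub chars_outside_cp1252 {
    my ($text) = @_;
    my (%seen, @missing);
    for my $ch (split //, $text) {
        next if $seen{$ch}++;    # test each distinct character once
        # FB_CROAK makes encode() die on an unmappable character;
        # LEAVE_SRC stops encode() from modifying $ch in place.
        my $ok = eval {
            Encode::encode('cp1252', $ch,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        push @missing, $ch unless $ok;
    }
    return @missing;
}

# Hypothetical helper: true if $text probably needs a TTF font,
# i.e. at least one character falls outside cp1252.
sub needs_ttf {
    my ($text) = @_;
    return scalar(chars_outside_cp1252($text)) > 0;
}
```

With this, `needs_ttf("caf\x{e9}")` is false (é is in cp1252), while Greek or Cyrillic text makes it true, so French or German OCR output would stay on a core font.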

How can I tell whether a particular font includes a particular character (including the CMAP table, so that extraction with pdftotext still works)?

Have you tried the glyph-related methods of PDF::API2::Resource::BaseFont?

http://search.cpan.org/dist/PDF-API2/lib/PDF/API2/Resource/BaseFont.pm#GLYPH_RELATED_METHODS
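Once you have some per-character coverage test, whether built on the glyph-related methods or on a simpler heuristic, the font choice can be made per run of characters rather than for the whole text. The sketch below is pure Perl and takes the coverage test as a code ref. `split_runs` and the `text`/`covered` keys are hypothetical names, and wrapping a PDF::API2 glyph-related method such as `glyphByUni` in that code ref is an assumption to verify against the documentation linked above (in particular, what the method returns for a character the font lacks).

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into consecutive runs, each tagged with
# whether $covers->($codepoint) reports that the preferred (core) font has
# the character. $covers is a caller-supplied code ref (an assumption: with
# PDF::API2 it might wrap one of the BaseFont glyph-related methods).
sub split_runs {
    my ($text, $covers) = @_;
    my @runs;
    for my $ch (split //, $text) {
        my $ok = $covers->(ord $ch) ? 1 : 0;
        if (@runs && $runs[-1]{covered} == $ok) {
            $runs[-1]{text} .= $ch;                       # extend current run
        } else {
            push @runs, { text => $ch, covered => $ok };  # start a new run
        }
    }
    return @runs;
}
```

Each run could then be drawn with whichever font covers it, for example by calling the text object's `font` method with the core or TTF font before calling `text` on the run's string, so Latin runs keep using the core font.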

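Failing that, perhaps render the glyph (to a separate document) and measure it? A character the font lacks would presumably come out as the font's .notdef glyph (often blank or a box) rather than its expected shape.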
