Font misalignment during PDF to HTML conversion using the pdf2htmlEX tool

Posted 2016-04-26 14:44:05. Tags: html, css, google-chrome, safari, pdf2htmlex

FONT ISSUES WITH PDF TO HTML CONVERSION

All "ti", "fi", and "tt" characters are missing.

SAMPLE SCREENSHOT

Font overlapping issue

SAMPLE SCREENSHOT

NOTE: I don't get this issue with Firefox. I get the above issues in Chrome and Safari.

I AM USING

Version 0.13.6 of pdf2htmlEX
The following command to convert the PDF to HTML:

pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir $1 $2 2>&1

TRIED

Using the --fallback 1 option solves all of the above problems. But:

The fallback option reduces the clarity of the document.
Tables in the page disappear and are replaced with empty space.

DOUBTS

Could you please explain a bit more about the fallback option?

I have tried the approach above (using fallback). If you know a different approach to solve the font problem described above, please suggest it.

These issues occur with Chrome and Safari, whereas Firefox renders the document fine.

1 Answer

The above issue occurs only in WebKit-family browsers such as Chrome and Safari, which provide support for ligatures, whereas browsers like Firefox do not.

A ligature is a combination of two or more letters joined into a single glyph.
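As a loose illustration of "two letters, one glyph": Unicode encodes a few common ligatures as single compatibility characters, and NFKC normalization splits each one back into its letters. This is only an analogy; browsers do not rewrite text into these characters, they substitute ligature glyphs from the font at render time, and pairs such as "ti" and "tt" have no Unicode ligature character at all. The sketch uses only Python's standard `unicodedata` module; `describe_ligature` is a hypothetical helper name.

```python
import unicodedata


def describe_ligature(ch: str) -> tuple[str, str]:
    """Return the Unicode name of a ligature character and its NFKC decomposition."""
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)


# U+FB00..U+FB02 are compatibility ligature characters for "ff", "fi" and "fl".
for ch in ["\ufb00", "\ufb01", "\ufb02"]:
    name, letters = describe_ligature(ch)
    # One character in, two letters out.
    print(f"U+{ord(ch):04X} {name} -> {letters}")
```

This prints `U+FB00 LATIN SMALL LIGATURE FF -> ff`, followed by the matching lines for `fi` and `fl`.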

Root cause

The missing characters are caused by the ligature support these modern browsers provide. Here is how:

1. During conversion, the tool uses poppler to render the text, mapping characters to glyphs in the fonts it generates. When these browsers encounter letter pairs such as tt, tf, ti, ff, or fi, they treat them as ligatures and look for a single glyph for the pair (for example, one glyph for "tt") instead of two separate "t" glyphs.

2. Since the generated fonts do not contain those ligature glyphs, the browsers skip the characters and render the rest, which is why the characters appear to be missing.

Can be solved by

Disabling (turning off) ligatures in these browsers by embedding CSS in the generated content.
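A minimal sketch of that fix, written as a Python post-processing step run after pdf2htmlEX. It inserts a `<style>` block just before `</head>` in every `.html` file in the destination directory. `font-variant-ligatures` and `font-feature-settings` are standard CSS properties (the `-webkit-` prefixed form is for older Safari). Several things here are assumptions, not verified pdf2htmlEX behavior: the script name `disable_ligatures.py`, the universal `*` selector, UTF-8 output files, and that patching the main `.html` file is enough when `--split-pages 1` is used (on the assumption that the split page files are loaded into that main document).

```python
import re
import sys
from pathlib import Path

# Stylesheet asking the browser not to form ligatures (selector choice is an assumption).
NO_LIGATURES_CSS = (
    "<style>\n"
    "* {\n"
    "  -webkit-font-variant-ligatures: none;\n"
    "  font-variant-ligatures: none;\n"
    '  font-feature-settings: "liga" 0, "clig" 0;\n'
    "}\n"
    "</style>\n"
)

# Case-insensitive match for the closing head tag.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_css(html: str, css: str = NO_LIGATURES_CSS) -> str:
    """Insert css just before </head>; prepend it if the document has no </head>."""
    if css in html:
        return html  # already patched, so running the script twice changes nothing
    match = HEAD_CLOSE.search(html)
    if match is None:
        return css + html
    return html[: match.start()] + css + html[match.start():]


def patch_dir(dest_dir: str) -> int:
    """Patch every *.html file directly inside dest_dir; return how many files changed."""
    changed = 0
    for path in sorted(Path(dest_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 output
        patched = inject_css(text)
        if patched != text:
            path.write_text(patched, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(f"patched {patch_dir(sys.argv[1])} file(s)")
    else:
        print("usage: python disable_ligatures.py DEST_DIR")
```

For example, run `python disable_ligatures.py $1` right after the pdf2htmlEX command from the question, where `$1` is the same directory passed to `--dest-dir`. Unlike `--fallback 1`, this leaves the text layer and fonts as generated and only changes how the browser shapes the text.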

For more details, please refer to: