Font misalignment during pdf to html conversion using pdf2htmlEX tool

FONT ISSUES WITH PDF TO HTML CONVERSION

  1. All "ti", "fi", "tt" characters are missing

SAMPLE SCREENSHOT

  2. Font overlapping issue

SAMPLE SCREENSHOT

  • NOTE: I don't get these issues in Firefox. They occur in the Chrome and Safari browsers.

I AM USING

  • Version 0.13.6 of pdf2htmlEX
  • The following command to convert pdf to html

pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir $1 $2 2>&1

TRIED

Using the --fallback 1 option solves all of the above problems, but:

  1. The fallback option reduces the clarity of the document.
  2. Tables in the page disappear and are replaced with empty space.

DOUBTS

  1. Could you please explain a bit more about fallback?

  2. I have tried the approach above (using fallback). Please suggest a different approach if you know a better way to solve these font problems.

I get the above issues in Chrome and Safari, whereas in Firefox it works fine.

The above issue occurs only in WebKit-based browsers such as Chrome and Safari, which apply ligature substitution to the converted text, whereas Firefox does not seem to do so in this case.

A ligature is a combination of two or more letters joined as a single glyph.

Root cause

The missing characters are caused by the ligature handling in these browsers. Let me explain how:

1. During conversion, the tool uses Poppler to map characters to glyphs for rendering. When these browsers come across character sequences like tt, tf, ti, ff, fi, they treat them as ligatures and look for a single glyph corresponding to the ligature (for example tt) rather than the glyphs for the individual characters (t followed by t).

2. Since the font has no glyph for that ligature, the browser skips those characters and renders the rest, so the characters appear to be missing.

This could be solved by

Disabling (turning off) ligatures in these browsers by embedding CSS in the generated content.

For more details please refer:

Please correct me if I am wrong.
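
As a rough illustration of embedding such CSS, here is a minimal Python sketch that post-processes the converted files. It is not part of pdf2htmlEX: the names inject_no_ligatures and patch_directory are hypothetical helpers, and the assumptions are that the output directory contains .html files with a </head> tag and that the standard CSS properties font-variant-ligatures and font-feature-settings are enough to stop the browser from substituting ligatures. Whether this fully fixes a given document would need to be checked in Chrome and Safari.

```python
import re
import sys
from pathlib import Path

# Style block asking the browser not to substitute ligature glyphs.
# Both properties are standard CSS; applying them to every element ("*")
# is an assumption made for simplicity.
NO_LIGATURES_CSS = (
    "<style>\n"
    "* {\n"
    "  font-variant-ligatures: none;\n"
    '  font-feature-settings: "liga" 0, "clig" 0;\n'
    "}\n"
    "</style>\n"
)


def inject_no_ligatures(html: str) -> str:
    """Insert the style block just before the first </head> tag.

    Returns the input unchanged if it is already patched or has no </head>.
    """
    if NO_LIGATURES_CSS in html:
        return html
    patched, _ = re.subn(
        r"</head>",
        lambda m: NO_LIGATURES_CSS + m.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    return patched


def patch_directory(dest_dir: str) -> int:
    """Patch every .html file in dest_dir; return how many files changed."""
    changed = 0
    for path in Path(dest_dir).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = inject_no_ligatures(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python patch_ligatures.py <dest-dir used with pdf2htmlEX>
    print(patch_directory(sys.argv[1]))
```

The script would be run on the same directory that was passed to --dest-dir, after the conversion finishes. Running it twice is harmless because already patched files are left alone.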
