
UTF-8 emoji problem in PDF for Spring Boot

I am using Spring Boot to create and return a PDF. When my string content contains emoji and other Unicode characters, such as "This is d£escript😭ion section😢😤😠😡🤬", those characters are missing from the downloaded PDF. Can someone please help me resolve this issue?

My code is as follows:

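import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import freemarker.template.TemplateException;
import org.springframework.ui.freemarker.FreeMarkerTemplateUtils;
import org.xhtmlrenderer.pdf.ITextRenderer;
// TemplateId, ResourceLoaderUserAgent and freemarkerMailConfiguration come from the asker's code (not shown)

// Enclosing method: name and parameters are hypothetical (the original snippet omits them)
public void writePdf(TemplateId templateId, Map<String, Object> pdfData, OutputStream outputStream)
        throws Exception {
    ITextRenderer renderer = new ITextRenderer();
    ResourceLoaderUserAgent callback = new ResourceLoaderUserAgent(renderer.getOutputDevice());
    callback.setSharedContext(renderer.getSharedContext());
    renderer.getSharedContext().setUserAgentCallback(callback);

    // Lay out the XHTML produced by FreeMarker and write it as PDF
    renderer.setDocumentFromString(pdfContent(templateId, pdfData));
    renderer.layout();
    renderer.createPDF(outputStream);
}

// Merges the FreeMarker template with the data and returns the resulting markup
private String pdfContent(TemplateId templateId, Map<String, Object> pdfData)
        throws TemplateException, IOException {
    return FreeMarkerTemplateUtils
            .processTemplateIntoString(freemarkerMailConfiguration.getTemplate(templateId.getValue()), pdfData);
}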

The problem is that the font you use doesn't contain emoji, so they can't be rendered in the PDF. Unfortunately, I could not find a font that covers all emoji. The best I could find is DejaVu, which covers some of the emoji in your example.

To use it:

  • Download the DejaVu Sans font (it is freely available online).
  • Register it with the renderer (make sure the path matches the actual location of the file):
// BaseFont comes from the iText version bundled with Flying Saucer (e.g. com.lowagie.text.pdf.BaseFont)
// IDENTITY_H gives access to all glyphs in the font, not just a single-byte code page
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("font/dejavu-sans/DejaVuSans.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
  • Set the font in the HTML:
<html>
<head>
    <meta charset="utf-8" />
    <style>
        body{font-family:"DejaVu Sans", sans-serif;}
    </style>
</head>
<body>
    <p>This is descript😭ion section😢😤😠😡🤬.</p>
</body>
</html>

Here is the result in the PDF: [image: PDF result]
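To check in advance which characters of a string a given font file can render, the JDK's java.awt.Font can be asked about each code point. This is a minimal sketch, assuming the font path used in the answer (another path can be passed as the first argument); the class and method names are illustrative. It only inspects the font file's own glyph coverage and does not run Flying Saucer, so treat the result as a coverage hint rather than a guarantee of what the PDF will show.

```java
import java.awt.Font;
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

public class FontCoverageCheck {

    /**
     * Returns one entry ("U+XXXX c") for every code point in text
     * that the given font has no glyph for.
     */
    static List<String> missingCodePoints(Font font, String text) {
        // Iterate by code point, not by char: emoji such as U+1F62D are
        // stored as two UTF-16 chars (a surrogate pair) in a Java String.
        return text.codePoints()
                .filter(cp -> !font.canDisplay(cp))
                .mapToObj(cp -> String.format("U+%04X %s", cp, new String(Character.toChars(cp))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Assumed location: use the same file you pass to addFont(...)
        File ttf = new File(args.length > 0 ? args[0] : "font/dejavu-sans/DejaVuSans.ttf");
        Font font = Font.createFont(Font.TRUETYPE_FONT, ttf);
        String sample = "This is d£escript😭ion section😢😤😠😡🤬";

        List<String> missing = missingCodePoints(font, sample);
        if (missing.isEmpty()) {
            System.out.println("All characters are covered by " + font.getFontName());
        } else {
            missing.forEach(entry -> System.out.println("No glyph: " + entry));
        }
    }
}
```

Any code point this lists has no glyph in that font file, so it is a likely candidate to be missing from the generated PDF as well.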

Emoji are problematic as symbols. If we use one font with two styles (upper left of the image), we can see that even within one font the symbols are not matched well: in the upper style one is missing, and in the lower style two look identical.

Converted to PDF (upper middle), they look reasonable as a rendered image; however, when the text is extracted (upper right), the font styling is lost and only one glyph is possible for each valid font character.

[image]

The lower row, on the left, is the same text as shown in modern Notepad; there the same system font applies the other style, and if we extract those characters we get

😭😢😤😠😡🤬 as [image]

Thus the emoji symbols of a font, and their styles, are generally not well supported by the font system. Going via HTML is much more consistent, but then the text is no longer real text. [image]

The best we might get is a poor hybrid of images and undefined CID characters, which can be confusing because the characters all come out the same:

������
������

[image]

So if you export the PDF as symbols with an image overlay, there is no visual equivalence between what is displayed and the underlying text.

[image]
