Cyrillic symbols

Question

I'm using iText7. My class works with PDF files (a template with Cyrillic symbols) on a server.

First I read the document. Then I edit some information and try to save it on my local machine, but the new text is not shown correctly.

If I create a new PDF file with TTF fonts and add the text to it, everything works fine, but if I modify my template, the text is not correct (only for Cyrillic symbols).

I'm trying to use one of the simple examples from the official website: http://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-replacing-pdf-objects

Here is the relevant part of my code:

PdfDocument document = new PdfDocument(new PdfReader(template), new PdfWriter(dest));
PdfPage page = document.getFirstPage();
PdfDictionary dictionary = page.getPdfObject();
PdfObject object = dictionary.get(PdfName.Contents);

if (object instanceof PdfStream) {
    PdfStream stream = (PdfStream) object;
    byte[] data = stream.getBytes(true);

    stream.setData(new String(data).replace("user_fio", "Петров А.А.").getBytes("utf-8"));
}
document.close();

I also tried using locales: http://www.oracle.com/technetwork/java/javase/javase7locales-334809.html

But the result is "????? ?.?." or something like that.

What am I doing wrong? Thank you!

Answer 1

PDF is not a WYSIWYG format. You cannot simply replace information in content streams and expect a good-looking PDF. There are several reasons for this:

PDF documents store their information in objects. To make those objects referenceable, the file stores a byte offset for each one. If you start replacing data, you mess up this internal table of byte offsets.
PDF documents do not contain the text as such. Think of them more as containers of instructions. Changing the order of the instructions, or the content of some of them, is not going to give you the result you want.
Reflow (laying the text out again automatically when text is inserted, removed or replaced) cannot be done dynamically in a PDF document. Code like yours will (almost always) mess up the layout.
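
To make the "containers of instructions" point more concrete, here is a minimal sketch in plain Java (no iText needed). Both content-stream fragments are hypothetical examples, not taken from the template in the question. They would draw the same visible text, but the second one splits the string into pieces with kerning adjustments between them, which PDF producers are free to do. A plain string search finds the placeholder in the first fragment only.

```java
final class ContentStreamDemo {
    // Hypothetical fragment: the whole string is shown with a single Tj operator.
    static final String SINGLE_STRING =
            "BT /F1 12 Tf 72 720 Td (Name: user_fio) Tj ET";

    // Hypothetical fragment: the same visible text, written as a TJ array
    // whose pieces are separated by kerning adjustments.
    static final String KERNED_ARRAY =
            "BT /F1 12 Tf 72 720 Td [(Name: us) -15 (er_f) 20 (io)] TJ ET";

    // Naive check that mirrors what String.replace relies on.
    static boolean containsPlaceholder(String contentStream, String placeholder) {
        return contentStream.contains(placeholder);
    }

    public static void main(String[] args) {
        System.out.println(containsPlaceholder(SINGLE_STRING, "user_fio")); // true
        System.out.println(containsPlaceholder(KERNED_ARRAY, "user_fio"));  // false
    }
}
```

Even when the placeholder is found, the bytes inside the parentheses are character codes that the viewer interprets through the font selected by Tf, so writing UTF-8 bytes there does not by itself produce Cyrillic glyphs.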
There are exceptions. In one of the examples on the website, the word "World" is replaced with "Bruno". This works because "World" and "Bruno" have the same number of letters (and thus the same number of bytes), and in that example they appear as the last word on their respective lines. So reflow is not a problem there.
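
The byte counts can be checked with a small sketch in plain Java. The class and method names are made up for illustration. It compares the "World"/"Bruno" pair with the placeholder and the Cyrillic name from the question, and it also shows one plausible way question marks can appear: encoding Cyrillic text with a single-byte charset that has no Cyrillic letters. This is not a diagnosis of the asker's exact setup, only a demonstration of the mechanism.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReplacementLengthDemo {
    // The Cyrillic name from the question, written with Unicode escapes
    // so that the result does not depend on the source file encoding.
    static final String CYRILLIC_NAME =
            "\u041F\u0435\u0442\u0440\u043E\u0432 \u0410.\u0410.";

    // Number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encode and decode with the same charset; characters the charset
    // cannot represent come back as its replacement character.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        System.out.println(byteLength("World", StandardCharsets.ISO_8859_1));   // 5
        System.out.println(byteLength("Bruno", StandardCharsets.ISO_8859_1));   // 5
        System.out.println(byteLength("user_fio", StandardCharsets.UTF_8));     // 8
        System.out.println(byteLength(CYRILLIC_NAME, StandardCharsets.UTF_8));  // 19
        System.out.println(roundTrip(CYRILLIC_NAME, StandardCharsets.ISO_8859_1)); // ?????? ?.?.
    }
}
```

Note that the code in the question calls new String(data) without a charset, so the decoding step depends on the platform default, which may differ between the server and a local machine.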

Summary: PDF is not an editable format!

If you want to do something similar to your use case, consider the following options:

generate the PDF from scratch every time
use forms (XFA or AcroForm) so that fields can accept dynamic content
convert dynamically generated HTML to PDF using pdfHTML
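
As a rough illustration of the last option, the dynamic part can be done on the HTML side, where replacing a placeholder is safe. The sketch below uses only the Java standard library. The template string, class name and method names are hypothetical, and the placeholder name is borrowed from the question. The actual conversion step is left as a comment because it needs the pdfHTML add-on on the classpath, and a font that covers Cyrillic still has to be available to the converter.

```java
final class HtmlTemplateDemo {
    // Hypothetical HTML template; "user_fio" is the placeholder from the question.
    static final String TEMPLATE =
            "<html><head><meta charset=\"UTF-8\"></head>"
            + "<body><p>Name: user_fio</p></body></html>";

    // Minimal escaping so that user data cannot break the markup.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Fill the placeholder with the escaped value.
    static String render(String template, String name) {
        return template.replace("user_fio", escapeHtml(name));
    }

    public static void main(String[] args) {
        // The Cyrillic name from the question, written with Unicode escapes.
        String html = render(TEMPLATE,
                "\u041F\u0435\u0442\u0440\u043E\u0432 \u0410.\u0410.");
        System.out.println(html);
        // With pdfHTML available, the string could then be passed on, e.g.:
        // HtmlConverter.convertToPdf(html, new FileOutputStream("out.pdf"));
    }
}
```

Because the PDF is generated fresh from the filled-in HTML each time, layout and font encoding are handled by the converter rather than patched into an existing content stream.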

Answer 1: accepted, 2017-09-11 12:24:53
