Cyrillic symbols

I'm using iText 7.
My class runs on a server and works with PDF files (a template that contains Cyrillic symbols).

First I read the document. Then I edit some information and try to save the result on my local machine, but the new text is not shown correctly.

If I create a new PDF file and add the text to it with TTF fonts, everything works fine. If I modify my template instead, the text is wrong (only for Cyrillic symbols).

I'm trying to use one of the simple examples from the official website: http://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-replacing-pdf-objects

Here is the relevant part of my code:

PdfDocument document = new PdfDocument(new PdfReader(template), new PdfWriter(dest));
PdfPage page = document.getFirstPage();
PdfDictionary dictionary = page.getPdfObject();
PdfObject object = dictionary.get(PdfName.Contents);

if (object instanceof PdfStream) {
    PdfStream stream = (PdfStream) object;
    // Decoded (uncompressed) bytes of the page's content stream
    byte[] data = stream.getBytes(true);

    // new String(data) decodes with the platform default charset;
    // the replaced text is then written back as UTF-8 bytes
    stream.setData(new String(data).replace("user_fio", "Петров А.А.").getBytes("utf-8"));
}
document.close();

I have also tried using locales: http://www.oracle.com/technetwork/java/javase/javase7locales-334809.html

But the result is "????? ?.?." or something like that.

What am I doing wrong? Thank you!

PDF is not a WYSIWYG format. You cannot hope to simply replace information in content streams and end up with a nice-looking PDF. There are two reasons for this:

  1. PDF documents store their information in objects. To make it possible to reference those objects, the file stores a byte offset for each of them. If you start replacing data, you are screwing up this internal table of byte offsets.

  2. PDF documents do not contain the text as such. You should think of them more as containers of instructions. Changing the order of the instructions, or the content of some instructions, is not going to get you the result you want.

    Reflow (having the text laid out again automatically when text is inserted, removed or replaced) cannot be done dynamically in a document. When you use code like yours, it will (almost always) break the layout, because nothing reflows the text around the replacement.

    There are exceptions. In one of the examples on the website, the word "World" is replaced with "Bruno". This works because "World" and "Bruno" have the same number of letters (and thus the same number of bytes), and in the example I mentioned, they appear as the last word on their respective line. So reflow is not a problem there.
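The byte-count argument can be checked with plain Java, without iText. The sketch below is only an illustration: the class name ReplacementBytes and its two helpers are hypothetical, and the choice of ISO-8859-1 as "a single-byte encoding without Cyrillic" is an assumption, since the question does not say which font or encoding the template uses. It shows that "World" and "Bruno" occupy the same number of bytes, that the 11-character Cyrillic name becomes 19 bytes in UTF-8 while the placeholder is 8 bytes, and that forcing Cyrillic through a Latin-only encoding is one way question marks can appear.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of iText.
class ReplacementBytes {

    // Number of bytes a string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes to ISO-8859-1 and decodes again. Characters that the
    // charset cannot represent are replaced with '?' during encoding.
    static String viaLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The name from the question, written with Unicode escapes so the
        // example does not depend on the encoding of the source file.
        String name = "\u041f\u0435\u0442\u0440\u043e\u0432 \u0410.\u0410.";

        // Same length: the replacement fits exactly where the original was.
        System.out.println("World: " + utf8Length("World") + " bytes");
        System.out.println("Bruno: " + utf8Length("Bruno") + " bytes");

        // Different length: 8 bytes are replaced by 19 bytes.
        System.out.println("user_fio: " + utf8Length("user_fio") + " bytes");
        System.out.println("name: " + name.length() + " chars, "
                + utf8Length(name) + " bytes in UTF-8");

        // A Latin-only single-byte encoding has no Cyrillic letters at all.
        System.out.println("via ISO-8859-1: " + viaLatin1(name));
    }
}
```

If the font used in the template reads one byte per glyph (an assumption about the template), those 19 bytes would be interpreted as 19 separate character codes rather than as 11 Cyrillic letters, which is why writing UTF-8 bytes into the content stream does not produce readable text.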

Summary: PDF is not an editable format!

If you want to do something similar to your use case, consider the following options:

  • generate the PDF from scratch every time
  • use forms (XFA or AcroForm) to have some kind of field that can accept dynamic content
  • convert HTML (dynamically generated) to PDF using pdfHTML
