
Content from HTML missing in PDF created by ITextRenderer

I am trying to create a PDF from an HTML document that contains Chinese characters, and I have run into a strange problem: the line of HTML that contains the Chinese characters is not shown completely in the generated PDF.

Below is my HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>some title.</title>

<style type="text/css">
     .name
   {
         font-family: "Arial Unicode MS";
         color:red;
         margin-left: 5px;
         margin-right: 5px
     }
</style>
</head>
<body>
 <b class="name">

LLTRN,DEBIT,,,6841,FXW,,CNY,PAY,C,,,,DD,,ord par nm,,,,,,,CN,百威英博雪津(三明)啤酒有限公司,,,,,,,CN,20140617,,CNY,647438.24,OUR,,,,,,,,SHANGHAI,CN,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

    <br>

RDF,FTX,TEXT
<br>
</b>
<br>
</body></html>

Below is my ITextRenderer code:

StringWriter writer = new StringWriter();
Tidy tidy = new Tidy();
tidy.setTidyMark(false);
tidy.setDocType("omit");
tidy.setXHTML(true);
tidy.setInputEncoding("utf-8");
tidy.setOutputEncoding("utf-8");
//tidy.parse(new StringReader(documentJsoup.toString()), writer);
tidy.parse(new StringReader(inputFileString), writer);
writer.close();
String  pdfContent = writer.toString();

// Creating an instance of iText renderer which will be used to generate the pdf from the html document.
ITextRenderer renderer = new ITextRenderer();           

/*renderer.setDocument(doc, baseurl);
renderer.layout();
renderer.createPDF(os);
os.flush();         

// close all the streams
//fis.close();
//os.close();
//instream.close();
 */
ITextFontResolver resolver = renderer.getFontResolver();

//renderer.getFontResolver().addFont("C:\\Windows\\Fonts\\arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
resolver.addFont("C:\\Windows\\Fonts\\arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
renderer.setDocumentFromString(pdfContent);
renderer.layout();
renderer.createPDF(os);

Since I registered the font through the font resolver, the Chinese characters are shown, but the PDF is missing content: the last characters of that line (the "AI" of "SHANGHAI" and the following ",CN,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,") are not visible. The output looks like this:

html2pdf: content missing

I have tried a lot to find out what is wrong but could not find a solution. Can anybody help me resolve this issue? Thanks in advance!

The issue is that Flying Saucer doesn't handle line wrapping in Chinese text. It only inserts line breaks at whitespace. In your case, that means it cannot insert a line break anywhere after "nm,,,,", so the rest of the text does not fit on the line and is cut off.
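To see which part of a line cannot wrap under this whitespace-only rule, a small check like the following may help. It is only a diagnostic sketch: the class and method names are made up for illustration and are not part of Flying Saucer or iText.

```java
// Hypothetical diagnostic helper, not part of Flying Saucer or iText.
final class UnbreakableRunFinder {

    // Returns the longest stretch of text that contains no whitespace,
    // i.e. the longest span a whitespace-only line breaker cannot split.
    static String longestRunWithoutWhitespace(String text) {
        String longest = "";
        for (String run : text.split("\\s+")) {
            if (run.length() > longest.length()) {
                longest = run;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        String line = "ord par nm,,,,,,,CN,x,,,,,,,CN,20140617,,CNY";
        System.out.println(longestRunWithoutWhitespace(line));
    }
}
```

If the run it reports is wider than the page when rendered, that is the span that gets clipped.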

It is a known bug in Flying Saucer (see here), but it is unlikely to be fixed soon.

The only workaround is to insert whitespace somewhere in your string after the Chinese characters. That makes all of the text visible.
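One way to apply this workaround before handing the HTML to the renderer is sketched below. It assumes that a plain space after each run of Chinese characters is acceptable in the output. The helper name is hypothetical, and the regular expression only covers Han characters, so other scripts or punctuation would need their own handling.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flying Saucer or iText.
final class CjkBreakHelper {

    // A Han character that is directly followed by a character
    // that is neither Han nor whitespace.
    private static final Pattern HAN_BEFORE_OTHER =
            Pattern.compile("(\\p{IsHan})(?=[^\\p{IsHan}\\s])");

    // Inserts one space after each run of Han characters so that a
    // whitespace-only line breaker gets a place to wrap.
    static String addBreakAfterHanRuns(String text) {
        return HAN_BEFORE_OTHER.matcher(text).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        // U+767E U+5A01 are the first two characters of the company name.
        String line = "ord par nm,,,,,,,CN,\u767E\u5A01,,,,,,,CN,20140617";
        System.out.println(addBreakAfterHanRuns(line));
    }
}
```

The result could then be passed to `tidy.parse(...)` in place of the original input string. Note that the inserted spaces are visible in the PDF.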

You need to add the font type or font file to your application.

You can find code in this question: "itextSharp - html to pdf some turkish characters are missing".

That question is the same as yours.

If this helps, please upvote.

I tried adding the CSS rules below to the body class, and it worked perfectly.

word-wrap: break-word; word-break: break-all;
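If the HTML is generated elsewhere and cannot be edited by hand, the rules can be injected as a string before rendering. The sketch below assumes the document has a `</head>` tag and applies the rules to `body`. The class and method names are hypothetical, and whether the renderer honours these properties may depend on the Flying Saucer version in use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flying Saucer or iText.
final class WrapStyleInjector {

    static final String WRAP_CSS =
            "<style type=\"text/css\">"
            + "body { word-wrap: break-word; word-break: break-all; }"
            + "</style>";

    private static final Pattern HEAD_END =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    // Inserts the style block just before the first </head>;
    // returns the input unchanged when there is no </head>.
    static String injectWrapCss(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            return html;
        }
        return html.substring(0, m.start()) + WRAP_CSS + html.substring(m.start());
    }

    public static void main(String[] args) {
        System.out.println(injectWrapCss("<html><head></head><body>text</body></html>"));
    }
}
```

With the code from the question, this would be called on `pdfContent` before `renderer.setDocumentFromString(pdfContent)`.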

"Adding whitespace" works sometimes (I tried adding spaces after punctuation such as 。 or 、), but when there is no punctuation the text still overflows.
