
iText PDF Text Extraction with fonts and styles

Original 2013-01-23 22:46:50 9 1 java/ android/ pdf/ itext

I am using iText to extract text from a PDF into a String, but I have run into a problem with some PDFs. When I try to extract the text, the reader returns only blanks or destroyed text for SOME PDFs.

Example of destroyed text:

"th isbe long to t he t est fo r extr act ion tex t"

What is the cause of this problem?

I am thinking of removing the fonts and replacing them with one that the reader can handle. I have tried researching this, but what I found has not helped.

1 answer

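This is caused by the way text is stored in the PDF file. A PDF does not store words or sentences; it just stores letters together with rendering and position information. The text extraction algorithm therefore has to guess where words begin and end: it looks for letters that sit close together and joins them, and where the gap between letters is larger it inserts a space. When a PDF places its letters with uneven spacing, that guess goes wrong and stray spaces show up, as in your example.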
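To make the gap heuristic concrete, here is a minimal Java sketch. It is not iText's code: the `Glyph` class, the `GapJoiner.join` method, and the `threshold` parameter are all hypothetical names made up for illustration, and real extractors work with more information (baselines, font metrics, text direction). It only shows how one fixed threshold can turn "this" into "th is" when the letters are spaced unevenly.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy version of gap-based text extraction.
// None of these names come from iText.
class GapJoiner {

    /** Hypothetical glyph: one character plus its start x and width on the line. */
    static final class Glyph {
        final char ch;
        final double x;
        final double width;

        Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins glyphs (assumed to be on one line, sorted left to right).
     * A space is inserted whenever the gap between the end of the previous
     * glyph and the start of the next one exceeds the threshold.
     */
    static String join(List<Glyph> glyphs, double threshold) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                double gap = g.x - (prev.x + prev.width);
                if (gap > threshold) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "this" with a slightly wider gap between 'h' and 'i' (2 units).
        List<Glyph> word = Arrays.asList(
                new Glyph('t', 0, 5),
                new Glyph('h', 5, 5),
                new Glyph('i', 12, 5),
                new Glyph('s', 17, 5));

        System.out.println(join(word, 1.0)); // tight threshold splits the word
        System.out.println(join(word, 3.0)); // looser threshold keeps it whole
    }
}
```

With the tight threshold the 2-unit gap counts as a word break; with the looser one it does not. A document whose letter spacing varies a lot has no single threshold that suits every word, which is one way the destroyed output in the question can arise.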

I can't tell you what to do about it, though.




solution1 0 ACCEPTED 2013-01-23 22:51:12
