简体   繁体   中英

Best API for reading a huge .pdf file from java

I have a huge pdf file (20 mb/800 pages) which contains some information.

It has got index with hyperlinks. Also most of the remaining information is in Tabular format (in pdf). I need to retrieve this information using Java and store it in SQL Server.

Which is the best API available to read this kind of file from Java?

It is unlikely to be in tabular format inside the PDF as PDF does not contain structure information unless explicitly added at creation time. I wrote an article explaining some of the issues with text extraction from at PDF at http://www.jpedal.org/PDFblog/2009/04/pdf-text/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM