
How to extract structured information from a PDF file in Java

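I need to extract a table from a PDF file. I know it is not stored in a table format, but I want to read student results from a PDF in Java. If anyone knows how, please help.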

You should use a PDF parser for that. Check out this list of open source PDF libraries for Java.

Some PDF files contain PDF structured text (http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/). If they do not, it is down to the heuristics of the parser to guess the structure and add it.
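To illustrate the kind of heuristic involved, here is a minimal sketch. It assumes a parser has already extracted the page as plain text with the column spacing preserved (for example, PDFBox's PDFTextStripper with position sorting enabled), and it simply treats runs of two or more spaces as column boundaries. The class name ResultTableParser, the expectedColumns parameter, and the sample data are made up for illustration; real PDFs will often need something more robust.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResultTableParser {

    // Split one line of extracted text into cells wherever
    // two or more whitespace characters occur in a row.
    public static List<String> parseRow(String line) {
        return Arrays.asList(line.trim().split("\\s{2,}"));
    }

    // Keep only the lines that split into the expected number of columns;
    // titles, blank lines and other non-table text are skipped.
    public static List<List<String>> parseTable(String text, int expectedColumns) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            List<String> cells = parseRow(line);
            if (cells.size() == expectedColumns) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text extracted from a results PDF.
        String extracted =
            "Examination Results\n" +
            "\n" +
            "Roll No   Name          Marks\n" +
            "101       Asha Rao      78\n" +
            "102       Vikram Shah   64\n";

        for (List<String> row : parseTable(extracted, 3)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This breaks down when a cell is empty or when columns are separated by a single space, which is why position-based extraction is usually the next step for real documents.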

The PDFBox developers did a lot of work on tables, but it will never be perfect.

