
Open Source Java Text Parsers

Is there a single Java text parser that can parse Microsoft Office documents, OpenOffice documents, and PDFs? Or do I need to use something like Apache POI for Word documents and other libraries for OpenOffice and PDFs? If so, what are the best options for OpenOffice and PDFs?

If the task is reading PDF documents, iText is your best bet. For Microsoft Office and OpenOffice (LibreOffice) documents, I would use POI.
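If you go the one-library-per-format route, something has to decide which library receives each file. Here is a minimal sketch of that routing step using only the JDK (Java 11+). It looks at the leading bytes of a file rather than trusting the extension. The class name FormatSniffer and the labels it returns are made up for illustration; the actual parsing calls into iText, POI, or an OpenOffice library are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: classifies a document by its leading "magic" bytes.
class FormatSniffer {
    // PDF files begin with the ASCII text "%PDF-".
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // ZIP containers (used by .docx/.xlsx and by OpenDocument .odt/.ods) begin with "PK\3\4".
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // Legacy binary Office files (.doc, .xls, .ppt) use the OLE2 compound file header.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns "pdf", "ole2", "zip", or "unknown" for the given leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) return "pdf";
        if (startsWith(head, OLE2)) return "ole2";
        if (startsWith(head, ZIP)) return "zip";
        return "unknown";
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

Note that "zip" covers both newer Office files and OpenDocument files, so telling them apart means looking inside the archive (OpenDocument packages carry a `mimetype` entry, Office Open XML packages a `[Content_Types].xml` entry). That extra bookkeeping is part of the cost of combining several libraries by hand.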

Apache Tika:

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Not sure whether this qualifies as "single" for your purposes.
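For what it's worth, here is roughly what the Tika route looks like from calling code, using the `org.apache.tika.Tika` facade. This is a sketch, not a tested setup: it assumes `tika-core` plus a parser bundle (for Tika 2.x, `tika-parsers-standard-package`) on the classpath, and the class name TikaExtract is just for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Illustrative wrapper around the Tika facade; requires the Tika jars on the classpath.
class TikaExtract {
    private static final Tika TIKA = new Tika();

    // Returns the detected media type, e.g. "application/pdf".
    static String detectType(Path file) throws IOException {
        return TIKA.detect(file.toFile());
    }

    // Returns the plain text of the document, whichever parser Tika selects for it.
    static String extractText(Path file) throws IOException, TikaException {
        return TIKA.parseToString(file.toFile());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaExtract report.pdf letter.docx notes.odt
        for (String arg : args) {
            Path file = Path.of(arg);
            System.out.println(file + " [" + detectType(file) + "]");
            System.out.println(extractText(file));
        }
    }
}
```

The calling code is the same for every format. By default `parseToString` truncates its result at 100,000 characters; `setMaxStringLength(-1)` on the `Tika` instance lifts that limit if you need the full text of large documents.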

