Open Source Java Text Parsers

Original post: 2011-06-22 17:48:11 4 2 java/ pdf/ ms-office/ openoffice.org/ text-parsing

Is there a single Java text parser that can handle Microsoft Office documents, OpenOffice documents, and PDFs? Or do I need something like Apache POI for Word documents and separate libraries for OpenOffice and PDFs? If so, what are the best options for OpenOffice and PDFs?

2 answers

If the task is reading PDF documents, iText is your best bet. For Microsoft Office and OpenOffice (LibreOffice) based documents, POI would be my solution.
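
As a rough illustration for the OpenOffice side of the question: OpenDocument files (.odt, .ods) are ZIP packages whose body lives in an entry named content.xml, so the raw text can be pulled out with the JDK alone. The sketch below is a minimal example under that assumption, not how iText, POI, or any other library named here works. The class name `OdfTextSketch` is hypothetical, and it needs Java 9 or later for `readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: pulls the raw text out of an OpenDocument package. */
final class OdfTextSketch {

    /** Returns all text nodes of content.xml concatenated, without paragraph breaks. */
    static String extractText(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue; // styles, metadata, images: not needed here
                }
                // Read only the current entry; the XML parser may close its input.
                byte[] xml = zip.readAllBytes();

                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the document is untrusted input.
                factory.setFeature(
                        "http://apache.org/xml/features/disallow-doctype-decl", true);

                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml));
                return doc.getDocumentElement().getTextContent();
            }
        }
        throw new IOException("no content.xml entry found");
    }
}
```

Calling `OdfTextSketch.extractText(Files.newInputStream(Path.of("report.odt")))` (with `report.odt` standing in for any OpenDocument file) would return the text as one continuous run. Paragraph and table structure is lost, which is one reason a dedicated library is usually the better choice.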

Apache Tika:

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Not sure whether this qualifies as "single" for your purposes.
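
To make the "detect, then delegate to an existing parser" idea concrete, here is a small JDK-only sketch of the detection half. It is not Tika's implementation, only an illustration: it looks at a file's leading bytes to tell PDFs, ZIP-based containers (OOXML and OpenDocument files), and legacy OLE2 Office files apart. The class name `FormatSniffer` and its `Family` enum are hypothetical, and `readNBytes` assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Illustrative only: guesses a document family from its leading "magic" bytes. */
final class FormatSniffer {

    enum Family { PDF, ZIP_CONTAINER, OLE2, UNKNOWN }

    // PDF files begin with the ASCII marker "%PDF-".
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    // ZIP local file header; .docx, .xlsx, .odt and .ods are all ZIP packages.
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // OLE2 compound file header, used by legacy .doc, .xls and .ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Classifies the first bytes of a file. */
    static Family detect(byte[] head) {
        if (startsWith(head, PDF)) return Family.PDF;
        if (startsWith(head, ZIP)) return Family.ZIP_CONTAINER;
        if (startsWith(head, OLE2)) return Family.OLE2;
        return Family.UNKNOWN;
    }

    /** Reads at most eight bytes from the file and classifies them. */
    static Family detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller would switch on the returned `Family` and hand the file to whichever library handles it. With Tika on the classpath that dispatch is already done for you: its `Tika` facade class offers `parseToString(File)`, which detects the type and returns the extracted text in one call.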


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

2 answers

solution1 2 2011-06-22 18:03:24

solution2 2 ACCEPTED 2011-06-22 22:00:25
