How to properly configure AutoDetectParser in Tika?

I'm using Tika to extract text from different types of files, so I use the `AutoDetectParser`. However, its registry appears to be empty: the following code prints an empty result on both the third and fourth lines.

 AutoDetectParser parser = new AutoDetectParser();
 ParseContext con = new ParseContext();
 System.out.println(parser.getSupportedTypes(con));
 System.out.println(" parsers " + parser.getParsers());

How should I configure the AutoDetectParser so that it can call the appropriate Parser?

Promoting a comment to an answer: you don't normally need to! As long as the Tika Core and Tika Parsers jars, along with their required dependencies, are on the classpath at runtime, the default TikaConfig object will auto-detect and auto-load all the parsers for you.
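If the registry is empty, a first sanity check is whether both jars are actually visible at runtime. The sketch below uses only the JDK and tries to load one representative class per jar without initialising it. The marker classes are assumptions: `org.apache.tika.parser.AutoDetectParser` lives in tika-core, and `org.apache.tika.parser.pdf.PDFParser` is just one concrete parser picked to stand in for the parsers jar (in Tika 2.x the parsers are split into modules, so substitute a class from whichever module you expect).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TikaClasspathCheck {
    // Representative class per artifact (an assumption, not an exhaustive check).
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("tika-core", "org.apache.tika.parser.AutoDetectParser");
        REQUIRED.put("tika-parsers", "org.apache.tika.parser.pdf.PDFParser");
    }

    /** True if the class can be loaded; initialize=false so static blocks do not run. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Maps each artifact label to whether its representative class was found. */
    static Map<String, Boolean> check() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        REQUIRED.forEach((artifact, cls) -> result.put(artifact, isOnClasspath(cls)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> found = check();
        found.forEach((artifact, ok) ->
                System.out.println(artifact + ": " + (ok ? "found" : "MISSING")));
        if (found.get("tika-core") && !found.get("tika-parsers")) {
            // Core alone has nothing to auto-load, which would explain an empty registry.
            System.out.println("tika-core is present but no parser classes were found.");
        }
    }
}
```

Run it with the same classpath as your application; "tika-core: found" together with "tika-parsers: MISSING" is the situation that most likely produces the empty output in the question.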

If for some reason you've missed some jars at runtime, or you've been messing about repackaging Tika and lost some service files, then you'll want to follow the instructions on the troubleshooting page of the Apache Tika wiki, especially the sections "Identifying what Parsers your Tika install supports" and "Identifying if any Parsers failed to be loaded".
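Tika finds its parsers through standard Java service-provider files, so for the "lost some service files" case it can help to list every copy of the parser service file on the classpath. This is a JDK-only sketch; the resource path follows the `META-INF/services/<interface name>` convention for `org.apache.tika.parser.Parser`, and the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class TikaServiceFileCheck {
    // Service-provider file naming convention: META-INF/services/ + interface name.
    static final String PARSER_SERVICE_FILE =
            "META-INF/services/org.apache.tika.parser.Parser";

    /** Lists every copy of the parser service file visible to the given class loader. */
    static List<URL> findServiceFiles(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(PARSER_SERVICE_FILE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> files = findServiceFiles(TikaServiceFileCheck.class.getClassLoader());
        if (files.isEmpty()) {
            System.out.println("No " + PARSER_SERVICE_FILE + " found on the classpath.");
        } else {
            files.forEach(url -> System.out.println("Found " + url));
        }
    }
}
```

With a normal (unrepackaged) classpath you would typically expect at least one hit. In a single repackaged "fat" jar, copies of this file from different jars can overwrite each other unless they are merged (the Maven Shade plugin's `ServicesResourceTransformer` does this merge), which can silently drop parsers.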

(If you want to do non-standard things, such as excluding certain parsers, forcing certain parsers, or making parsers handle non-standard MIME types, then you need a custom TikaConfig. Normally you'd do that with a tika-config.xml file; see the Tika wiki for what you can do.)
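As a rough illustration of those three cases, here is a minimal tika-config.xml sketch in the `<properties>`/`<parsers>` format used by Tika's configuration documentation (check the element names against the docs for your Tika version). The MIME type `application/x-my-custom-text` is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Use the default, auto-loaded parsers, minus one parser and one type -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
    <!-- Force a specific parser for PDFs (EmptyParser returns no text) -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
    <!-- Let the plain-text parser also handle a hypothetical custom type -->
    <parser class="org.apache.tika.parser.txt.TXTParser">
      <mime>application/x-my-custom-text</mime>
    </parser>
  </parsers>
</properties>
```

Such a file can be loaded explicitly with `new TikaConfig(...)` and passed to `new AutoDetectParser(config)`, or picked up by the default configuration through the `tika.config` system property.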
