
How to index PDF documents in Apache Solr

I am new to Solr. The explanations in other topics were too advanced for me to follow, so I am looking for a basic explanation of how to index PDF documents into Solr.

I found this link in some Stack Overflow topics, but it is not a tutorial:

http://wiki.apache.org/solr/ExtractingRequestHandler

I would just like to add many PDF documents to Solr, then search and download them.

How can I do this, and do I have to create a Java project in Eclipse or anywhere else?

I'd have a look at one of the tutorials out there, for example Solr in 5 minutes [1].

Normally Solr, like Elasticsearch, can index documents out of the box without you writing any code, so you should be able to point it at the folder to index through simple configuration files; in some cases, the CLI tool lets you specify this on the command line.

Anyway, the easiest way to do that with Solr is to use 'post.jar':

cd example/exampledocs
java -Dc=techproducts -jar post.jar sd500.xml

This adds sd500.xml to the techproducts collection. If you have multiple files, a simple bash script can loop over them and post each one to Solr.
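The loop mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions that are not in the answer: the function name post_pdfs, the POST_JAR and DRY_RUN variables, and the directory argument are made up for this example. It also assumes your post.jar supports the -Dauto=yes property (which lets the tool guess the content type from the file extension) and that the target collection has the extracting request handler enabled, as the techproducts example normally does. Check both against your Solr version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: post every PDF in a directory to Solr, one file at a time.
# Assumptions (not from the original answer): the names POST_JAR, DRY_RUN
# and post_pdfs are illustrative, and "techproducts" is only a default.

POST_JAR="${POST_JAR:-post.jar}"

post_pdfs() {
  local dir="$1"
  local collection="${2:-techproducts}"
  local f
  for f in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory has no PDFs.
    [ -e "$f" ] || continue
    if [ -n "${DRY_RUN:-}" ]; then
      # Print the command instead of running it.
      echo "java -Dc=$collection -Dauto=yes -jar $POST_JAR $f"
    else
      java -Dc="$collection" -Dauto=yes -jar "$POST_JAR" "$f"
    fi
  done
}
```

Run from example/exampledocs, something like `DRY_RUN=1 post_pdfs /path/to/pdfs techproducts` prints the commands first so you can check them; dropping DRY_RUN actually posts the files. No Java project is needed for this.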

Hope it helps!

[1] http://www.solrtutorial.com/solr-in-5-minutes.html

[2] https://wiki.apache.org/solr/SolrConfigXml
