How to index PDF documents in Apache Solr

Question

I am new to Solr. The explanations in other topics were too advanced for me to follow, so I am looking for a basic explanation of how to index PDF documents into Solr.

I found this link in some Stack Overflow topics, but it is not a tutorial.

http://wiki.apache.org/solr/ExtractingRequestHandler

I would just like to add many PDF documents to Solr and then search and download them.

How can I do this, and do I have to create a Java project in Eclipse or anywhere else?

Answer 1

I'd have a look at one of the tutorials out there, for example Solr in 5 minutes [1].

Normally Solr, like Elasticsearch, can index documents out of the box without you writing any code: simple configuration files should let you point it at the folder to index, and in some cases the command-line tool lets you pass that information directly.

Anyway, the easiest way to do that with Solr is to use 'post.jar':

cd example/exampledocs
java -Dc=techproducts -jar post.jar sd500.xml

This adds sd500.xml to the index. If you have multiple files, a simple bash script can loop over them and post each one to Solr.
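As a rough sketch of such a loop for PDFs, something like the following might work. It assumes you run it from example/exampledocs (where post.jar lives), that the target core (techproducts here) has the ExtractingRequestHandler enabled, and that your post.jar honours the -Dauto=yes property for detecting file types. The function name index_pdfs, the DRY_RUN switch and the folder argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: post every PDF in a folder to Solr using post.jar.
# Assumptions (not from the original answer): post.jar is in the current
# directory, the core has /update/extract configured, and -Dauto=yes is
# supported so post.jar routes PDFs to the extracting handler.

index_pdfs() {
  local dir="$1"                    # hypothetical: folder holding the PDFs
  local core="${2:-techproducts}"   # core name, defaults to the tutorial core
  local pdf

  for pdf in "$dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip that case.
    [ -e "$pdf" ] || continue

    if [ -n "${DRY_RUN:-}" ]; then
      # Only print the command that would be run.
      echo "java -Dc=$core -Dauto=yes -jar post.jar $pdf"
    else
      java -Dc="$core" -Dauto=yes -jar post.jar "$pdf"
    fi
  done
}
```

You would call it as `index_pdfs /path/to/pdfs techproducts`; setting `DRY_RUN=1` first prints the commands instead of running them, which is a cheap way to check which files would be sent.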

Hope it helps!

[1] http://www.solrtutorial.com/solr-in-5-minutes.html

[2] https://wiki.apache.org/solr/SolrConfigXml
