I want to index a corpus using Solr.
To create a sequence file, I used the following command:
./behemoth -i file://path/to/my/file/where/the corpus/is/located -o /user/user-name/file-to-which-the-output-is-stored
After this, I ran the following command for indexing:
./behemoth solr /user/user-name/path-to-which-output-is-stored-in-previous-command http://localhost:8983/solr
But it gives the following error:
15/06/04 11:51:07 INFO mapreduce.Job: Job job_local183059797_0001 running in uber mode : false
15/06/04 11:51:07 INFO mapreduce.Job: map 0% reduce 0%
15/06/04 11:51:08 INFO mapred.LocalJobRunner:
15/06/04 11:51:08 INFO impl.ConcurrentUpdateSolrServer: Status for: file:///usr/local/ASR/data/Corpus/en_TheTelegraph_2001-2010/telegraph_2007-2010/telegraph_1st_oct_2007_to_31st_dec_2007/foreign/1071015_foreign_story_8435523.utf8 is 404
15/06/04 11:51:08 ERROR impl.ConcurrentUpdateSolrServer: error
java.lang.Exception: Not Found
I am unable to figure out the issue, as the file mentioned above exists at that path. Please help.
Just found your question. It is best to ask on the DigitalPebble mailing list or to open an issue on GitHub.
I don't think the problem is related to the content of the input. It looks more like it can't connect to Solr at the URL you gave.
Also, you've imported a corpus of documents, but no text or metadata has been extracted as part of the import. You should run the Tika module on your input first.
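To rule out the Solr side, you could probe the URL you passed to the indexer before re-running the job. The snippet below is only a rough sketch, not part of Behemoth. It assumes the client posts documents to the `/update` handler under the base URL, and the core name `collection1` is a hypothetical example that you would replace with your own core. If the plain base URL answers 404 on `/update` but the URL with a core name does not, the base URL given to the indexer is probably missing the core.

```python
import urllib.error
import urllib.request


def update_url(base_url: str) -> str:
    """Build the update handler URL for a Solr base URL (assumed layout)."""
    return base_url.rstrip("/") + "/update"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send a GET to url and describe how the server answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server was reached but rejected the path, e.g. with a 404.
        return f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No usable answer at all: wrong host or port, or Solr is not running.
        return f"unreachable: {err}"


if __name__ == "__main__":
    candidates = [
        "http://localhost:8983/solr",              # URL from the question
        "http://localhost:8983/solr/collection1",  # hypothetical core name
    ]
    for base in candidates:
        target = update_url(base)
        print(target, "->", probe(target))
```

An "unreachable" result points at the host, port, or a Solr instance that is not running, while an HTTP 404 means the server answered but does not serve that path.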