
Integrating Nutch 1.6 with Solr 4.3: IOException ("Job failed") when running nutch crawl urls -solr http://localhost:8983/solr/

I am trying to integrate Nutch 1.6 with Solr 4.3. I copied /apache-nutch-1.6/conf/schema-solr4.xml into collection1/conf/ and renamed the file to schema.xml. I also tried integrating Nutch 1.5.1 with Solr 4.3. In both cases I get an IOException when running:

bash$ nutch crawl urls -solr http://127.0.0.1:8983/solr/

Job Failed. Any ideas?

I figured that one out myself: I had to look at solr.log and add the fields below to schema.xml under collection1/conf.

<field name="host" type="string" stored="false" indexed="true"/>
<field name="segment" type="string" stored="true" indexed="false"/>
<field name="digest" type="string" stored="true" indexed="false"/>
<field name="boost" type="float" stored="true" indexed="false"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>

After that, it worked.
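To find out which of these fields a given schema.xml is still missing before re-running the crawl, something like the following might help. It is only a rough sketch: the function name missing_fields and the sample schema string are made up for illustration, and the field list is simply the five names from this answer, not an authoritative list of what Nutch sends to Solr.

```python
import xml.etree.ElementTree as ET

# The five field names listed in the answer above (not an official Nutch list).
NUTCH_FIELDS = ("host", "segment", "digest", "boost", "tstamp")


def missing_fields(schema_xml, required=NUTCH_FIELDS):
    """Return the required field names that have no <field> element in the schema."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so it does not matter whether the
    # <field> elements sit inside a <fields> wrapper or directly under <schema>.
    defined = {field.get("name") for field in root.iter("field")}
    return [name for name in required if name not in defined]


# Hypothetical, heavily trimmed schema used only to show the call.
SAMPLE_SCHEMA = """
<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="host" type="string" stored="false" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""

print(missing_fields(SAMPLE_SCHEMA))
```

To check a real file, read collection1/conf/schema.xml into a string and pass it to missing_fields in place of the sample.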

Yes, can you please post additional details from the log? A possible cause is that you need to define the uniqueKey in the schema.xml file, like this:

<uniqueKey>id</uniqueKey>
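A quick way to test this guess is to confirm that schema.xml declares a uniqueKey and that the key names a field that is actually defined. The sketch below assumes the usual layout, with <uniqueKey> and <field> elements somewhere under the root <schema> element. The function name check_unique_key and its return strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_unique_key(schema_xml):
    """Report whether the schema declares a uniqueKey that matches a defined field."""
    root = ET.fromstring(schema_xml)
    node = root.find(".//uniqueKey")
    if node is None or not (node.text or "").strip():
        return "no uniqueKey declared"
    key = node.text.strip()
    defined = {field.get("name") for field in root.iter("field")}
    if key not in defined:
        return "uniqueKey '%s' has no matching <field>" % key
    return "ok"


# Hypothetical minimal schemas, used only to show the three outcomes.
GOOD = '<schema><fields><field name="id" type="string"/></fields><uniqueKey>id</uniqueKey></schema>'
NO_KEY = '<schema><fields><field name="id" type="string"/></fields></schema>'
BAD_KEY = '<schema><fields><field name="url" type="string"/></fields><uniqueKey>id</uniqueKey></schema>'

for schema in (GOOD, NO_KEY, BAD_KEY):
    print(check_unique_key(schema))
```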

Vera, I am using Nutch 1.7 and Solr 4.4.0, and I also had a problem with the schema.xml file. I worked out that a few changes to the schema file were needed; they are listed below.

Copy schema.xml from usr/nutch 1.7/conf/ to /usr/local/solr-4.4.0/example/solr/collection1/conf/schema, overwriting the existing file. Then make sure the field type is "text", not "text_field".

In the "text" field type used by the content field, change EnglishPorterFilterFactory to SnowballPorterFilterFactory.

After that, add:

field name="_version_" type="long" indexed="true" stored="true"

field name="text" type="text" indexed="true" stored="false" multiValued="true"

It is working fine for me, Vera.
