I am trying to integrate Nutch 1.6 with Solr 4.3. I copied /apache-nutch-1.6/conf/schema-solr4.xml into collection1/conf/ and renamed the file to schema.xml. I also tried integrating Nutch 1.5.1 with Solr 4.3. In both cases I get an IOException when running:
bash$ nutch crawl urls -solr http://127.0.0.1:8983/solr/
Job Failed. Any ideas?
I figured that one out myself. I had to look at solr.log and add the fields below to schema.xml under collection1/conf:
<field name="host" type="string" stored="false" indexed="true"/>
<field name="segment" type="string" stored="true" indexed="false"/>
<field name="digest" type="string" stored="true" indexed="false"/>
<field name="boost" type="float" stored="true" indexed="false"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>
and it worked.
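Because the fix was to add fields that Nutch sends but the schema does not declare, a quick pre-flight check can save a round trip through solr.log. The sketch below is a hypothetical helper (not part of Nutch or Solr). It parses a schema.xml string with Python's standard xml.etree module and lists which of the five fields above are missing. The field list comes from this answer, not from any Nutch documentation.

```python
import xml.etree.ElementTree as ET

# Fields this answer had to add for Nutch 1.6 -> Solr 4.3 (taken from the post).
NUTCH_FIELDS = ("host", "segment", "digest", "boost", "tstamp")


def missing_fields(schema_xml, required=NUTCH_FIELDS):
    """Return the names in `required` that have no <field name="..."> in schema_xml.

    Hypothetical helper for illustration. It only looks at <field> elements
    anywhere in the document; <dynamicField> and <copyField> are ignored,
    so a field matched only by a dynamic pattern is still reported.
    """
    root = ET.fromstring(schema_xml)  # accepts str or bytes
    defined = {f.get("name") for f in root.iter("field")}
    return [name for name in required if name not in defined]
```

To use it, read collection1/conf/schema.xml in binary mode, pass the bytes to `missing_fields`, and add whatever it returns before restarting Solr and rerunning the crawl.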
Yes, can you please post additional details from the log? One possible cause is that you need to define the uniqueKey in the schema.xml file, like this:
<uniqueKey>id</uniqueKey>
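A missing or mismatched uniqueKey is easy to check before starting Solr. This is a minimal sketch with a hypothetical function name, assuming the usual Solr layout where <uniqueKey> is a direct child of <schema>. It reports a missing <uniqueKey> element and a key that names a field the schema never declares.

```python
import xml.etree.ElementTree as ET


def check_unique_key(schema_xml):
    """Return a list of problems with <uniqueKey> in a Solr schema.xml string.

    Hypothetical helper: an empty list means a <uniqueKey> exists and refers
    to a field declared with <field name="...">.
    """
    root = ET.fromstring(schema_xml)
    key = root.find("uniqueKey")  # assumed to be a direct child of <schema>
    if key is None or not (key.text or "").strip():
        return ["no <uniqueKey> element"]
    name = key.text.strip()
    declared = {f.get("name") for f in root.iter("field")}
    if name not in declared:
        return [f"uniqueKey '{name}' is not a declared field"]
    return []
```

Checking that the key refers to a declared field matters because adding <uniqueKey>id</uniqueKey> alone does not help if no "id" field exists.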
vera, I just used Nutch 1.7 with Solr 4.4.0 and also had problems with the schema.xml file. I worked out a few changes to the schema file; here they are.
Copy your usr/nutch 1.7/conf/schema.xml to /usr/local/solr-4.4.0/example/solr/collection1/conf/schema.xml, overwriting the existing file. Then edit it:
In the field type "text" (the type used by the content field, not a text_field type), change the filter class EnglishPorterFilterFactory to SnowballPorterFilterFactory.
Then add these fields:
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
It's working fine for me, vera.
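The edits in this answer can also be scripted. This is a minimal sketch, not a tool shipped with Nutch or Solr. The function name `patch_nutch_schema` is hypothetical, and it assumes the Nutch schema declares the stemmer as a <filter class="solr.EnglishPorterFilterFactory" .../> element and keeps its fields under a <fields> element. It swaps the class to solr.SnowballPorterFilterFactory (setting language="English" explicitly) and appends the _version_ and text fields when they are missing. ElementTree drops XML comments when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

# Fields added in the answer above; attributes copied from the post.
EXTRA_FIELDS = {
    "_version_": {"type": "long", "indexed": "true", "stored": "true"},
    "text": {"type": "text", "indexed": "true", "stored": "false",
             "multiValued": "true"},
}


def patch_nutch_schema(schema_xml):
    """Return (patched_xml, changes) for a Nutch schema.xml string.

    Hypothetical helper. Replaces solr.EnglishPorterFilterFactory with
    solr.SnowballPorterFilterFactory language="English", keeping the other
    filter attributes, and adds each field in EXTRA_FIELDS that has no
    <field> with that name yet. Running it twice makes no further changes.
    """
    root = ET.fromstring(schema_xml)
    changes = []

    for flt in root.iter("filter"):
        if flt.get("class") == "solr.EnglishPorterFilterFactory":
            flt.set("class", "solr.SnowballPorterFilterFactory")
            flt.set("language", "English")
            changes.append("filter: EnglishPorter -> SnowballPorter")

    # Assumption: fields live in <fields>; fall back to the root element.
    fields = root.find("fields")
    if fields is None:
        fields = root
    declared = {f.get("name") for f in root.iter("field")}
    for name, attrs in EXTRA_FIELDS.items():
        if name not in declared:
            ET.SubElement(fields, "field", {"name": name, **attrs})
            changes.append(f"added field {name}")

    return ET.tostring(root, encoding="unicode"), changes
```

A typical use is to read collection1/conf/schema.xml, pass its contents through `patch_nutch_schema`, print `changes`, and write the patched text back only after reviewing the diff.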