
Nutch server that outputs to Solr

I have a Nutch REST server running, and I'm able to create jobs and everything.

  1. How can I configure the Nutch server to output to Solr? I didn't find any configuration for that in the conf files (nutch-site.xml, nutch-default.xml).

You just need to configure the required Nutch parameters (such as http.agent.name) and indicate the Solr instance where you want your content indexed. For instance, with the bin/crawl script you only need to add the solr.server.url property:

$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ crawl/ 2

If you run bin/crawl in the terminal without arguments, you'll get more information about the available options. A more comprehensive introduction is available here. For the 2.x branch, the bin/crawl script differs somewhat.
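If you launch crawls from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal Python sketch, assuming a Nutch 1.x installation whose bin/crawl accepts -i and repeated -D key=value options before the seed directory, crawl directory and number of rounds, exactly as in the command above. The helper names build_crawl_command and run_crawl are hypothetical, and http.agent.name is assumed to already be set in conf/nutch-site.xml.

```python
import shlex
import subprocess
from typing import Dict, List


def build_crawl_command(
    seed_dir: str,
    crawl_dir: str,
    rounds: int,
    properties: Dict[str, str],
    index: bool = True,
    crawl_script: str = "bin/crawl",
) -> List[str]:
    """Assemble the argv for a Nutch 1.x bin/crawl run (hypothetical helper)."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    cmd = [crawl_script]
    if index:
        cmd.append("-i")  # ask bin/crawl to run the indexing step as well
    # Each configuration property becomes its own "-D key=value" pair,
    # e.g. solr.server.url, as in the command-line example.
    for key, value in properties.items():
        cmd += ["-D", f"{key}={value}"]
    cmd += [seed_dir, crawl_dir, str(rounds)]
    return cmd


def run_crawl(cmd: List[str]) -> int:
    """Run the crawl (from the Nutch home directory) and return its exit code."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    command = build_crawl_command(
        "urls/",
        "crawl/",
        2,
        {"solr.server.url": "http://localhost:8983/solr/"},
    )
    # Prints a copy-pasteable version of the command shown above.
    print(shlex.join(command))
    # run_crawl(command)  # uncomment when running inside a Nutch install
```

Keeping each property as its own argv element avoids shell-quoting problems with URLs; shlex.join is only used to print a copy-pasteable version of the command.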

If you're using the REST server instead, set solr.server.url through the configuration endpoint and then create an INDEX job; this should do the trick:

POST /job/create
{  
    "type":"INDEX",
    "confId":"new-config",
    "crawlId":"crawl01",
    "args": {}
}
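The two REST steps can also be scripted. Below is a minimal Python sketch using only the standard library. It assumes a Nutch 1.x REST server started with bin/nutch startserver on its default port 8081, and a POST /config/create endpoint that accepts configId, force and params fields, as I understand the Nutch REST API wiki; check the field names against your Nutch version. NUTCH_SERVER and the helper functions are hypothetical names, and the Solr URL is the example one from the first answer.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Nutch REST server on its default port; adjust for your setup.
NUTCH_SERVER = "http://localhost:8081"


def config_payload(conf_id: str, solr_url: str) -> Dict[str, Any]:
    """Body for POST /config/create that sets solr.server.url (field names assumed)."""
    return {
        "configId": conf_id,
        "force": True,  # assumed: overwrite an existing config with this id
        "params": {"solr.server.url": solr_url},
    }


def index_job_payload(conf_id: str, crawl_id: str) -> Dict[str, Any]:
    """Body for POST /job/create, mirroring the JSON example above."""
    return {"type": "INDEX", "confId": conf_id, "crawlId": crawl_id, "args": {}}


def post_json(path: str, body: Dict[str, Any]) -> str:
    """POST a JSON body to the Nutch server and return the raw response text."""
    request = urllib.request.Request(
        NUTCH_SERVER + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    try:
        # Step 1: create a configuration whose solr.server.url points at Solr.
        print(post_json("/config/create",
                        config_payload("new-config", "http://localhost:8983/solr/")))
        # Step 2: create an INDEX job that uses that configuration.
        print(post_json("/job/create", index_job_payload("new-config", "crawl01")))
    except OSError as err:  # URLError and HTTPError are subclasses of OSError
        print("Could not reach the Nutch server:", err)
```

Creating the configuration first matters because the job refers to it only by its confId; if the id doesn't exist on the server, the job can't pick up solr.server.url.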

More information about this endpoint can be found here.
