
Integrating Solr with Nutch issue

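I am following the tutorial from here. I have installed Solr and Nutch separately, and both work fine. The problem arises when I have to integrate them. From earlier posts on this site, I gathered that there may be a problem with the schema files. As the tutorial instructs, I copied Nutch's schema.xml over Solr's schema.xml and restarted Solr. Solr then stopped because of a configuration problem. So I simply copied the contents of each file into the other, alongside the existing contents. Now (as before), I get this error: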

Indexer: starting at 2014-08-05 11:10:21
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication


Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Can someone suggest what should be done? I am using apache-nutch-1.8 and solr-4.9.0, and this is what my hadoop.log file looks like:

2014-08-05 12:50:05,032 INFO  crawl.Injector - Injector: starting at 2014-08-05 12:50:05
2014-08-05 12:50:05,033 INFO  crawl.Injector - Injector: crawlDb: -dir/crawldb
2014-08-05 12:50:05,033 INFO  crawl.Injector - Injector: urlDir: urls
.
.
.
.
.
2014-08-05 13:04:21,255 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN  mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

2014-08-05 13:10:37,855 INFO  crawl.Injector - Injector: starting at 2014-08-05 13:10:37
.
.
.

This is probably due to a version difference. The tutorial says to copy conf/schema.xml, but for this particular version of Solr the file schema-solr4.xml should be copied instead, and then the following should be added at line 351: <field name="_version_" type="long" indexed="true" stored="true"/>. Restart Solr with java -jar start.jar and everything works fine! Hope this helps someone!
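The fix above can also be scripted. This is a minimal sketch, not part of the original answer: the helper names `ensure_version_field` and `install_schema` are hypothetical, and it assumes the schema wraps its field declarations in a `<fields>` element. Because the line number 351 may differ between copies of the file, it inserts the declaration just before the closing `</fields>` tag.

```python
import re
from pathlib import Path

# The field declaration quoted in the answer above.
VERSION_FIELD = '<field name="_version_" type="long" indexed="true" stored="true"/>'


def ensure_version_field(schema_xml: str) -> str:
    """Return the schema text with a _version_ field declared (hypothetical helper)."""
    if re.search(r'<field\s+name="_version_"', schema_xml):
        return schema_xml  # already declared, leave the text untouched
    closing = schema_xml.rfind("</fields>")
    if closing == -1:
        # Assumption violated: this sketch only handles schemas with a <fields> block.
        raise ValueError("no </fields> tag found in schema")
    # Insert the declaration immediately before the closing </fields> tag.
    return schema_xml[:closing] + VERSION_FIELD + "\n" + schema_xml[closing:]


def install_schema(nutch_schema: Path, solr_schema: Path) -> None:
    """Write Nutch's Solr 4 schema, with _version_ added, to Solr's schema path."""
    text = nutch_schema.read_text(encoding="utf-8")
    solr_schema.write_text(ensure_version_field(text), encoding="utf-8")
```

A call might look like `install_schema(Path("apache-nutch-1.8/conf/schema-solr4.xml"), Path("path/to/solr/conf/schema.xml"))`, where both paths are placeholders that depend on the local installation. Note that the check is textual, so a commented-out `_version_` declaration would also count as present. Solr still has to be restarted afterwards, as the answer describes.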

