
RuntimeException when running Nutch generate

I'm new to Nutch. I have installed Nutch 2.3.1 and configured it to use MongoDB. The inject operation succeeded, but the generate step throws an exception (see below). NB: the error occurs with a seed file containing 60K URLs. When I tried with 100 URLs, everything went well.

Do you have any idea what causes this error? Thanks!

2016-12-30 00:01:48,446 INFO  crawl.GeneratorJob - GeneratorJob: starting at 2016-12-30 00:01:48
2016-12-30 00:01:48,447 INFO  crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
2016-12-30 00:01:48,447 INFO  crawl.GeneratorJob - GeneratorJob: starting
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: filtering: true
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: normalizing: true
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: topN: 100000
2016-12-30 00:01:48,816 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-12-30 00:01:48,857 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:48,867 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:48,867 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:51,568 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-12-30 00:01:51,573 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-12-30 00:01:51,753 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-12-30 00:01:51,760 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-12-30 00:01:52,408 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:52,408 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:52,408 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:52,591 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2016-12-30 00:02:03,229 ERROR mapreduce.GoraRecordReader - Error reading Gora records: Read operation to server localhost:27017 failed on database nutch
2016-12-30 00:02:04,607 WARN  mapred.LocalJobRunner - job_local1740651658_0001
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:298)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:235)
    at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145)
    at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:135)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:626)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
    at org.apache.gora.mongodb.query.MongoDBResult.nextInner(MongoDBResult.java:71)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:111)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:118)
    ... 12 more
Caused by: java.io.EOFException
    at org.bson.io.Bits.readFully(Bits.java:75)
    at org.bson.io.Bits.readFully(Bits.java:50)
    at org.bson.io.Bits.readFully(Bits.java:37)
    at com.mongodb.Response.<init>(Response.java:42)
    at com.mongodb.DBPort$1.execute(DBPort.java:164)
    at com.mongodb.DBPort$1.execute(DBPort.java:158)
    at com.mongodb.DBPort.doOperation(DBPort.java:187)
    at com.mongodb.DBPort.call(DBPort.java:158)
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:290)
    ... 21 more
2016-12-30 00:02:04,846 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=nutch-maven-1.0-SNAPSHOT.jar, jobid=job_local1740651658_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)

I figured out that the problem comes from the MongoDB version. Nutch uses mongo-java-driver-2.13.1.jar and I had installed MongoDB 3.4.1. So I installed MongoDB 2.6.7 and now it works fine. I'll try to update the driver in Nutch and report whether it works with the newer version of MongoDB.
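
For anyone reading a similar trace: the innermost cause in the log above is `java.io.EOFException`, raised in `org.bson.io.Bits.readFully` while the driver was reading the server's response. That means the stream ended before a full reply arrived. This is consistent with the driver/server mismatch described here, though it does not prove it on its own. Below is a minimal Python sketch for pulling the "Caused by" chain out of such a log so the root cause is easy to spot. The function names `cause_chain` and `root_cause` and the `SAMPLE` excerpt are illustrative only and are not part of Nutch, Gora, or the MongoDB driver.

```python
import re

# A "Caused by:" line looks like
#   Caused by: some.package.ExceptionClass: optional message
# The class name may contain '$' for nested classes (e.g. MongoException$Network).
CAUSED_BY = re.compile(r"^Caused by: ([\w.$]+)(?:: (.*))?$")


def cause_chain(log_text):
    """Return [(exception_class, message), ...] from outermost cause to root cause."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line.strip())
        if match:
            # The message group is None when the exception carries no message.
            chain.append((match.group(1), match.group(2) or ""))
    return chain


def root_cause(log_text):
    """Return the last (innermost) cause, or None if the log has no 'Caused by' line."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None


# Shortened excerpt of the trace from the question (one frame kept per cause).
SAMPLE = """\
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:298)
Caused by: java.io.EOFException
    at org.bson.io.Bits.readFully(Bits.java:75)
"""

if __name__ == "__main__":
    # Print the chain, indenting one level per nested cause.
    for depth, (exc, msg) in enumerate(cause_chain(SAMPLE)):
        print("  " * depth + exc + (": " + msg if msg else ""))
```

To use it on a real run, pass in the contents of the Nutch log file (for example `logs/hadoop.log` in a default local setup, which is an assumption about your installation) instead of `SAMPLE`.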
