
RuntimeException when running Nutch generate

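I'm new to Nutch. I installed Nutch 2.3.1 and configured it to use MongoDB. The inject step succeeds, but when I run generate, it throws an exception (see below). Note: the error occurs with a seed file containing 60K URLs. When I tried with only 100 URLs, everything went fine.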

Do you have any idea what causes this error? Thanks!

2016-12-30 00:01:48,446 INFO  crawl.GeneratorJob - GeneratorJob: starting at 2016-12-30 00:01:48
2016-12-30 00:01:48,447 INFO  crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
2016-12-30 00:01:48,447 INFO  crawl.GeneratorJob - GeneratorJob: starting
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: filtering: true
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: normalizing: true
2016-12-30 00:01:48,448 INFO  crawl.GeneratorJob - GeneratorJob: topN: 100000
2016-12-30 00:01:48,816 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-12-30 00:01:48,857 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:48,867 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:48,867 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:51,568 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-12-30 00:01:51,573 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-12-30 00:01:51,753 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-12-30 00:01:51,760 WARN  conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-12-30 00:01:52,408 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:52,408 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:52,408 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:52,591 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2016-12-30 00:02:03,229 ERROR mapreduce.GoraRecordReader - Error reading Gora records: Read operation to server localhost:27017 failed on database nutch
2016-12-30 00:02:04,607 WARN  mapred.LocalJobRunner - job_local1740651658_0001
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:298)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:235)
    at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145)
    at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:135)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:626)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
    at org.apache.gora.mongodb.query.MongoDBResult.nextInner(MongoDBResult.java:71)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:111)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:118)
    ... 12 more
Caused by: java.io.EOFException
    at org.bson.io.Bits.readFully(Bits.java:75)
    at org.bson.io.Bits.readFully(Bits.java:50)
    at org.bson.io.Bits.readFully(Bits.java:37)
    at com.mongodb.Response.<init>(Response.java:42)
    at com.mongodb.DBPort$1.execute(DBPort.java:164)
    at com.mongodb.DBPort$1.execute(DBPort.java:158)
    at com.mongodb.DBPort.doOperation(DBPort.java:187)
    at com.mongodb.DBPort.call(DBPort.java:158)
    at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:290)
    ... 21 more
2016-12-30 00:02:04,846 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=nutch-maven-1.0-SNAPSHOT.jar, jobid=job_local1740651658_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)

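I found that the problem was the MongoDB version. Nutch uses mongo-java-driver-2.13.1.jar, and I had installed MongoDB 3.4.1. So I installed MongoDB 2.6.7 instead, and now it works fine. I will try updating the driver in Nutch and report back on whether it works with the newer MongoDB version.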
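
For anyone debugging a similar failure, it can help to pull the "Caused by:" chain out of the log before guessing at a fix. In the trace from the question, the innermost cause is `java.io.EOFException` raised while the driver was reading a response, meaning the connection ended mid-read. The failing driver frame is `QueryResultIterator.getMore`, which fetches a later batch of a cursor; that may be why the 100-URL run never hit the error, though this is an inference from the trace and not something confirmed in the post. Below is a minimal Python sketch for listing that chain. The names `cause_chain` and `root_cause` are hypothetical helpers written for this illustration and are not part of Nutch, Gora, or the MongoDB driver. The sample text is an abridged copy of the trace above.

```python
import re

# Matches lines such as "Caused by: java.io.EOFException" or
# "Caused by: some.Exception: message text".
# Inside the character class, '.' and '$' are literal characters.
CAUSE_RE = re.compile(r"^\s*Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log_text):
    """Return (exception_class, message) pairs, one per 'Caused by:' line, outermost first."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSE_RE.match(line)
        if match:
            # The message group is absent when the line has no ": message" part.
            chain.append((match.group(1), match.group(2) or ""))
    return chain


def root_cause(log_text):
    """Return the innermost cause, or None if the log has no 'Caused by:' line."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None


# Abridged copy of the stack trace from the question.
sample = """\
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
    at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145)
Caused by: java.io.EOFException
    at org.bson.io.Bits.readFully(Bits.java:75)
"""

if __name__ == "__main__":
    for exc_class, message in cause_chain(sample):
        print(exc_class, "|", message)
    print("root cause:", root_cause(sample))
```

Run against the abridged trace, the last line printed names `java.io.EOFException` as the root cause, which points at the connection between driver and server and not at the Nutch job configuration.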
