
Nutch error in Eclipse

I'm trying to run Apache Nutch from Eclipse. I followed the instructions at http://wiki.apache.org/nutch/RunNutchInEclipse. However, the sources of "parse-html" (both java and test) have errors. I ran it anyway; it reads and fetches the URLs from seed.txt and then fails with this error:

Fetcher: finished at 2012-03-31 17:21:56, elapsed: 00:00:07
ParseSegment: starting at 2012-03-31 17:21:56
ParseSegment: segment: crawl/segments/20120331172142
Exception in thread "main" java.io.IOException: Job failed!

I would like to point out that my goal is to get indexes from Nutch and store them in MongoDB.

Add the following to ivy.xml:

<dependency org="rome" name="rome" rev="0.9" />
<dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.13" />
<dependency org="org.ccil.cowan.tagsoup" name="tagsoup" rev="1.2.1" />

I ran into the same problem. Here are two things that might help:

  • Modify the conf/log4j.properties file to report DEBUG messages;
  • Read the hadoop.log file, which is usually located in $NUTCH_HOME or $NUTCH_HOME/logs.

By examining these messages, you should be able to spot the problem.
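
As a rough sketch of the first suggestion, the fragment below raises log verbosity in conf/log4j.properties. It assumes the file uses log4j 1.x property syntax and that the root logger writes to an appender named DRFA; both are assumptions about your checkout, so keep whatever appender name your file already lists.

```properties
# Assumption: the existing line looks like "log4j.rootLogger=INFO,DRFA".
# Change only the level and keep the appender name already in your file.
log4j.rootLogger=DEBUG,DRFA

# Alternatively, raise the level only for the parsing code to limit log noise.
log4j.logger.org.apache.nutch.parse=DEBUG
```

After rerunning the crawl, the stack trace behind "Job failed!" should show up in hadoop.log near the ParseSegment entries.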

Here is a tutorial on Running Nutch in Eclipse, which also covers how to handle several common errors.

I found 3 jars, added them to the project as external jars, and it worked. The jars are cyberneko.jar, rome-0.9.jar and tagsoup-1.2.jar; you can find all of them with a simple Google search.
