nutch fetch is failing with java.lang.NumberFormatException
I am running Nutch 1.18 on Red Hat Enterprise Linux release 8.3 (Ootpa) with Java openjdk version "1.8.0_275".
I am following these directions: https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial#NutchTutorial-Step-by-Step:Concepts
When I get to the step for bin/nutch fetch $s1, every fetch fails. See a sample error from the Hadoop log below. They all fail with java.lang.NumberFormatException. I can use curl to check that the URLs are accessible, and they are.
Any advice would be appreciated.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1486)
at org.apache.nutch.protocol.http.api.HttpBase.setConf(HttpBase.java:212)
at org.apache.nutch.protocol.http.Http.setConf(Http.java:52)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:169)
at org.apache.nutch.protocol.ProtocolFactory.getProtocolInstanceByExtension(ProtocolFactory.java:177)
at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:155)
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:308)
The stack trace (keywords: protocol, http, configuration, parseInt) already shows that the integer value of some configuration property could not be parsed. Looking at the source code (HttpBase.java, line 212), it becomes clear that the property in question is "http.timeout":
<property>
<name>http.timeout</name>
<value>10000</value>
<description>The default network timeout, in milliseconds.</description>
</property>
Please verify that it is configured correctly: the value must be an integer and a reasonable time span.
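To see which kinds of values would trip Integer.parseInt, here is a minimal sketch using only the JDK. It does not call Hadoop's Configuration class, so it only approximates what getInt does. The class name HttpTimeoutCheck, the method parseTimeout, and the sample values are hypothetical, and treating non-positive numbers as invalid is my own assumption about what a "reasonable time span" means.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Nutch or Hadoop.
class HttpTimeoutCheck {

    // Returns the timeout if the text is a plain positive integer,
    // otherwise an empty OptionalInt.
    static OptionalInt parseTimeout(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            // Integer.parseInt throws NumberFormatException for text such as
            // "10s", "10000.0" or an empty string.
            int value = Integer.parseInt(raw.trim());
            // Assumption: zero or negative timeouts are not reasonable.
            return value > 0 ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up sample values for illustration only.
        String[] samples = {"10000", "10s", "10000.0", "", "-1"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + parseTimeout(s));
        }
    }
}
```

If the value you have set for http.timeout looks like one of the rejected samples (a unit suffix, a decimal point, or an empty value), that would be consistent with the NumberFormatException in your log.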