Crawl error after a fresh installation of Nutch and Solr

Original post: 2023-01-06 11:30:19 · Tags: solr, nutch

I have a problem after a fresh installation of Nutch 1.19 and Solr 8.11.2. After I run the crawl process, the crawl ends with a NullPointerException and the following error message:

Error running:
  /opt/solr/apache-nutch-1.19/bin/nutch fetch -Dsolr.server.url=http//localhost:8983/solr/nutch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20230106121647 -threads 50
Failed with exit value 255.

Does anyone know what causes this error?

1 Answer

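The error message indicates that there is not enough memory (Java heap) to start 50 fetcher threads. You can try the following:

- If the default of 50 fetcher threads is not needed, reduce it by passing the option --num-threads n_threads to bin/crawl.
- The Java heap size can be set via the environment variable NUTCH_HEAPSIZE. The default is 4 GB, which should be enough even with 50 threads unless you have very large documents (e.g. PDF files) to parse and index.
- Your system may impose a limit that requires using less memory or fewer threads.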
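
As a rough sketch of the suggestions above, the script below sets NUTCH_HEAPSIZE, picks a smaller thread count, and prints the resulting bin/crawl command for review instead of running it. The seed directory `urls`, the thread count of 10, the single crawl round, and the helper names `show_limits` and `build_crawl_cmd` are assumptions made for illustration; only the install path, the crawl directory, `--num-threads`, and `NUTCH_HEAPSIZE` come from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: assemble a bin/crawl call with fewer fetcher threads and an
# explicit Java heap size. Values marked "assumed" are not from the thread.

NUTCH_HOME="${NUTCH_HOME:-/opt/solr/apache-nutch-1.19}"  # install path from the question
CRAWL_DIR="${CRAWL_DIR:-crawl}"                          # crawl directory from the question
SEED_DIR="${SEED_DIR:-urls}"                             # assumed seed directory
NUM_THREADS="${NUM_THREADS:-10}"                         # assumed: fewer than the default 50
NUM_ROUNDS="${NUM_ROUNDS:-1}"                            # assumed number of crawl rounds

# bin/nutch reads NUTCH_HEAPSIZE (a value in MB) to set the JVM's maximum heap.
export NUTCH_HEAPSIZE="${NUTCH_HEAPSIZE:-4096}"

# Print the per-user process limit. On Linux, threads count toward this
# limit, so a low value can prevent the JVM from creating fetcher threads.
show_limits() {
  echo "max user processes: $(ulimit -u 2>/dev/null || echo unknown)"
}

# Echo the crawl command instead of running it, so it can be checked first.
build_crawl_cmd() {
  echo "$NUTCH_HOME/bin/crawl --num-threads $NUM_THREADS -s $SEED_DIR $CRAWL_DIR $NUM_ROUNDS"
}

show_limits
echo "NUTCH_HEAPSIZE=$NUTCH_HEAPSIZE"
build_crawl_cmd
```

If the printed command looks right for your setup, it can be run by pasting it into the same shell, where the exported NUTCH_HEAPSIZE still applies. Whether 10 threads or 4096 MB is appropriate depends on the machine and the documents being fetched.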


