Crawl error after a fresh installation of Nutch and Solr

Original post: 2023-01-06 11:30:19. Tags: solr, nutch

I have a problem after a fresh installation of Nutch 1.19 and Solr 8.11.2. After I run the crawl process, the crawl ends with a NullPointerException and the following error message:

Error running: /opt/solr/apache-nutch-1.19/bin/nutch fetch -Dsolr.server.url=http//localhost:8983/solr/nutch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20230106121647 -threads 50 Failed with exit value 255.

Does anyone know what causes this error?

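1 Answer

The error message indicates that there is not enough memory (Java heap) to launch the 50 fetcher threads. You could try the following:

If the default of 50 fetcher threads is not required, reduce it by passing the option --num-threads n_threads to bin/crawl.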
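The Java heap size can be set through the environment variable NUTCH_HEAPSIZE. The default is 4 GB, which should be enough even with 50 threads unless you have very large documents (for example, PDF files) to parse and index.
Your system may impose limits that force the crawl to use less memory or fewer threads.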
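
The suggestions above can be sketched as a small wrapper script. This is only an illustration under stated assumptions: the thread count (10), the heap size (4096 MB), the seed directory `urls`, the crawl directory `crawl`, the single round, and the helper name `build_crawl_cmd` are all hypothetical choices, not values from the original post. Only the install path is taken from the error message. The script prints the command by default and runs it only when `RUN_CRAWL=1` is set.

```shell
#!/bin/sh
# Sketch only: thread count, heap size, seed dir and round count are
# illustrative assumptions, not values from the original post.

# Install path as it appears in the error message; override via environment.
NUTCH_HOME="${NUTCH_HOME:-/opt/solr/apache-nutch-1.19}"

# Hypothetical helper: build a bin/crawl command line with fewer fetcher threads.
# $1 = number of threads, $2 = seed dir, $3 = crawl dir, $4 = number of rounds
build_crawl_cmd() {
    printf '%s/bin/crawl -s %s --num-threads %s %s %s\n' \
        "$NUTCH_HOME" "$2" "$1" "$3" "$4"
}

# Heap size in MB, read by bin/nutch. Raise it if very large documents
# (for example PDF files) have to be parsed and indexed.
NUTCH_HEAPSIZE="${NUTCH_HEAPSIZE:-4096}"
export NUTCH_HEAPSIZE

# List the current shell's resource limits; low limits on memory or
# processes can also make the fetcher threads fail to start.
ulimit -a

CMD=$(build_crawl_cmd 10 urls crawl 1)
echo "NUTCH_HEAPSIZE=$NUTCH_HEAPSIZE $CMD"

# Run the crawl only when explicitly requested (RUN_CRAWL=1).
# $CMD is intentionally unquoted so it splits into words; this assumes
# none of the paths contain spaces.
if [ "${RUN_CRAWL:-0}" = "1" ]; then
    $CMD
fi
```

If the crawl succeeds with fewer threads, the thread count can be raised step by step until the failure reappears, which helps tell a heap shortage apart from a system limit.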

Related questions

Single Crawl script to Crawl website (Nutch) and Index results (Solr)

Crawl image and their metadata using nutch and index them into solr

Can we crawl and index Google Drive documents using nutch and solr?

Can I crawl with Nutch, store in Cassandra, index using Solr?

Solr indexing following a Nutch crawl fails, reports “Job Failed”

Separate Nutch regex files to crawl and index to multiple Solr cores

Removing menu's from html during crawl or indexing with nutch and solr

Nutch 1.4 with Solr 3.4 - can't crawl URL, “no URLs to fetch”

How to crawl magnet links with Apache Nutch and Solr so that they're available in Solr query results?

Solr index empty after nutch solrindex command

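Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.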
