Fetching images in StormCrawler without indexing them in status

Asked 2021-06-23 04:00:57 · Tags: web-crawler, stormcrawler

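I want to download all the images in a web page and feed them to some machine learning algorithms to classify and extract the objects in those images. I don't want to index them in the status collection, but I would like to extract them in the JsoupParser bolt, emit their addresses, download them within the topology, and feed them to some computer vision algorithms. Is this possible in StormCrawler?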

1 Answer

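If you want to fetch them within the topology, they need to be in the status index. They obviously don't need to be in the content index, since there is no textual content to query; you would need to write a custom bolt that saves the content of the images to whatever form of storage you want. For example, if you are running the crawl on EC2, then AWS S3 would be a good choice.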
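As a rough illustration of what such a custom bolt might do with each fetched document, here is a minimal sketch in plain Java. It is not StormCrawler code: `ImageSink`, `isImage`, `keyFor`, and `store` are hypothetical names, and a local directory stands in for S3 or any other store. The assumption is that, in a real topology, the bolt's `execute` method would read the URL, the raw bytes, and the content type from the incoming tuple and pass them to something like this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper (not part of StormCrawler): persists fetched image bytes. */
class ImageSink {
    private final Path root; // local directory standing in for S3 or similar storage

    ImageSink(Path root) {
        this.root = root;
    }

    /** True when a Content-Type value such as "image/png" denotes an image. */
    static boolean isImage(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("image/");
    }

    /** Derives a stable, filesystem-safe key: the hex SHA-256 digest of the URL. */
    static String keyFor(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Writes the bytes under the storage root if the document is a non-empty image.
     * Returns the stored path, or empty when the document was skipped.
     */
    Optional<Path> store(String url, String contentType, byte[] content) {
        if (!isImage(contentType) || content == null || content.length == 0) {
            return Optional.empty();
        }
        try {
            Files.createDirectories(root);
            Path target = root.resolve(keyFor(url));
            Files.write(target, content);
            return Optional.of(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `new ImageSink(Paths.get("/tmp/images")).store(url, "image/png", bytes)` would write the bytes to a file named after the URL's hash, while a `text/html` document would be skipped. Swapping the `Files.write` call for an object-store upload is where S3 would come in.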

This is definitely doable with StormCrawler; in fact, several companies use it for this purpose.

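Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.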
