
Completion event for crawling all of the sub-URLs of a specific base URL in Storm Crawler

I am currently working on a Storm Crawler based project. I need to do some processing once all of the sub-URLs of a given base URL have been crawled. For example, I want to change a status when all of the discovered URLs for that domain have been crawled, whether successfully or with an error. How can I find a completion event for each base URL?

Not out of the box, no. You would have to implement a mechanism yourself that checks whether any unfetched URLs are left for a given key.
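
To illustrate the kind of mechanism meant here, below is a minimal in-memory sketch in plain Java. It is not part of Storm Crawler: the class `CompletionTracker` and its methods are hypothetical names made up for this example, and the "key" is assumed to be something like the hostname or domain of the base URL. In a real crawl the same question would more likely be answered from wherever the URL statuses are stored, and the sketch ignores persistence and distribution across workers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT a Storm Crawler class: tracks, per key
// (e.g. a hostname), which discovered URLs have not yet reached a
// final state (fetched successfully or failed with an error).
class CompletionTracker {
    // Every URL ever seen for a key, so rediscovered URLs are not counted twice.
    private final Map<String, Set<String>> known = new HashMap<>();
    // URLs for a key that are still waiting to be fetched.
    private final Map<String, Set<String>> pending = new HashMap<>();

    // Call when a URL is discovered (seed or outlink).
    synchronized void discovered(String key, String url) {
        if (known.computeIfAbsent(key, k -> new HashSet<>()).add(url)) {
            pending.computeIfAbsent(key, k -> new HashSet<>()).add(url);
        }
    }

    // Call when a URL reaches a final state, success or error.
    // Returns true if this was the last unfetched URL for the key.
    // Outlinks of a page should be registered via discovered() BEFORE
    // the page itself is marked finished, otherwise the key could be
    // reported complete too early.
    synchronized boolean finished(String key, String url) {
        Set<String> left = pending.get(key);
        if (left == null || !left.remove(url)) {
            return false; // unknown URL, or already marked finished
        }
        return left.isEmpty();
    }

    // True if the key is known and has no unfetched URLs left.
    synchronized boolean isComplete(String key) {
        Set<String> left = pending.get(key);
        return left != null && left.isEmpty();
    }
}
```

The idea is to call `discovered` for every seed and outlink, call `finished` whenever a URL ends up fetched or in error, and trigger the post-processing (for example the status change) when `finished` returns `true` for the key.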
