
Is there any limit on redirects in StormCrawler?

I can see the _redirTo tag in the Status index of ElasticSearch. A few questions regarding redirection:

  1. Is there any limit on redirection, so that it does not end in a loop of redirects?
  2. How many redirects does a particular fetched URL have? I can only see one redirect in the _redirTo tag, which is the immediate one. Is there no way to get the count of redirects if a URL has two or three of them?

You can set a limit on the depth from the seed, see the MaxDepth URL filter, but not directly on the number of successive redirections.
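As a sketch of how that limit is configured (the class and parameter names below follow the default urlfilters.json shipped with StormCrawler and may differ in your version), MaxDepthFilter is added to the URL filter chain and drops any URL whose depth exceeds the threshold:

    {
      "com.digitalpebble.stormcrawler.filtering.URLFilters": [
        {
          "class": "com.digitalpebble.stormcrawler.filtering.depth.MaxDepthFilter",
          "name": "MaxDepthFilter",
          "params": {
            "maxDepth": 3
          }
        }
      ]
    }

If redirection targets increase the depth counter the same way regular outlinks do (which depends on how the fetcher builds their metadata), this also indirectly bounds redirect chains, but only as part of the overall distance from the seed.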

As you noticed, we track only the URL a given document is redirected to.

If you wanted to control the number of redirections regardless of the distance from the seed, one way would be to extend or modify MetadataTransfer, or to handle the redirections within the protocol implementation; the downside is that this will not check whether the target URL has already been fetched.
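Here is a minimal sketch of the MetadataTransfer route, under a few assumptions that should be verified against your StormCrawler version: that getMetaForOutlink(targetURL, sourceURL, parentMD) is what the fetcher calls when it emits the redirection target, that the parent metadata still carries _redirTo at that point, and that the hypothetical numRedirections key is added to the metadata.persist / metadata.transfer lists so the counter survives across hops:

    import com.digitalpebble.stormcrawler.Metadata;
    import com.digitalpebble.stormcrawler.util.MetadataTransfer;

    public class RedirCountingMetadataTransfer extends MetadataTransfer {

        // hypothetical metadata key holding the number of redirections so far
        private static final String NUM_REDIRS = "numRedirections";

        @Override
        public Metadata getMetaForOutlink(String targetURL, String sourceURL,
                Metadata parentMD) {
            Metadata md = super.getMetaForOutlink(targetURL, sourceURL, parentMD);
            // only count when this outlink is the redirection target of its parent
            String redirTo = parentMD.getFirstValue("_redirTo");
            if (redirTo != null && redirTo.equals(targetURL)) {
                String previous = parentMD.getFirstValue(NUM_REDIRS);
                int count = (previous == null) ? 0 : Integer.parseInt(previous);
                md.setValue(NUM_REDIRS, Integer.toString(count + 1));
            }
            return md;
        }
    }

A custom URL filter could then read that counter and drop URLs once it passes a threshold, mirroring what MaxDepthFilter does for depth; as said above, none of this checks whether the target URL was already fetched.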

UPDATE: there is a config element called 'redirections.allowed' with a default value of true. I've just pushed a fix for SimpleFetcherBolt, as it wasn't handled properly.
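For completeness, that flag lives in the crawler configuration (YAML); setting it to false should prevent redirection targets from being followed at all (how the original URL's status is then marked depends on the fetcher bolt):

    # crawler-conf.yaml (sketch)
    config:
      redirections.allowed: false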
