Disable subdomains in a StormCrawler flow
How can we stop subdomains from being crawled when we inject seeds into the stream? Right now, if we inject www.ebay.com into the stream, the output also contains pages from subdomains: my.ebay.com, community.ebay.com, ...
You can configure HostURLFilter to exclude URLs that are outside the seeds' hostnames by setting ignoreOutsideHost to true in urlfilters.json:
{
  "class": "com.digitalpebble.stormcrawler.filtering.host.HostURLFilter",
  "name": "HostURLFilter",
  "params": {
    "ignoreOutsideHost": true,
    "ignoreOutsideDomain": true
  }
}
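To make the effect of ignoreOutsideHost concrete, here is a minimal Java sketch of the kind of check a host-based filter performs. It is not StormCrawler's actual implementation: the class HostFilterSketch and its filter method are hypothetical names invented for this illustration, and the sketch assumes the filter compares the hostname of the page a link was found on with the hostname of the link itself.

```java
import java.net.URI;

// Hypothetical illustration only, not the real HostURLFilter code.
class HostFilterSketch {

    // Returns targetUrl if it should be kept, or null if it is filtered out.
    public static String filter(String sourceUrl, String targetUrl, boolean ignoreOutsideHost) {
        if (!ignoreOutsideHost) {
            return targetUrl; // host check disabled: keep every link
        }
        String sourceHost = URI.create(sourceUrl).getHost();
        String targetHost = URI.create(targetUrl).getHost();
        if (sourceHost == null || targetHost == null) {
            return null; // no usable hostname: drop the link
        }
        // Keep the link only when both hostnames match exactly (case-insensitive).
        return sourceHost.equalsIgnoreCase(targetHost) ? targetUrl : null;
    }

    public static void main(String[] args) {
        String seed = "https://www.ebay.com/";
        System.out.println(filter(seed, "https://www.ebay.com/deals", true)); // kept
        System.out.println(filter(seed, "https://my.ebay.com/", true));       // prints null
    }
}
```

Under this assumption, my.ebay.com and community.ebay.com are dropped because their hostnames differ from www.ebay.com, even though they share the ebay.com domain.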