Disable subdomains in a StormCrawler stream

How can we stop subdomains from being crawled when injecting seeds into the stream? Right now, if we inject www.ebay.com into the stream, the output also contains pages from its subdomains: my.ebay.com, community.ebay.com, ...

You can configure HostURLFilter to exclude URLs whose hostname differs from that of the seeds by setting ignoreOutsideHost to true in urlfilters.json:

{
  "class": "com.digitalpebble.stormcrawler.filtering.host.HostURLFilter",
  "name": "HostURLFilter",
  "params": {
    "ignoreOutsideHost": true,
    "ignoreOutsideDomain": true
  }
}
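
To make the difference between the two parameters concrete, here is a minimal Python sketch of the kind of check such a filter performs. It is an illustration only, not StormCrawler's actual code: the function names are hypothetical, and `naive_domain` simply keeps the last two labels of the hostname, whereas a real crawler would normally rely on a public suffix list (so hosts under suffixes like co.uk are not handled correctly here).

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def naive_domain(host):
    # Simplifying assumption: the domain is the last two labels of the host.
    return ".".join(host.split(".")[-2:])


def keep_outlink(source_url, target_url,
                 ignore_outside_host=True, ignore_outside_domain=True):
    """Return True if the link from source_url to target_url should be kept."""
    src, dst = host_of(source_url), host_of(target_url)
    # Host check: www.ebay.com and my.ebay.com are different hosts.
    if ignore_outside_host and src != dst:
        return False
    # Domain check: www.ebay.com and my.ebay.com share the domain ebay.com.
    if ignore_outside_domain and naive_domain(src) != naive_domain(dst):
        return False
    return True


if __name__ == "__main__":
    seed = "https://www.ebay.com/"
    for target in ("https://www.ebay.com/itm/1", "https://my.ebay.com/"):
        print(target, keep_outlink(seed, target))
```

In this sketch, the domain check alone still lets my.ebay.com through because it shares the domain ebay.com with the seed, while the host check rejects it. That matches the behaviour described in the question and the reason the answer sets ignoreOutsideHost to true.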
