
Nutch - Are -depth and -topN still available in 1.6?

I want to know whether the -depth and -topN parameters are still available in Nutch 1.6.
I also don't understand the difference between these parameters and the limit parameter in the /bin/crawl bash script.

Here is what the two parameters mean:

  • depth: the link depth, counted from the root page, down to which pages should be crawled.
    e.g. your root page may contain links, the pages behind those links contain further links, and so on. This can lead to exponential growth in the number of links scanned. The depth parameter restricts how many levels of links below the root page are followed.

  • topN: the maximum number of pages that will be retrieved at each level, up to the depth.
    e.g. you may have 100 links on the root page. topN limits the number of links scanned at each level.
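
To make the interaction of the two limits concrete, here is a minimal sketch of a round-based crawl over a toy in-memory link graph. It is not Nutch code: the function `crawl_rounds`, the `toy_links` graph, and the "take the first N queued URLs" selection are all assumptions made for illustration (Nutch ranks candidate URLs by score rather than by queue order).

```python
def crawl_rounds(links, seeds, depth, top_n):
    """Simulate a crawl limited by depth (rounds) and top_n (pages per round).

    links  -- hypothetical link graph: page -> list of pages it links to
    seeds  -- root pages to start from
    """
    fetched = []            # pages fetched, in fetch order
    seen = set(seeds)       # every URL discovered so far
    frontier = list(seeds)  # URLs waiting to be fetched
    for _ in range(depth):
        batch = frontier[:top_n]     # at most top_n pages in this round
        frontier = frontier[top_n:]  # leftovers wait for a later round
        for page in batch:
            fetched.append(page)
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    frontier.append(target)
    return fetched


toy_links = {
    "root": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "c": [],
}

# Two rounds, at most two pages per round.
print(crawl_rounds(toy_links, ["root"], depth=2, top_n=2))
# -> ['root', 'a', 'b']
```

With depth=2 and top_n=2, the root page is fetched in the first round, and only two of its three links are fetched in the second; "c" and everything deeper is left unfetched.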

So, basically, the maximum number of links that would be scanned is Root Pages * depth * topN.
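
As a quick worked instance of that estimate, the snippet below plugs in made-up numbers. The function name `max_links_scanned` and the values are hypothetical, and the result is only the rough upper bound described above, not a figure that Nutch itself reports.

```python
def max_links_scanned(root_pages, depth, top_n):
    """Rough upper bound from the estimate above: root pages * depth * topN."""
    return root_pages * depth * top_n


# One root page, crawled 3 levels deep, at most 50 pages per level.
print(max_links_scanned(root_pages=1, depth=3, top_n=50))  # 150
```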

Also, I don't see anything in the documentation saying that they have been removed or deprecated, so I assume they are still available.
