
Nutch - Are -depth and -topN still available in 1.6?

I want to know whether the -depth and -topN parameters are still available in Nutch 1.6.
I also don't know what the difference is between these parameters and the limit parameter in the /bin/crawl bash script.

Here are the descriptions:

  • depth: the link depth from the root page that should be crawled.
    E.g. your root page may contain links to pages that in turn contain further links, and so on, which can lead to exponential growth in the number of links scanned. The depth parameter restricts how many levels of links are followed from the root page.

  • topN: the maximum number of pages that will be retrieved at each level, up to the depth.
    E.g. you may have 100 links on the root page; topN limits the number of links scanned at each level.

So basically, starting from the root page, the maximum number of links scanned would be about depth * topN (at most topN pages at each of the depth levels).
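To make the two limits concrete, here is a toy level-by-level crawl over a made-up link graph with a single root page. It is only a sketch and not how Nutch is implemented: the crawl function, the links dictionary and the page names are all hypothetical, and pages are picked in discovery order, whereas Nutch selects the top-scoring URLs in each round.

```python
def crawl(links, root, depth, top_n):
    """Toy crawl: run `depth` rounds, fetching at most `top_n` pages per round.

    Simplification (assumption): pages are fetched in discovery order,
    not by score as a real crawler such as Nutch would do.
    """
    fetched = []
    seen = {root}
    pending = [root]  # discovered but not yet fetched
    for _ in range(depth):
        # topN caps how many pending pages are fetched in this round
        batch, pending = pending[:top_n], pending[top_n:]
        if not batch:
            break
        fetched.extend(batch)
        # newly discovered links become candidates for the next round
        for page in batch:
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    pending.append(target)
    return fetched


# Hypothetical link graph: the root links to 3 pages, each linking to 3 more.
links = {
    "root": ["a", "b", "c"],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1", "c2", "c3"],
}

pages = crawl(links, "root", depth=3, top_n=2)
print(pages)       # ['root', 'a', 'b', 'c', 'a1']
print(len(pages))  # 5, which is within the bound depth * top_n = 6
```

The first round can only fetch the root page, which is why the total stays below depth * topN here; the product is an upper bound, not an exact count.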

Also, I don't see anything in the documentation saying that they have been removed or deprecated, so I assume they are still available.
