
Nutch does not crawl all links in form

I have a problem crawling my site. There is a form with two drop-down lists, and when I start the crawl, the crawler fetches only some of the links from the form: it picks up only part of the options from the first drop-down list, and likewise from the second. I tried changing some settings in the nutch-default.xml file, but the result is the same.

I changed the following properties (old value -> new value):
fetcher.threads.per.queue: 1 -> 10
db.ignore.internal.links: true -> false
db.ignore.external.links: false -> true
http.content.limit: 65536 -> 65536000
file.content.limit: 65536 -> 65536000
db.update.max.inlinks: 10000 -> 100000

Is there any other option that could help me crawl all the options in my form? Thanks for your answers.

Sorry, my reputation is too low to post this as a comment.

Do you have a link to the page?

Also, are the drop-downs populated with AJAX or something similarly fancy? From memory, Nutch will only crawl what is actually in the page's HTML. That is, if the page loads the first 10 options on page load and only loads the rest from a service when the user scrolls, I believe Nutch can't find them.
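One way to check this is to look at the raw HTML the server returns, before any JavaScript runs, and count the options that are actually there. The snippet below is only a rough diagnostic sketch using the Python standard library; it does not reproduce Nutch's own parser. The names `SelectOptionCollector`, `extract_select_options` and `fetch_html`, and the sample markup, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class SelectOptionCollector(HTMLParser):
    """Collect <option> values for each <select> found in static HTML."""

    def __init__(self):
        super().__init__()
        self.options = {}     # select name (or id) -> list of option values
        self._current = None  # key of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            # Fall back to the id, then to a placeholder, if there is no name.
            self._current = attrs.get("name") or attrs.get("id") or "(unnamed)"
            self.options.setdefault(self._current, [])
        elif tag == "option" and self._current is not None:
            value = attrs.get("value")
            if value:  # skip placeholder entries such as value=""
                self.options[self._current].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def extract_select_options(html):
    """Return {select name: [option values]} for the given HTML text."""
    collector = SelectOptionCollector()
    collector.feed(html)
    return collector.options


def fetch_html(url):
    """Download a page as text, without executing any JavaScript."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical markup: the second drop-down is empty in the static HTML,
    # as it would be if a script filled it in later.
    sample = """
    <form>
      <select name="make">
        <option value="">Any</option>
        <option value="bmw">BMW</option>
        <option value="audi">Audi</option>
      </select>
      <select name="model"></select>
    </form>
    """
    for name, values in extract_select_options(sample).items():
        print(name, len(values), values)
```

To try it on a real page, pass the result of `fetch_html(...)` to `extract_select_options`. If a drop-down shows fewer options here than it does in the browser, the missing ones are probably added by a script after page load, and a crawler that only reads the static HTML would not see them either.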

Some more information about the page would help.

Cheers, Robin

Thanks for your answer. Here is the link: auto.am/en. After the crawl I have only around 100 makes, and not all of the models for the car makes that I have. ... I hope that, now that you have the link, you can suggest a way to crawl all car makes and models :). Thanks.
