I am using Apache Nutch to crawl a web page. I want to crawl the results page of a search for a particular name: for example, if I search for "bill gates", I want to get the links in that search result. I have a URL like
www.mysite.com/search?name=bill+gates
but when crawling, Nutch reports that there are no more URLs to fetch. It does not actually fetch any results.
Is there any option to crawl that page? I have already changed regex-urlfilter.txt to accept everything. How can I crawl the link? Thanks in advance.
As far as I remember, Nutch has an extra setting that skips URLs containing query parameters such as ?q=bill+gates. I think this setting is located in automaton-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-.*[?*!@=].*
So you have to change (or remove) this line so that URLs containing ? and = are no longer skipped.
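As a rough sketch of what that change could look like (an assumption on my part, not a tested configuration): the rule below keeps the skip for *, ! and @ but drops ? and = from the character class, so a URL like www.mysite.com/search?name=bill+gates is no longer rejected by it. As far as I know, the filter files are evaluated top to bottom and the first matching rule decides, so an accept rule added at the end of the file has no effect while the skip rule above it still matches. The final accept line is how I remember the default file ending; check it against your own copy.

```
# skip URLs containing certain characters as probable queries, etc.
# (edited: '?' and '=' removed from the class so that search URLs pass)
-.*[*!@].*

# accept anything else
+.*
```

If your crawl uses the regex filter rather than the automaton one (which plugin is active depends on plugin.includes in your Nutch configuration), look for the equivalent skip rule in regex-urlfilter.txt and edit it in the same way.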
Hope this helps.