I need to generate a report on a given keyword by searching for that keyword online.
My plan is that my webcrawler will
I want my webcrawler to obey the rules, so I looked at the robots.txt
of these websites and found that search engines disallow crawlers from running keyword searches, for example:
google.com/robots.txt
User-agent: *
Disallow: /search
I know that if I try to search for keywords on the search engines anyway, my IP might be blocked.
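To illustrate how a rule like the `Disallow: /search` entry quoted above could be checked before fetching a URL, here is a minimal sketch in plain Java. The class name `RobotsRules` and its methods are hypothetical, not part of Jsoup or any library. It makes simplifying assumptions: it only reads groups for `User-agent: *`, treats each `Disallow` value as a plain path prefix, and ignores `Allow` lines and wildcard patterns, so it is not a complete robots.txt implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: a simplified robots.txt check for "User-agent: *" groups only. */
final class RobotsRules {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    /** Collects Disallow prefixes from groups that apply to all user agents. */
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean inWildcardGroup = false;
        boolean lastWasUserAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            // Drop trailing comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean isWildcard = value.equals("*");
                // Consecutive User-agent lines belong to the same group.
                inWildcardGroup = lastWasUserAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                lastWasUserAgent = true;
            } else {
                lastWasUserAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    rules.disallowedPrefixes.add(value);
                }
            }
        }
        return rules;
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RobotsRules rules = RobotsRules.parse("User-agent: *\nDisallow: /search\n");
        System.out.println(rules.isAllowed("/search?q=java")); // false
        System.out.println(rules.isAllowed("/about"));         // true
    }
}
```

In a crawler, the robots.txt text would first be downloaded from the site, and `isAllowed` would be called with the path of each URL before requesting it. With the rule quoted above, any path under `/search` is rejected, which is why a rule-obeying crawler cannot query the search results page directly.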
My new plan is that my webcrawler will
Questions
PS: I am using Java and Jsoup for webcrawling
Try Selenium to do your work. It is used for automation, so I think your IP will not be blocked by any service provider.