
Jsoup connect error 403 and 503

I've been using jsoup's connect method to fetch the DOM of certain websites for some time (I built a personal bot that makes 20-30 requests per day to those websites). I can still open and browse the site in my browser, but as of today my Java program can't access it. The one change I noticed is that Cloudflare now checks my browser (DDoS protection). My connect code looks like this:

doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .timeout(0)
                .get();

and now I get error 503. I tried changing the userAgent to just "Mozzila/5.0", and then I get error 403. It doesn't make any sense to me, but I suspect the Cloudflare system.

Edit:

I discovered that Cloudflare's "I'm Under Attack" protection requires the browser to have JavaScript and cookies enabled, and grants access to the website after 5 seconds. How can I recreate that with my Java program?

Every website limits requests to protect itself from crashes or attacks. This happened to me when I wanted to access GitHub data. I did not see any authentication in your code (you may have hidden it, which I understand). Some sites grant authenticated clients a higher rate limit, so try authenticating your requests.

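Another problem is that you set the timeout to 0. In jsoup, timeout(0) means an infinite timeout, so a stalled request can hang forever. Set it to something reasonable, like 30 seconds (timeout(30000), since the value is in milliseconds).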
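
The timeout advice can be sketched with explicit limits. This is only a minimal sketch, and it uses the JDK's built-in java.net.http client (Java 11+) as a stand-in for jsoup so that it runs without extra dependencies. The class name TimeoutSketch, its method names, and any URL passed in are hypothetical. It does not execute JavaScript, so a Cloudflare browser check would most likely still answer with 503.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of jsoup: shows a bounded timeout instead of 0.
class TimeoutSketch {
    // 30 seconds, as suggested in the answer; tune for your own use case.
    static final Duration TIMEOUT = Duration.ofSeconds(30);

    // Client with a bounded connect timeout.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(TIMEOUT)
                .build();
    }

    // GET request with a bounded response timeout and an explicit User-Agent.
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(TIMEOUT)
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
    }

    // Sends the request and returns the HTTP status code (e.g. 200, 403, 503)
    // without throwing on error statuses, so the caller can inspect it.
    static int fetchStatus(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                newClient().send(newRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Calling TimeoutSketch.fetchStatus(url) returns the status code instead of hanging indefinitely, which makes it easier to tell a rate limit or block (403, 503) apart from a stalled connection.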
