Jsoup connect error 403 and 503

I have been using jsoup's connect method to fetch the DOM of certain websites for some time (I wrote a personal bot that makes 20-30 requests per day to those sites). As of today, I can still open and browse one of these websites in my browser, but my Java program can no longer access it. The one change I noticed is that CloudFlare now checks my browser (DDoS protection) before letting me in. My connect code looks like this:

doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .timeout(0)
                .get();

and now I get error 503. I tried changing the userAgent to just "Mozzila/5.0", and then I get error 403 instead. It doesn't make any sense to me, but I suspect the Cloudflare system.

Edit:

I discovered that CloudFlare's "I'm Under Attack" protection requires the browser to have JavaScript and cookies enabled, and grants access to the website after 5 seconds. How can I recreate that situation in my Java program?

Every website imposes limits to avoid crashes or attacks. The same thing happened to me when I wanted to access GitHub data. I did not see any authentication in your code (you may have hidden it, which I can understand). Sites will sometimes grant a higher rate limit to authenticated clients, so it is worth trying to authenticate.

Another problem is that you set the timeout to 0. In jsoup, a timeout of 0 means an infinite timeout, so a stalled request can hang forever. Set it to something reasonable, like 30 seconds; jsoup takes the value in milliseconds, so that is timeout(30000).
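
Since the question suspects Cloudflare, it may help to confirm that the 403/503 really comes from the Cloudflare check and not from the site itself. Note that jsoup only parses HTML and does not execute JavaScript, so it cannot complete a JavaScript challenge on its own. The following is a minimal sketch, not a definitive test: the class and method names are hypothetical, and the rule "status 403 or 503 plus a Server header naming cloudflare" is an assumed heuristic. With jsoup, the two inputs could be read after calling ignoreHttpErrors(true) and execute(), via statusCode() and header("Server") on the response.

```java
import java.util.Locale;

/** Hypothetical helper: guesses whether an HTTP error came from a Cloudflare check. */
public class ChallengeCheck {

    /**
     * Returns true when the response looks like a Cloudflare block or challenge page.
     * Heuristic only (assumption): status 403 or 503 and a Server header naming cloudflare.
     */
    public static boolean looksLikeCloudflareBlock(int statusCode, String serverHeader) {
        boolean blockedStatus = statusCode == 403 || statusCode == 503;
        boolean fromCloudflare = serverHeader != null
                && serverHeader.toLowerCase(Locale.ROOT).contains("cloudflare");
        return blockedStatus && fromCloudflare;
    }

    public static void main(String[] args) {
        // Example values; in a real program these would come from the HTTP response.
        System.out.println(looksLikeCloudflareBlock(503, "cloudflare"));
        System.out.println(looksLikeCloudflareBlock(503, "nginx"));
        System.out.println(looksLikeCloudflareBlock(200, "cloudflare"));
    }
}
```

If the check returns true, changing the user agent or timeout is unlikely to help, because the block is decided before the request reaches the site.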
