jsoup throws 204 status despite a status code check

I am connecting to a list of URLs through jsoup. Here is a snippet of my code:

  for (int j = 0; j < unq_urls.size(); j++) {

      Response response2 = Jsoup.connect(unq_urls.get(j))
             .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
             .timeout(100*1000)
             .ignoreContentType(true)
             .execute();

      if (response2.statusCode() == 200) {
...}

}

When the connection is executed, jsoup throws the following error:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=204, URL=https://www.google.com/gen_204?reason=EmptyURL
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:459)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:475)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:475)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:434)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:181)
    at cseapiandparsing.CSE_Author_Name_Dis.<init>(CSE_Author_Name_Dis.java:187)
    at cseapiandparsing.CSE_Author_Name_Dis.main(CSE_Author_Name_Dis.java:263)

How can I overcome this? I want jsoup to move on to the next URL if it cannot connect to a specific one. Relatedly, jsoup also throws a timeout error when connecting to a URL takes too long, which is why I have already set the .timeout(100*1000) option. Is there a way to move on to the next URL if the attempt for the current one takes too long?

Thanks in advance.

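I believe you are looking for a try-catch mechanism here. As the stack trace shows, the exception is thrown by execute() itself, so your statusCode() check is never reached for that URL.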

Surround the Jsoup.connect part with a try block, then handle the exception gracefully in the catch block, which in your case means continuing to the next loop iteration.

To skip the current URL if it takes too long, set the timeout() value (in milliseconds) to your desired waiting period. If the connection exceeds that period, jsoup throws a timeout exception, which is again caught by the catch block. Try the code below:

for (int j = 0; j < unq_urls.size(); j++) {
    try {
        Response response2 = Jsoup.connect(unq_urls.get(j))
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(100 * 1000) // milliseconds; lower this to give up on slow URLs sooner
                .ignoreContentType(true)
                .execute();
        // process response2 here, inside the try block
    } catch (Exception e) {
        continue; // move on to the next URL if the connection fails or times out
    }
}
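
If you want to tell a timeout apart from other failures (for logging, say), you can catch the more specific exception first. Both jsoup's HttpStatusException and java.net.SocketTimeoutException are subclasses of IOException. Below is a rough, self-contained sketch of that pattern. It does not use jsoup itself: the Fetcher interface, the fetchAll method, and the class name are hypothetical stand-ins I made up so the example runs on its own. In real code the lambda would wrap something like your Jsoup.connect(...).execute() call.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipFailedUrls {

    /** Hypothetical stand-in for a network call such as Jsoup.connect(url).execute(). */
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    /** Returns the results for URLs that could be fetched; failing URLs are skipped. */
    static List<String> fetchAll(List<String> urls, Fetcher fetcher) {
        List<String> bodies = new ArrayList<>();
        for (String url : urls) {
            try {
                bodies.add(fetcher.fetch(url));
            } catch (SocketTimeoutException e) {
                // Must come before IOException, because it is a subclass of it.
                System.err.println("Timed out, skipping: " + url);
            } catch (IOException e) {
                // Covers HTTP status errors and other I/O failures.
                System.err.println("Failed, skipping: " + url + " (" + e.getMessage() + ")");
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Fake fetcher that simulates one status error and one timeout.
        Fetcher fake = url -> {
            if (url.contains("gen_204")) {
                throw new IOException("Status=204");
            }
            if (url.contains("slow")) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "body of " + url;
        };

        List<String> urls = Arrays.asList(
                "http://a.example",
                "https://www.google.com/gen_204?reason=EmptyURL",
                "http://slow.example",
                "http://b.example");

        System.out.println(fetchAll(urls, fake));
    }
}
```

With the fake fetcher above, only the two URLs that do not fail end up in the result. Separately, jsoup's Connection also has an ignoreHttpErrors(true) option that makes execute() return the response instead of throwing on HTTP error statuses; whether that alone covers the 204 case on your jsoup version is something you would need to check.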
