
Jsoup connect(): bypass google captcha

I am writing a small application that has to retrieve URLs based on keywords. This is the code:

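  Elements doc = Jsoup
          .connect(searchUrl) // placeholder: the Google search URL is not shown in the post
          .userAgent("Mozilla 5.0 (Windows NT 6.1)")
          .get()
          .select(selector); // placeholder: the selector is not shown in the post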

        for (Element link : doc) {

            String url = link.absUrl("href");
            try {
                url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // TODO Auto-generated catch block
            }

            if (/* condition not shown in the post */) {
                continue; // Ads/news/etc.
            } else if (url.contains("/pdf/")) {
                // handling not shown in the post
            } else if (url.contains("//github.com/")) {
                // handling not shown in the post
            }
        }


I just get the following error:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=http://ipv4.google.com/sorry/IndexRedirect?continue=http://www.google.com/search%3Flr%3Dlang_en....
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:435)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:446)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
at sperimentazioni.Main.getDataFromGoogle(Main.java:327)
at sperimentazioni.Main.getURLs(Main.java:164)
at sperimentazioni.Main.main(Main.java:485)

Apparently this is Google's CAPTCHA. How can I bypass it?

The following logic works for me:

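Document doc = Jsoup
         .connect(searchUrl) // placeholder: the search URL is not shown in the post
         .userAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
         .get();

Elements links = doc.select("a[href]");
for (Element link : links) {

    String temp = link.attr("href");
    if (temp.startsWith("/url?q=")) {
        // handling of the matched result link is not shown in the post
    }
}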
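
The `/url?q=` check above only identifies a result link; the target address still has to be cut out and decoded. A minimal sketch of that step follows, assuming the href has the shape `/url?q=<encoded target>&...`. The class and method names (`GoogleHref`, `extractTarget`) are hypothetical and not part of the original post, and the `Charset` overload of `URLDecoder.decode` needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post.
final class GoogleHref {

    private static final String PREFIX = "/url?q=";

    /**
     * Returns the decoded target of a "/url?q=..." href,
     * or null when the href does not have that shape.
     */
    static String extractTarget(String href) {
        if (href == null || !href.startsWith(PREFIX)) {
            return null;
        }
        // The target ends at the next '&', or at the end of the string if there is none.
        int end = href.indexOf('&', PREFIX.length());
        String encoded = end < 0
                ? href.substring(PREFIX.length())
                : href.substring(PREFIX.length(), end);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(extractTarget("/url?q=https%3A%2F%2Fexample.com%2Fpdf%2Fa.pdf&sa=U"));
    }
}
```

Unlike a bare `substring(indexOf('=') + 1, indexOf('&'))`, this version does not throw when the href contains no `&`.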


You cannot bypass it. However, you can use a third-party service for CAPTCHA recognition and post the correct answer. Check DeathByCaptcha.com
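
Whichever route is taken, it helps to recognize the block page first so the program can stop sending requests. A rough sketch follows, based only on the error in the question (status 503 and a URL containing `/sorry/`). The `/sorry/` check is a heuristic assumption, and the class and method names are hypothetical. With jsoup, the two inputs could be taken from `HttpStatusException.getStatusCode()` and `HttpStatusException.getUrl()`.

```java
// Hypothetical helper, not from the original post.
final class CaptchaCheck {

    /**
     * Heuristic: the error in the question shows status 503 together with
     * a redirect to a ".../sorry/..." URL. Treat that combination as a block.
     */
    static boolean looksLikeCaptchaBlock(int status, String url) {
        return status == 503 && url != null && url.contains("/sorry/");
    }

    public static void main(String[] args) {
        String blocked = "http://ipv4.google.com/sorry/IndexRedirect?continue=x";
        System.out.println(looksLikeCaptchaBlock(503, blocked));
        System.out.println(looksLikeCaptchaBlock(200, "http://www.google.com/search?q=jsoup"));
    }
}
```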

