Server returned HTTP response code: 406 for URL

I am writing a web crawler using Java and HttpURLConnection and this is the error I get:

java.io.IOException: Server returned HTTP response code: 406 for URL: https://www.mkyong.com/kotlin/kotlin-how-to-loop-a-map/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at testing.HttpURLConnectionGo.sendGet(HttpURLConnectionGo.java:34)
at testing.DefinitelyNotSpiderLeg.crawl(DefinitelyNotSpiderLeg.java:55)
at testing.DefinitelyNotSpider.search(DefinitelyNotSpider.java:33)
at testing.Test.main(Test.java:9)

and this is the method I use for the connection:

// HTTP GET request
public String sendGet(String url) throws Exception {

    URL obj = new URL(url);
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();

    // optional default is GET
    con.setRequestMethod("GET");

    //add request header
    con.setRequestProperty("User-Agent", USER_AGENT);

    BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream()));
    String inputLine;
    StringBuffer response = new StringBuffer();

    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
    in.close();
    return response.toString();
}

Then, in another class, I pass the returned String to Jsoup:

String html = http.sendGet(url);
Document doc = Jsoup.parse(html);

Why do I get this error?

HTTP 406 is the "Not Acceptable" status, per HTTP.CAT and Mozilla. The Mozilla reference goes on to say that the error is rare and that the status

indicates that a response matching the list of acceptable values defined in Accept-Charset and Accept-Language cannot be served.

You might try setting those headers in your request.
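As a rough sketch of that suggestion, the connection from the question could be opened with explicit content negotiation headers. The class name AcceptHeaders, the method open, and the header values are assumptions chosen for illustration, not values the server is known to require; the Accept header is added as well because it takes part in the same negotiation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: opens a GET connection with explicit Accept-* headers.
final class AcceptHeaders {

    static HttpURLConnection open(String url, String userAgent) throws IOException {
        // openConnection() only creates the object; no network traffic happens yet.
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("User-Agent", userAgent);

        // Illustrative values only; adjust them to what the crawler can actually handle.
        con.setRequestProperty("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8");
        con.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
        con.setRequestProperty("Accept-Charset", "utf-8");
        return con;
    }
}
```

In sendGet, the returned connection would replace the one created by obj.openConnection(), and the rest of the method could stay as it is. Whether this clears the 406 depends on what the server is actually rejecting.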

It is also possible that the URL you're hitting has some bot or crawler detection logic, and that a 406 is being returned because the behavior is "Not Acceptable". That is not the ideal status code for this case, but it makes sense.
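One way to tell the two cases apart is to look at the status code and the error body instead of letting getInputStream() throw. The sketch below is a diagnostic aid under assumptions: the class name StatusProbe and the method fetchWithStatus are hypothetical, and it requires Java 9 or later for readAllBytes. It returns the status code on the first line, followed by whatever body the server sent, which for a blocked crawler is often a page explaining the rejection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper: reports the status code and body for any response.
final class StatusProbe {

    static String fetchWithStatus(String url, String userAgent) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("User-Agent", userAgent);

        // getResponseCode() sends the request but does not throw on 4xx/5xx.
        int status = con.getResponseCode();

        // For error statuses the body is on the error stream, which may be null.
        InputStream body = status >= 400 ? con.getErrorStream() : con.getInputStream();
        String text = "";
        if (body != null) {
            try (InputStream in = body) {
                // Assumes a UTF-8 body; a real crawler would honor the Content-Type charset.
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        return status + "\n" + text;
    }
}
```

Calling it once with the crawler's USER_AGENT and once with a browser-like User-Agent string, then comparing the two results, would suggest whether the server is filtering on that header.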
