
Set response encoding with HttpClient 3.1

I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding (for some reason the server returns an incorrect encoding in Content-Type). My current approach is to get the response as raw bytes and convert it to a String with the desired encoding. I'm wondering if there is a better way to do this (e.g. by configuring HttpClient). Thanks for any suggestions.

I don't think there's a better answer using HttpClient 3.x APIs.

The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
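To make the effect of that default concrete, here is a minimal sketch using only the JDK (no HttpClient involved). It assumes a server that actually encodes its body as UTF-8 while the client falls back to ISO-8859-1. The class name CharsetMismatchDemo and the helper decode are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HttpClient.
final class CharsetMismatchDemo {

    // Decode raw body bytes with an explicitly chosen charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Assumption: the server encoded the single character U+00E9 as UTF-8,
        // which produces the two bytes 0xC3 0xA9.
        byte[] body = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the spec-mandated default maps each byte to one char.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the server really used restores the text.
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.length() + " vs " + right.length()); // prints "2 vs 1"
    }
}
```

Getting the response as raw bytes and passing them to something like decode with the charset you know to be correct is essentially the workaround described in the question.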

If you were using HttpClient 4.x, you could write your own ResponseHandler to convert the body into an HttpEntity, ignoring the response message's notional character set.

A few notes:

  1. The server serves the data, so it is up to the server to serve it in an appropriate format. The response encoding is therefore set by the server, not the client. However, the client can suggest to the server which format it would like via the Accept and Accept-Charset headers:

     Accept: text/plain
     Accept-Charset: utf-8

     However, HTTP servers usually do not convert between formats.

  2. If option 1 does not work, then you should look at the configuration of the server.

  3. When a String is sent as raw bytes (and it always is, because this is what networks transmit), an encoding is always involved. Since the server produces these raw bytes, it defines the encoding. So you cannot take raw bytes and use an encoding of your choice to create a String. You must use the encoding that was used when the String was converted to bytes.

Disclaimer: I don't really know HttpClient; I have only read the API.

I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.


Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).

But the same approach applies, just on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole array at once, if it is not too big, and convert it to a String, as you wrote.)
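The InputStreamReader idea could look roughly like the sketch below. It uses only JDK classes so that it runs on its own; the class name ForcedCharsetReader and the method readAll are hypothetical, and the ByteArrayInputStream in main is just a stand-in for the stream you would get from getResponseBodyAsStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of HttpClient.
final class ForcedCharsetReader {

    // Read the whole stream as text, decoding with the caller's charset
    // and ignoring whatever the Content-Type header claimed.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response body stream; assumed to contain UTF-8 bytes.
        InputStream body = new ByteArrayInputStream(
                "gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(body, StandardCharsets.UTF_8));
    }
}
```

Reading through a Reader rather than converting one big byte array avoids holding the raw bytes and the decoded text in memory at the same time, which only matters for large responses. Closing the Reader also closes the underlying stream.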

I think trying to change the response and letting the HttpClient analyze it is not the right way here.


I suggest sending a message to the server administrator/webmaster about the wrong charset, though.

Greetings folks,

Just in case someone finds this post while googling for how to set HttpClient to write in UTF-8.

This line of code should be handy...

response.setContentType("text/html; charset=UTF-8");

Best
