
Displaying NON-ASCII Characters using HttpClient

I am using this code to get the whole HTML of a website, but the non-ASCII characters do not come through. All I get is diamonds with question marks: a character like å appears as �.
I doubt it is because of the charset; what else could it be?

    Log.e("HTML", "henter htmlen.."); // Danish: "fetching the HTML.."
    String url = "http://beep.tv2.dk";
    HttpClient client = new DefaultHttpClient();
    client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
            HttpVersion.HTTP_1_1);
    client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "UTF-8");
    HttpGet request = new HttpGet(url);
    HttpResponse response = client.execute(request);
    String html = "";
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    html = str.toString();
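
The diamond with a question mark is the Unicode replacement character (U+FFFD), which a decoder emits when the bytes do not match the charset used to read them. Here is a minimal sketch of that effect, assuming the page is served as ISO-8859-1 and decoded as UTF-8 (which I believe is the default charset on Android). The class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Decodes raw bytes as UTF-8, regardless of how they were encoded.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decodes raw bytes as ISO-8859-1 (Latin-1).
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "blå bil" as an ISO-8859-1 server would send it: å is the single byte 0xE5.
        byte[] raw = "bl\u00E5 bil".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decodeAsUtf8(raw);   // 0xE5 followed by a space is not valid UTF-8
        String right = decodeAsLatin1(raw); // charset matches the bytes

        System.out.println(wrong.contains("\uFFFD"));     // true
        System.out.println(right.equals("bl\u00E5 bil")); // true
    }
}
```

So the bytes arrive intact; only the decoding step in the InputStreamReader loses the character.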

Thank you. This worked (in case others have the issue):

    HttpClient client = new DefaultHttpClient();
    client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
            HttpVersion.HTTP_1_1);
    client.getParams().setParameter(CoreProtocolPNames.HTTP_ELEMENT_CHARSET, "iso-8859-1");
    HttpGet request = new HttpGet(url);
    request.setHeader("Accept-Charset", "iso-8859-1, unicode-1-1;q=0.8");
    HttpResponse response = client.execute(request);
    String html = "";
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "iso-8859-1"));

  1. Use the new InputStreamReader(in, "UTF-8") constructor, so the charset is explicit.
  2. Set the Accept-Charset request header to, say, Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
  3. Make sure the page opens properly in a browser. If it does not, then it might be a server-side issue.
  4. If none of the above works, check the other headers using Firebug (or a similar tool).
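
Rather than hard-coding a charset, one option is to read it from the Content-Type response header and only fall back to a default when it is missing. This is a rough sketch using only the standard library; CharsetSniffer and fromContentType are hypothetical names, and with HttpClient 4.x the header value would presumably come from something like response.getEntity().getContentType().getValue():

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetSniffer {
    // Hypothetical helper: extracts the charset parameter from a Content-Type value
    // such as "text/html; charset=ISO-8859-1". Returns the fallback when the
    // parameter is missing or names a charset this JVM does not know.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = fromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        byte[] body = {(byte) 0xE5}; // "å" in ISO-8859-1
        System.out.println(new String(body, cs).equals("\u00E5")); // true
    }
}
```

The resulting Charset can then be passed straight to new InputStreamReader(in, charset).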

This really helped me get started, but I was having the same problem while reading a text file. It was fixed using the following command:

    BufferedReader br = new BufferedReader(new InputStreamReader(new 
                FileInputStream(fileName), "iso-8859-1"));
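
For anyone who wants to try the file case in isolation, here is a small self-contained sketch of the same idea. The class name, the temp file and the sample text are made up for the demo, and it assumes the file really is ISO-8859-1:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCharsetDemo {
    // Reads a whole text file, decoding its bytes with the given charset.
    // Each line is re-terminated with '\n'.
    static String readAll(String fileName, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) for us
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a sample file in ISO-8859-1, then read it back with the same charset.
        Path tmp = Files.createTempFile("latin1-demo", ".txt");
        Files.write(tmp, "bl\u00E5 bil".getBytes(StandardCharsets.ISO_8859_1));
        System.out.print(readAll(tmp.toString(), StandardCharsets.ISO_8859_1));
        Files.delete(tmp);
    }
}
```

Passing a Charset object instead of the string "iso-8859-1" also avoids having to handle UnsupportedEncodingException.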

...and of course, on the server side (e.g. in a servlet), the HTTP response needs to have the encoding set as well:

    response.setCharacterEncoding("UTF-8");

Thanks for the help!
