简体   繁体   中英

Read non-english characters from http get request

I have a problem in getting Hebrew characters from a http get request.

I'm getting squares characters like this: "[]" instead of the Hebrew characters.

The English characters are Ok.

This is my function:

public String executeHttpGet(String urlString) throws Exception {
    BufferedReader in = null;
    try {
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet();
        request.setURI(new URI(urlString));
        HttpResponse response = client.execute(request);
        in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(),"UTF-8"));
        StringBuffer sb = new StringBuffer("");
        String line = "";
        String NL = System.getProperty("line.separator");
        while ((line = in.readLine()) != null) {
            sb.append(line + NL);
        }
        in.close();
        String page = sb.toString();
        // System.out.println(page);
        return page;
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

You can test is by this example url:

String str = executeHttpGet("http://kavim-t.co.il/include/getXMLStations.asp?parent=7_%20_1");

Thank you!

The file you linked to doesn't seem to be UTF-8 . I tested that it opens correctly using WINDOWS-1255 (hebrew encoding), you should try that instead of UTF-8 .

Try a different website, it looks like it doesn't use UTF-8. Alternatively, UTF-16 may work but I haven't tried. Your code looks fine.

As others have pointed out, the content is not actually encoded as UTF-8. You might want to look at httpEntity.getContentType() to extract the actual encoding of the content, and then pass this to your InputStreamReader . This means your code will then be able to cope correctly with any encoding.

hi as is posted in this other question Special characters in PHP / MySQL

you can set the characters on the php file on the example they set utf-8, but you can set a different type that supports the chararcters you need.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM