
Replace non-English characters in a string with UTF-8 characters in Android / Java

I need to convert some non-English characters to the \u00XX escape format.

For example, in BetalingsMåde the problematic character is å, which needs to be converted to \u00e5.

I've tried everything, even

updateRequest=updateRequest.replaceAll("[^\\p{ASCII}]", "");

but this only removes the non-English characters.

I also need to send this request as an HTTP POST. I tried

setRequestProperty("content-type","application/json;charset=utf-8");

with no luck, so an answer that covers that as well would be welcome.

Thanks in advance!

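If you want to convert to a Unicode-escaped string, you can do this:

org.apache.commons.lang3.StringEscapeUtils.escapeJava("Your string to escape");

It's part of the Apache Commons Lang 3 package. (In newer releases this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text.)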

In Java, String and char already hold Unicode text. However, something may have gone wrong earlier on; a garbled String always means that the point where the text entered the program has to be corrected.

Hard-coded strings in Java source code require the compiler and the editor to use the same encoding. Nowadays I would fix the IDE's encoding to UTF-8.

Properties files are by default restricted to ISO-8859-1, meaning one should use \uXXXX escapes for any character outside that range.
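As a small illustration of the point about properties files, the following sketch loads properties text containing such an escape and checks what java.util.Properties makes of it. The class name PropertiesEscapeDemo and the key name label are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapeDemo {
    // Loads properties text and returns the value stored under the given key.
    static String read(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            // load() decodes backslash-u escapes into the real characters.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape literal in the Java source,
        // so the properties text contains a backslash, a "u" and "00e5".
        String text = "label=BetalingsM\\u00e5de\n";
        System.out.println(read(text, "label"));
    }
}
```

Here read(text, "label") returns the string BetalingsMåde, with the escape already decoded to a single character.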

Files must be read with the file's encoding specified explicitly. Often there is an overloaded method without an encoding parameter; avoid it. The old FileReader/FileWriter should not be used either: they use the current platform encoding, which is not portable.
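A minimal sketch of reading and writing with an explicit encoding, using InputStreamReader/OutputStreamWriter in place of FileReader/FileWriter. The class name Utf8FileDemo, its method names, and the temp-file prefix are invented for illustration; it assumes Java 8 or later (for UncheckedIOException).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8FileDemo {
    // Writes the text to the file, encoding it explicitly as UTF-8.
    static void write(File file, String text) {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file, decoding it explicitly as UTF-8.
    static String read(File file) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Round-trips the text through a temporary file and returns what was read back.
    static String roundTrip(String text) {
        File tmp;
        try {
            tmp = File.createTempFile("utf8-demo", ".txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            write(tmp, text);
            return read(tmp);
        } finally {
            tmp.delete();
        }
    }
}
```

Because both sides name the charset, roundTrip("BetalingsMåde") gives back the same string regardless of the platform default encoding.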

Text from the database is only problematic if the database was defined with the wrong encoding, or if the JDBC driver communicates in a different encoding.

I am not sure this is what you want, but the following does roughly what the native2ascii tool does.

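String toAscii(String s) {
    // Escapes expand one char to six, so start a little above the input length.
    StringBuilder sb = new StringBuilder(s.length() + 16);

    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        if (0 < ch && ch < 128) {
            // Plain ASCII: append the character itself, not its numeric value.
            sb.append(ch);
        } else {
            // Anything else becomes a backslash-u escape with four hex digits.
            sb.append(String.format("\\u%04x", (int) ch));
        }
    }
    return sb.toString();
}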

For the HTTP part, try setRequestProperty("content-type","text/json;charset=utf-8"); instead, so that the charset parameter is actually honored (it applies to text types). Even more likely, the problem is in how the response is decoded, not in the request.
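Note that the content-type header only declares the encoding; the body bytes themselves still have to be written as UTF-8. A rough sketch of such a POST with HttpURLConnection follows. The class name JsonPostSketch and its method names are hypothetical, the endpoint is whatever URL the caller passes in, and the header value is the one from the question.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class JsonPostSketch {
    // Encodes the JSON text as UTF-8 bytes; this is what actually goes on the wire.
    static byte[] encodeBody(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Sends the JSON text as a UTF-8 encoded POST body and returns the HTTP status code.
    static int post(URL endpoint, String json) throws IOException {
        byte[] body = encodeBody(json);
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Declares the encoding to the server; it does not encode anything itself.
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Writing the body through a Writer or getBytes() call that does not name a charset falls back to the platform default, which is one way an å can get mangled even when the header is correct.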
