
How to convert a UTF-8 string to ANSI in Java?

I have a string in UTF-8 format and want to convert it to clean ANSI format. How do I do that?

You can do something like this:

new String("your utf8 string".getBytes(Charset.forName("utf-8")));

In this format, 4 bytes of UTF-8 are converted to 8 bytes of ANSI.
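One caveat worth spelling out: new String(byte[]) without a charset argument decodes with the platform default charset, so the one-liner above behaves differently from machine to machine. A Java String is UTF-16 internally, so "converting to ANSI" in practice means producing bytes in the target code page. The following is a minimal sketch, assuming "ANSI" means Windows-1252 and that the JVM provides that charset; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiBytesDemo {
    // Assumption: "ANSI" here means the Windows-1252 code page.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Decode the UTF-8 bytes into a String, then encode that String as ANSI bytes.
    static byte[] utf8ToAnsi(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(ANSI); // characters without a mapping become '?'
    }

    public static void main(String[] args) {
        // "\u00e1\u00e7" is "áç", written with escapes so the source file encoding does not matter.
        byte[] utf8 = "\u00e1\u00e7".getBytes(StandardCharsets.UTF_8);
        byte[] ansi = utf8ToAnsi(utf8);
        // prints "4 UTF-8 bytes -> 2 ANSI bytes"
        System.out.println(utf8.length + " UTF-8 bytes -> " + ansi.length + " ANSI bytes");
    }
}
```

Naming both charsets explicitly keeps the result independent of the platform default.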

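Converting UTF-8 to ANSI is not possible in general. "ANSI" usually refers to a single-byte code page such as Windows-1252, which has at most 256 characters (plain ASCII has only 128, i.e. 7 bits), whereas UTF-8 uses up to 4 bytes per character and can encode all of Unicode. That's like converting a long to an int: you lose information in most cases.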
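Whether a particular string survives the conversion can be checked before converting. The sketch below uses CharsetEncoder.canEncode from the standard library; the helper name fitsIn and the sample strings are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossCheckDemo {
    // Returns true if every character of text can be encoded in the target charset.
    static boolean fitsIn(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String plain = "your utf8 string";
        String accented = "\u00e1\u00e7"; // "áç"
        String hebrew = "\ufb20";         // "ﬠ"

        System.out.println(fitsIn(plain, StandardCharsets.US_ASCII));      // true
        System.out.println(fitsIn(accented, StandardCharsets.US_ASCII));   // false
        System.out.println(fitsIn(accented, StandardCharsets.ISO_8859_1)); // true
        System.out.println(fitsIn(hebrew, StandardCharsets.ISO_8859_1));   // false
    }
}
```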

You could use a Java function like this one to convert from UTF-8 to ISO_8859_1 (whose printable characters are a subset of Windows-1252, the code page usually meant by "ANSI"):

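import java.nio.charset.StandardCharsets;

private static String convertFromUtf8ToIso(String s1) {
    if (s1 == null) {
        return null;
    }
    // Round-trip through UTF-8 with an explicit charset instead of the platform default.
    String s = new String(s1.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    // Encoding to ISO-8859-1 replaces every unmappable character with '?'.
    byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
    return new String(b, StandardCharsets.ISO_8859_1);
}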

Here is a simple test:

String s1 = "your utf8 stringáçﬠ";
String res = convertFromUtf8ToIso(s1);
System.out.println(res);

This prints out:

your utf8 stringáç?

The character ﬠ gets lost because it cannot be represented in ISO_8859_1 (it takes 3 bytes when encoded in UTF-8). ISO_8859_1 can represent á and ç.
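If a silent '?' is not acceptable, the encoder can be asked to fail instead. This is a rough sketch using CharsetEncoder with CodingErrorAction.REPORT; the class and method names are hypothetical, and whether to throw or to substitute depends on what the caller needs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictIsoEncoder {
    // Encodes text as ISO-8859-1 and throws instead of substituting '?'.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        try {
            // The final character U+FB20 has no ISO-8859-1 mapping, so this call throws.
            encodeStrict("your utf8 string\u00e1\u00e7\ufb20");
        } catch (CharacterCodingException e) {
            System.out.println("Not representable in ISO-8859-1: " + e);
        }
    }
}
```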
