
Converting a string from UTF-8 to ANSI and displaying it as UTF-8

I want to mimic in Java something I can do with Notepad++:

TEXT_2 = convert(TEXT_1) // where: TEXT_2 = "Български", TEXT_1 = "БългарÑки"
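TEXT_1 appears to be TEXT_2's UTF-8 bytes read one by one as windows-1252 (the usual "ANSI" code page on Western-locale Windows). One detail matters for the conversion: "с" encodes to the bytes D1 81, and 0x81 has no character in windows-1252, so a faithful copy of TEXT_1 presumably carries an invisible control character (U+0081) right after the "Ñ". Such a character is easily lost when text is copied from a web page. The helper name utf8ShownAsAnsi is hypothetical. Substituting the control character for undefined bytes is an assumption about the program that produced TEXT_1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MakeMojibake {
    private static final Charset ANSI = Charset.forName("windows-1252");

    // Hypothetical helper: read each UTF-8 byte of the text as one windows-1252 char.
    static String utf8ShownAsAnsi(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            String ch = new String(new byte[] {b}, ANSI);
            if (ch.equals("\uFFFD")) {
                // Java maps the undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD;
                // assume the producer of TEXT_1 kept the C1 control char of that value.
                sb.append((char) (b & 0xFF));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // TEXT_2, written with escapes to keep the source ASCII-only.
        String text2 = "\u0411\u044A\u043B\u0433\u0430\u0440\u0441\u043A\u0438";
        String text1 = utf8ShownAsAnsi(text2);
        System.out.println(text1.length());          // 18
        System.out.println(text1.indexOf('\u0081')); // 13: right after U+00D1 at index 12
    }
}
```

In this sketch, every Cyrillic letter becomes two characters, and only the letter "с" produces a character outside windows-1252.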

How to do it with Notepad++

Setting the starting point...

Open Notepad++ and click Encoding / Encode in UTF-8, then paste TEXT_1:

БългарÑки

Getting TEXT_2

Click Encoding / Convert to ANSI, then click Encoding / Encode in UTF-8. Done.

How to do it with Java

So far I have the following function (which works partially):

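import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public static String convert(String text) {
    // Encode as Cp1252 ("ANSI"), then reinterpret those bytes as UTF-8.
    // getBytes() returns exactly the encoded bytes; ByteBuffer.array() may be longer.
    byte[] ansiBytes = text.getBytes(Charset.forName("Cp1252"));
    return new String(ansiBytes, StandardCharsets.UTF_8);
}
System.out.println(convert("БългарÑки"));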

With this function I get:

Българ�?ки // whereas the correct result is: Български
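The "�?" pair is consistent with the following explanation, assuming the original TEXT_1 contained the invisible control character U+0081 after the "Ñ". The "?" in the output suggests an unencodable character was there. In UTF-8, "с" (U+0441) is D1 81. Java's windows-1252 charset has no mapping for 0x81, so encoding writes the replacement byte '?' (0x3F). D1 3F is malformed UTF-8, which decodes to U+FFFD followed by "?". The hex helper below is hypothetical and only prints the bytes involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WhyQuestionMark {
    // Hypothetical helper: bytes as upper-case hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // UTF-8 bytes of Cyrillic small letter es (U+0441)
        System.out.println(hex("\u0441".getBytes(StandardCharsets.UTF_8))); // D1 81
        // U+00D1 followed by U+0081: the second char has no windows-1252 byte
        System.out.println(hex("\u00D1\u0081".getBytes(cp1252)));            // D1 3F
        // D1 is a lead byte without a continuation byte, so it becomes U+FFFD
        String decoded = new String(new byte[] {(byte) 0xD1, 0x3F}, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD?"));                       // true
    }
}
```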

Any idea how to make it work?

If possible, could you provide code that would work inside the convert() function? Thanks.
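Here is one possible convert(), sketched under two assumptions. First, "ANSI" is windows-1252. Second, the five bytes windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) appear in the text as the control characters with the same value (U+0081 and so on), which is how Windows itself usually surfaces them. Characters that windows-1252 can encode are written as their byte. Those five control characters are written back as the raw byte. The collected bytes are then decoded as UTF-8. The class name MojibakeFix and the constant UNDEFINED_IN_1252 are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class MojibakeFix {
    // Assumption: Notepad++ "ANSI" = windows-1252 on the machine in question.
    private static final Charset ANSI = Charset.forName("windows-1252");

    // Bytes windows-1252 leaves undefined, as the C1 control chars of the same value
    // (assumption based on how Windows decodes them).
    private static final String UNDEFINED_IN_1252 = "\u0081\u008D\u008F\u0090\u009D";

    public static String convert(String text) {
        CharsetEncoder encoder = ANSI.newEncoder();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                // windows-1252 is single-byte, so this array has exactly one element.
                bytes.write(String.valueOf(c).getBytes(ANSI)[0]);
            } else if (UNDEFINED_IN_1252.indexOf(c) >= 0) {
                bytes.write(c); // U+0081 etc. back to the raw byte 0x81 etc.
            } else {
                bytes.write('?'); // same replacement byte the original encode() used
            }
        }
        // Reinterpret the recovered bytes as UTF-8 ("Encode in UTF-8").
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // TEXT_1 spelled with escapes so the invisible U+0081 after U+00D1 survives.
        String text1 = "\u00D0\u2018\u00D1\u0160\u00D0\u00BB\u00D0\u00B3\u00D0\u00B0"
                + "\u00D1\u20AC\u00D1\u0081\u00D0\u00BA\u00D0\u00B8";
        String expected = "\u0411\u044A\u043B\u0433\u0430\u0440\u0441\u043A\u0438";
        System.out.println(convert(text1).equals(expected)); // true
    }
}
```

If the pasted string has already lost the U+0081, no code can restore the missing byte, so the input must keep it. The '?' branch mirrors the original behaviour. For input that is not mojibake, throwing an exception there may be the safer choice.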

Here's a solution that avoids performing the Charset lookup for every conversion:

import java.nio.charset.Charset;

private static final Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

byte[] encodeUTF8(String string) {
    return string.getBytes(UTF8_CHARSET);
}
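The helpers above can be used in a self-contained class with a round trip. In this sketch they are made static and wrapped in a hypothetical class Utf8Codec. On Java 7 and later, StandardCharsets.UTF_8 is a ready-made constant, so even the one-time Charset.forName lookup can be dropped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Codec {
    // StandardCharsets.UTF_8 (Java 7+) needs no name lookup at all.
    private static final Charset UTF8_CHARSET = StandardCharsets.UTF_8;

    static String decodeUTF8(byte[] bytes) {
        return new String(bytes, UTF8_CHARSET);
    }

    static byte[] encodeUTF8(String string) {
        return string.getBytes(UTF8_CHARSET);
    }

    public static void main(String[] args) {
        // Cyrillic sample word, written with escapes to keep the source ASCII-only.
        String word = "\u0411\u044A\u043B\u0433\u0430\u0440\u0441\u043A\u0438";
        byte[] encoded = encodeUTF8(word);
        System.out.println(encoded.length);                   // 18: two bytes per Cyrillic letter
        System.out.println(decodeUTF8(encoded).equals(word)); // true
    }
}
```

These helpers only convert between String and UTF-8 bytes. Undoing the mojibake in the question also requires the windows-1252 step.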

A second approach:

Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes("UTF-8");

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, "US-ASCII");
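The charset-name overloads in these two snippets, getBytes(String) and new String(byte[], String), declare the checked UnsupportedEncodingException, so the enclosing method must catch or declare it. The runnable sketch below covers both directions. The Charset-constant overloads are shown alongside for comparison and throw no checked exception. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesAndStrings {
    // Charset-name overloads: both declare the checked UnsupportedEncodingException.
    static byte[] toBytes(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    static String fromAsciiBytes(byte[] b) throws UnsupportedEncodingException {
        return new String(b, "US-ASCII");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] b = toBytes("some text here");
        System.out.println(b.length); // 14: ASCII text, one byte per char

        byte[] cat = {(byte) 99, (byte) 97, (byte) 116};
        System.out.println(fromAsciiBytes(cat)); // cat

        // Charset-constant overloads: same results, nothing checked to catch.
        System.out.println(Arrays.equals(b, "some text here".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(new String(cat, StandardCharsets.US_ASCII)); // cat
    }
}
```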

You should, of course, use the correct encoding name. My examples used "US-ASCII" and "UTF-8", two of the most widely used encodings.
