Encoding a variable-length UTF-8 byte array in Java

Question

I need to read a string that is in UTF-8 format, but its characters use variable-length encoding, so I have trouble decoding the bytes into a String and I get weird characters when printing it. The characters seem to be Korean. This is the code I used, without success:

public static String byteToUTF8(byte[] bytes) {
    try {
        return (new String(bytes, "UTF-8"));

    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    Charset UTF8_CHARSET = Charset.forName("UTF-8");
    return new String(bytes, UTF8_CHARSET);
}

I also tried UTF-16 and got slightly better results, but it still produced strange characters, and according to the doc provided above I should be using UTF-8.

Thanks in advance for your help.

EDIT:

Base64 value: S0QtOTI2IEdHMDA2AAAAAA==\\n
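
For reference, here is a minimal sketch of decoding the Base64 value above with the standard library only. The class and method names are hypothetical. The value decodes to 16 bytes: ASCII text followed by four zero bytes. The sketch therefore assumes the field is NUL-padded to a fixed width and cuts the string at the first zero byte; whether that matches the actual device format is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the original question.
class PaddedNameDecoder {

    // Decodes UTF-8 bytes up to (not including) the first zero byte.
    // Assumption: the field is NUL-padded to a fixed width.
    // A zero byte never occurs inside a multi-byte UTF-8 sequence,
    // so cutting there cannot split a character.
    static String decodeName(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Base64 value from the question, without the trailing "\n".
        byte[] raw = Base64.getDecoder().decode("S0QtOTI2IEdHMDA2AAAAAA==");
        System.out.println(raw.length);      // prints 16
        System.out.println(decodeName(raw)); // prints KD-926 GG006
    }
}
```

If the zero bytes are left in, they end up in the String as U+0000 characters, which some consoles and views render as odd glyphs.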

Answer 1

Bluetooth name display issue:

If you check the documentation for the Bluetooth adapter's setName(), you will find the following:

https://developer.android.com/reference/android/bluetooth/BluetoothAdapter.html#setName

Valid Bluetooth names are a maximum of 248 bytes using UTF-8 encoding, although many remote devices can only display the first 40 characters, and some may be limited to just 20.
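
The 248-byte limit counts encoded bytes, not characters, and a Korean syllable takes three bytes in UTF-8. The following is a minimal sketch of measuring a name against that limit and shortening it on a code point boundary. The class and method names are hypothetical, the sketch calls no Android API, and it assumes the input is well-formed UTF-16 (no unpaired surrogates).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class BluetoothNameLimit {

    // Limit quoted from the setName() documentation above.
    static final int MAX_NAME_BYTES = 248;

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Returns the longest prefix of name whose UTF-8 encoding fits in
    // maxBytes, without cutting a code point in half.
    static String truncateUtf8(String name, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            // UTF-8 uses 1 to 4 bytes depending on the code point value.
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return name.substring(0, i);
    }

    public static void main(String[] args) {
        String korean = "\uD55C\uAE00"; // two Hangul syllables
        System.out.println(utf8Length(korean));                  // prints 6
        System.out.println(truncateUtf8(korean, 4).length());    // prints 1
        System.out.println(truncateUtf8("abc", MAX_NAME_BYTES)); // prints abc
    }
}
```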

Android Supported Versions:

If you check the link https://stackoverflow.com/a/7989085/2293534 , you will get the list of Android supported versions.

Supported and unsupported locales are given in the table:

-----------------------------------------------------------------------------------------------------
             | DEC Korean | Korean EUC | ISO-2022-KR | KSC5601/cp949 | UCS-2/UTF-16 | UCS-4 | UTF-8 |
-----------------------------------------------------------------------------------------------------
 DEC Korean  |      -     |      Y     |     N       |      Y        |        Y     |   Y   |   Y   |
-----------------------------------------------------------------------------------------------------
 Korean EUC  |      Y     |      -     |     Y       |      N        |        N     |   N   |   N   |
-----------------------------------------------------------------------------------------------------
 ISO-2022-KR |      N     |      Y     |     -       |      Y        |        N     |   N   |   N   |
-----------------------------------------------------------------------------------------------------
KSC5601/cp949|      Y     |      N     |     Y       |      -        |        Y     |   Y   |   Y   |
-----------------------------------------------------------------------------------------------------
 UCS-2/UTF-16|      Y     |      N     |     N       |      Y        |        -     |   Y   |   Y   |
-----------------------------------------------------------------------------------------------------
    UCS-4    |      Y     |      N     |     N       |      Y        |        Y     |   -   |   Y   |
-----------------------------------------------------------------------------------------------------
    UTF-8    |      Y     |      N     |     N       |      Y        |        Y     |   Y   |   -   |
-----------------------------------------------------------------------------------------------------

Possible solutions:

Solution #1:

Michael has given a great example for conversion. For more, see https://stackoverflow.com/a/40070761/2293534

When you call getBytes(), you get the raw bytes of the string encoded in your system's native character encoding (which may or may not be UTF-8). You then treat those bytes as if they were encoded in UTF-8, which they might not be.

A more reliable approach is to read the ko_KR-euc file into a Java String, then write out the Java String using UTF-8 encoding.

InputStream in = ...
// "ko_KR-euc" stands for the source encoding; replace it with the charset
// name Java recognizes for that encoding (for example "EUC-KR").
Reader reader = new InputStreamReader(in, "ko_KR-euc");
StringBuilder sb = new StringBuilder();
int read;
while ((read = reader.read()) != -1) {
    sb.append((char) read);
}
reader.close();
String string = sb.toString();

OutputStream out = ...
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write(string);
writer.close();

NB: You should, of course, use the correct encoding name.
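
The mismatch described in this solution can be reproduced with the standard library alone. This is a minimal sketch with hypothetical class and method names. It uses ISO-8859-1 as the "wrong" charset only because it is always available; the same effect occurs with any charset that differs from the one used for encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class CharsetMismatchDemo {

    // Encodes text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String korean = "\uD55C\uAE00"; // two Hangul syllables

        // Matching charsets: the text survives the round trip.
        String ok = roundTrip(korean, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(korean.equals(ok)); // prints true

        // Mismatched charsets: each of the six UTF-8 bytes becomes its own
        // Latin-1 character, so the result is six characters of mojibake.
        String bad = roundTrip(korean, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.length());       // prints 6
        System.out.println(korean.equals(bad)); // prints false
    }
}
```

Passing the charset explicitly on both sides, instead of relying on the platform default, removes this class of error.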

Solution #2:

Using StringUtils, you can do it as described at https://stackoverflow.com/a/30170431/2293534

Solution #3:

You can use Apache Commons IO for conversion. A good example is given here: http://www.utdallas.edu/~lmorenoc/research/icse2015/commons-io-2.4/examples/toString_49.html

String resource;
// getClass().getResourceAsStream(resource) -> the InputStream to read from
// "UTF-8" -> the encoding to use, null means platform default
IOUtils.toString(getClass().getResourceAsStream(resource), "UTF-8");
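
If adding Commons IO is not an option, roughly the same result can be had with the standard library. This is a minimal sketch, not the Commons IO implementation; the class and method names are hypothetical. It collects all bytes first and decodes once at the end, so a multi-byte UTF-8 sequence that straddles two reads is not split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class StreamToString {

    // Reads the whole stream, then decodes the collected bytes as UTF-8.
    static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\uD55C\uAE00".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(utf8));
        System.out.println(text.length()); // prints 2
    }
}
```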

Answer 2

I suggest you use StringUtils from the Apache Commons Codec library. I believe the methods you need are documented here:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/StringUtils.html

Answer 1: score 5, accepted, 2016-11-07 06:30:23

Answer 2: score 2, 2016-11-07 01:31:37