Java: converting a String from UTF-8 to UTF-16

Question

I am trying to convert String a = "try" to a UTF-16 String. This is what I did:

        try {
            String ulany = new String("357810087745445");
            System.out.println(ulany.getBytes().length);
            String string = new String(ulany.getBytes(), "UTF-16");
            System.out.println(string.getBytes().length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }

ulany.getBytes().length is 15 and string.getBytes().length is 24, but I think it should be 30. What did I do wrong?

Answer 1

A String (and a char) already holds Unicode text, so no conversion is needed.

However, if you want bytes (binary data) in some particular encoding, such as UTF-16, you need a conversion:

ulany.getBytes("UTF-16")   // UTF-16 big endian, preceded by a 2-byte byte order mark (BOM)
ulany.getBytes("UTF-16LE") // UTF-16 little endian, no BOM
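As a minimal sketch of the byte counts involved, the following compares the three UTF-16 variants for the string from the question. The class and method names are made up for illustration; `StandardCharsets` is used instead of charset names so that no `UnsupportedEncodingException` has to be handled.

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteLengths {

    // Encodes with "UTF-16": big endian, prefixed with a byte order mark.
    static int utf16WithBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    // Encodes with "UTF-16BE": big endian, no byte order mark.
    static int utf16BigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encodes with "UTF-16LE": little endian, no byte order mark.
    static int utf16LittleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ulany = "357810087745445";               // 15 chars
        System.out.println(utf16WithBom(ulany));        // 32 = 2 (BOM) + 15 * 2
        System.out.println(utf16BigEndian(ulany));      // 30
        System.out.println(utf16LittleEndian(ulany));   // 30
    }
}
```

So the expected 30 bytes appear only with the explicit `UTF-16BE` or `UTF-16LE` charsets; plain `UTF-16` adds two bytes for the BOM.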

However, System.out uses the operating system's encoding, so one cannot simply pick a different encoding for it.

In fact, a char is a UTF-16 code unit, so a String is already UTF-16 internally.
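To illustrate what "char is UTF-16" means in practice, here is a small sketch (the class and method names are hypothetical). A character outside the Basic Multilingual Plane occupies two chars, a surrogate pair, while still being a single Unicode code point.

```java
public class CharIsUtf16 {

    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "try";
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(ascii));   // 3
        System.out.println(codePoints(ascii));  // 3
        System.out.println(codeUnits(emoji));   // 2
        System.out.println(codePoints(emoji));  // 1
    }
}
```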

What happens

        //String ulany = new String("357810087745445");
        String ulany = "357810087745445";

The String copy constructor is a holdover from Java's C++-influenced beginnings and is pointless here; a string literal is enough.

        System.out.println(ulany.getBytes().length);

This behaves differently on different platforms, because getBytes() uses the default Charset. Better:

        System.out.println(ulany.getBytes("UTF-8").length);

        String string = new String(ulany.getBytes(), "UTF-16");

This interprets those bytes pairwise as UTF-16; having 15 bytes, an odd number, is already wrong. One gets 8 special characters: 7 from the byte pairs, whose high byte is never zero, plus one replacement character for the leftover byte.

        System.out.println(string.getBytes().length);

Getting 24 here means 3 bytes per char for those 8 chars. Hence the default platform encoding is probably UTF-8, which produces multibyte sequences for such characters.

The string will contain something like:

        String string = "\u3335\u3738\u3130\u3038\u3737\u3435\u3434\uFFFD";
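The whole chain can be reproduced independently of the platform default with a short sketch. It assumes the default encoding in the question was UTF-8 and makes that explicit; the class and method names are invented for the example. Note that Java's `UTF-16` decoder assumes big endian when the input has no byte order mark.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {

    // Encodes s as UTF-8, then (wrongly) decodes those bytes as UTF-16.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String garbled = misdecode("357810087745445");

        // 7 chars from 14 paired bytes, plus a replacement char for the odd byte.
        System.out.println(garbled.length());                                  // 8
        // Each of these chars needs 3 bytes in UTF-8.
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);   // 24

        // Print each char as a hexadecimal code unit for inspection.
        for (char c : garbled.toCharArray()) {
            System.out.printf("\\u%04X ", (int) c);
        }
        System.out.println();
    }
}
```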

Answer 2

You can also pass a text encoding to getBytes(). For example:

String string = new String(ulany.getBytes("UTF-8"), "UTF-16");
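If the actual goal is to turn UTF-8 encoded bytes into UTF-16 encoded bytes, one possible sketch is to decode with the source charset and then re-encode with the target charset, so that both charsets are named explicitly. The helper name `utf8ToUtf16be` is hypothetical, and `UTF-16BE` is chosen here only to avoid the byte order mark.

```java
import java.nio.charset.StandardCharsets;

public class Transcode {

    // Decodes UTF-8 bytes into a String, then encodes that String as UTF-16BE.
    static byte[] utf8ToUtf16be(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] utf8 = "357810087745445".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = utf8ToUtf16be(utf8);

        System.out.println(utf8.length);   // 15
        System.out.println(utf16.length);  // 30
    }
}
```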
