Java convert String UTF-8 to UTF-16
I am trying to convert String a = "try" to a UTF-16 String. I did this:
try {
String ulany = new String("357810087745445");
System.out.println(ulany.getBytes().length);
String string = new String(ulany.getBytes(), "UTF-16");
System.out.println(string.getBytes().length);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
ulany.getBytes().length is 15 and string.getBytes().length is 24, but I think it should be 30. What did I do wrong?
String (and char) hold Unicode, so nothing is needed.
However, if you want bytes (binary data) in some encoding, like UTF-16, you need a conversion:
ulany.getBytes("UTF-16") // Those bytes are in UTF-16 big endian, preceded by a byte order mark
ulany.getBytes("UTF-16LE")
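To make the byte counts concrete, here is a minimal sketch that encodes the string from the question in several charsets. The class and method names are made up for illustration. It uses StandardCharsets, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Bytes needed in UTF-8: one per ASCII character.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in UTF-16LE: two per char, no byte order mark.
    static int utf16LeLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Bytes needed in "UTF-16": big endian, with a 2-byte byte order mark in front.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String ulany = "357810087745445";
        System.out.println(utf8Length(ulany));    // 15
        System.out.println(utf16LeLength(ulany)); // 30
        System.out.println(utf16Length(ulany));   // 32 (30 + 2 for the byte order mark)
    }
}
```

The 30 bytes the question expected correspond to UTF-16LE or UTF-16BE; plain "UTF-16" adds the byte order mark when encoding.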
However, System.out uses the operating system's encoding, so one cannot just pick a different encoding.
In fact, char is UTF-16 encoded.
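As a small illustration of what "char is UTF-16 encoded" means in practice, the sketch below compares the number of char values with the number of Unicode code points. The class name and the sample strings are chosen only for this example.

```java
class Utf16Chars {
    // Number of UTF-16 code units (char values) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String digits = "357";
        // U+1F600 lies outside the Basic Multilingual Plane,
        // so UTF-16 stores it as a surrogate pair of two chars.
        String emoji = "\uD83D\uDE00";

        System.out.println(charCount(digits) + " / " + codePointCount(digits)); // 3 / 3
        System.out.println(charCount(emoji) + " / " + codePointCount(emoji));   // 2 / 1
    }
}
```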
What happens
//String ulany = new String("357810087745445");
String ulany = "357810087745445";
The String copy constructor stems from Java's C++ beginnings, and is pointless here.
System.out.println(ulany.getBytes().length);
This will behave differently on different platforms, as getBytes() uses the default Charset. Better:
System.out.println(ulany.getBytes("UTF-8").length);
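The platform dependence does not show up with pure ASCII digits, so the following sketch uses a non-ASCII sample character (an assumption for illustration, not part of the question) to show how the byte count depends on the charset. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDependence {
    // Encodes the text with an explicitly chosen charset and returns the byte count.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String sample = "\u00E9"; // e with acute accent

        // Explicit charsets give the same answer on every platform.
        System.out.println(encodedLength(sample, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));      // 2

        // The no-argument getBytes() uses whatever this is on the current machine.
        System.out.println(Charset.defaultCharset());
    }
}
```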
String string = new String(ulany.getBytes(), "UTF-16");
This interprets those bytes pairwise; having 15 bytes is already wrong. Evidently one gets 7 special characters, as the high byte is not zero, plus in practice a replacement character for the dangling 15th byte, 8 in total.
System.out.println(string.getBytes().length);
Now getting 24 means an average of 3 bytes per char. Hence the default platform encoding is probably UTF-8, which creates multibyte sequences.
The string will contain something like:
String string = "\u3335\u3738\u3130\u3038\u3737\u3435\u3434\uFFFD";
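The misinterpretation can be reproduced with a short sketch. It assumes the original bytes were ASCII/UTF-8, and it relies on Java's UTF-16 decoder treating input without a byte order mark as big endian. The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Encodes the text as UTF-8 and then (wrongly) decodes those bytes as UTF-16.
    static String decodeUtf8BytesAsUtf16(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        return new String(raw, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String garbled = decodeUtf8BytesAsUtf16("357810087745445");

        // 7 byte pairs become 7 chars; the leftover byte becomes U+FFFD.
        System.out.println(garbled.length()); // 8

        // Each of those 8 chars needs 3 bytes in UTF-8.
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length); // 24

        // Print the code units in hex to see how the digits were paired up.
        for (char c : garbled.toCharArray()) {
            System.out.printf("\\u%04X ", (int) c);
        }
        System.out.println();
    }
}
```

This matches the 24 reported in the question: the conversion did not produce UTF-16 text, it reinterpreted existing bytes.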
You can also specify a text encoding in getBytes(). For example:
String string = new String(ulany.getBytes("UTF-8"), "UTF-16");