UTF-16 encoding appending extra bytes
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class UnicodeConversion
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String jaString = "\u20AC";
        System.out.println(jaString);
        System.out.println(Arrays.toString(jaString.getBytes("UTF-16")));
    }
}
According to the UTF-16 encoding mechanism explained at https://en.wikipedia.org/wiki/UTF-16, € (the Euro symbol) should take only a single 16-bit code unit, i.e. 2 bytes. But I am seeing 4 bytes in the output.
The output I receive is below:
[-2, -1, 32, -84]
Could you explain where the values -2 and -1 come from? I ran this with JDK 11.
The first two bytes are a byte-order mark (BOM), which indicates the endianness of the 16-bit values that follow.
0xFE, 0xFF = -2, -1 (Java bytes are signed, so 0xFE prints as -2 and 0xFF as -1), meaning you've got big-endian byte order.
Use "UTF-16LE" or "UTF-16BE" instead to omit the BOM.
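To see the difference side by side, here is a minimal sketch that encodes the same Euro character with all three charsets. The class name `Utf16BomDemo` and the helpers `encode` and `hex` are made up for this illustration; only `StandardCharsets`, `String.getBytes(Charset)` and `Arrays.toString` come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BomDemo {
    // Hypothetical helper: encodes s with the given charset and
    // returns the resulting signed byte values as text.
    static String encode(String s, Charset cs) {
        return Arrays.toString(s.getBytes(cs));
    }

    // Hypothetical helper: formats a signed Java byte as unsigned two-digit hex.
    static String hex(byte b) {
        return String.format("0x%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // "UTF-16" writes a big-endian BOM (FE FF) before the data.
        System.out.println(encode(euro, StandardCharsets.UTF_16));   // [-2, -1, 32, -84]

        // The explicit-endianness charsets write no BOM.
        System.out.println(encode(euro, StandardCharsets.UTF_16BE)); // [32, -84]
        System.out.println(encode(euro, StandardCharsets.UTF_16LE)); // [-84, 32]

        // The signed values -2 and -1 are the bytes 0xFE and 0xFF.
        System.out.println(hex((byte) -2) + " " + hex((byte) -1));   // 0xFE 0xFF
    }
}
```

Passing a `StandardCharsets` constant instead of a charset name also avoids the checked `UnsupportedEncodingException`.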