
UTF-16 encoding appending extra bytes

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class UnicodeConversion
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String jaString = new String("\u20AC");
        System.out.println(jaString);
        System.out.println(Arrays.toString(jaString.getBytes("UTF-16")));
    }
}

According to the UTF-16 encoding mechanism explained in https://en.wikipedia.org/wiki/UTF-16, € (the Euro symbol) should take only a single 16-bit code unit, i.e. 2 bytes. But I am seeing 4 bytes in the output.

The output I receive is below:

[-2, -1, 32, -84]

Could you explain where the values -2 and -1 come from? I ran this with JDK 11.

The first two bytes are a byte-order mark (BOM) that indicates the endianness of the 16-bit values.

0xFE, 0xFF = -2, -1 (as signed Java bytes), which means you've got big-endian byte order.
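To see why 0xFE and 0xFF show up as -2 and -1, it can help to print the same bytes as unsigned hex. The following is only a minimal sketch; the class name SignedByteDemo and the helper toHex are illustrative and not part of the original question.

```java
import java.nio.charset.StandardCharsets;

public class SignedByteDemo {
    // Hypothetical helper: renders each byte as two-digit unsigned hex.
    // Java bytes are signed (-128..127), so masking with 0xFF recovers 0..255.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "UTF-16" without an explicit byte order writes a big-endian BOM first.
        byte[] encoded = "\u20AC".getBytes(StandardCharsets.UTF_16);
        System.out.println(toHex(encoded)); // FE FF 20 AC
    }
}
```

The last two bytes, 20 AC, are the actual encoding of U+20AC; only FE FF is the BOM.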

Use "UTF-16LE" or "UTF-16BE" instead to omit the BOM.
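As a rough illustration of that suggestion, the sketch below encodes the same string with an explicit byte order. The class and method names are made up for this example; StandardCharsets.UTF_16BE and UTF_16LE are the standard constants, and using them also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16NoBomDemo {
    // Big-endian UTF-16 with no BOM prepended.
    static byte[] bigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Little-endian UTF-16 with no BOM prepended.
    static byte[] littleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(Arrays.toString(bigEndian(euro)));    // [32, -84]
        System.out.println(Arrays.toString(littleEndian(euro))); // [-84, 32]
    }
}
```

Either variant gives the expected 2 bytes for the Euro symbol; whoever decodes the bytes then has to know the byte order in advance, since there is no BOM to detect it from.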
