UTF-16 encoding appends extra bytes

Question

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class UnicodeConversion
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String jaString = new String("\u20AC");
        System.out.println(jaString);
        System.out.println(Arrays.toString(jaString.getBytes("UTF-16")));
    }
}

According to the UTF-16 encoding scheme described at https://en.wikipedia.org/wiki/UTF-16, € (the Euro symbol) should take only a single 16-bit code unit, i.e. 2 bytes. But I am seeing 4 bytes in the output.

The output I receive is below:

[-2, -1, 32, -84]

Could you explain where the values -2 and -1 come from? I ran this with JDK 11.

Answer 1

The first two bytes are a byte-order mark (BOM), which indicates the endianness of the 16-bit values that follow.

0xFE, 0xFF = -2, -1 (Java bytes are signed), meaning you've got big-endian byte order.
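To make the mapping between -2, -1 and 0xFE, 0xFF visible, here is a small sketch that prints the encoded bytes as unsigned hex. The class name BomBytes and the helper toHex are made up for this illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Hypothetical helper: render each signed Java byte as unsigned two-digit hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Masking with 0xFF turns the signed byte (-128..127) into 0..255.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = "\u20AC".getBytes(StandardCharsets.UTF_16);
        // The first two bytes are the BOM, the last two are U+20AC itself.
        System.out.println(toHex(encoded)); // FE FF 20 AC
    }
}
```

The same masking explains the last value in the question's output: 0xAC is 172 unsigned, which a signed byte shows as -84.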

Use "UTF-16LE" or "UTF-16BE" instead to omit the BOM.
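As a rough comparison of the three charsets, the sketch below encodes the same string with each one. The class name Utf16Variants and the encode helper are made up for this example; it uses the StandardCharsets constants rather than charset names, which also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Variants {
    // Hypothetical helper: encode a string with the given charset.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // Plain UTF-16: the encoder prepends a big-endian BOM.
        System.out.println(Arrays.toString(encode(euro, StandardCharsets.UTF_16)));   // [-2, -1, 32, -84]

        // Explicit byte order: no BOM, just the single 16-bit code unit.
        System.out.println(Arrays.toString(encode(euro, StandardCharsets.UTF_16BE))); // [32, -84]
        System.out.println(Arrays.toString(encode(euro, StandardCharsets.UTF_16LE))); // [-84, 32]
    }
}
```

With an explicit byte order the result is the 2 bytes the question expected; whoever decodes the bytes then has to know which byte order was used, since there is no BOM to indicate it.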


Answer 1
2 2019-11-13 14:59:13
