
Java: Different byte[] arrays produce the same String in UTF-8

I have two different byte arrays. When I build a String from each byte[] using UTF-8, the two Strings are equal. When I use ISO-8859-1 instead, they differ.

    // Requires java.nio.charset.Charset, java.nio.charset.StandardCharsets and org.junit.Assert
    byte[] valueFir = new byte[]{0, 1, -79};   // last byte is 0xB1
    byte[] valueSec = new byte[]{0, 1, -80};   // last byte is 0xB0

    Charset[] list = new Charset[]{StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};

    for (Charset charset : list) {
        String fir = new String(valueFir, charset);
        String sec = new String(valueSec, charset);

        // Passes for ISO-8859-1, fails for UTF-8
        Assert.assertNotEquals(fir, sec);
    }

The first assertion (ISO-8859-1) passes, but the second (UTF-8) fails. What's the reason?

If you look at the Javadoc for the String constructor that you're using, it says:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string.
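If you would rather detect invalid input than have it silently replaced, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` is one option. The following is only a sketch: the class name `StrictDecode` and the helper `decodeUtf8OrNull` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Hypothetical helper: returns the decoded text, or null if the bytes are not valid UTF-8.
    public static String decodeUtf8OrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)        // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // input could not be decoded as UTF-8
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8OrNull(new byte[]{0, 1, -79}) == null); // true
        System.out.println(decodeUtf8OrNull(new byte[]{0, 1, 65}) == null);  // false
    }
}
```

Returning null is just a convenience for the demo; in real code you would probably let the exception propagate.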

Now in UTF-8, the bytes -79 (0xB1) and -80 (0xB0) don't map to characters on their own: each is a continuation byte, which is only valid after a lead byte. So neither of your byte arrays is valid UTF-8. Because that last byte is malformed input, it is replaced by the default replacement string (U+FFFD for UTF-8) in both cases. The first two bytes decode identically, so you end up with the same String twice. Your assertNotEquals is then comparing two equal Strings.
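To see the replacement happen, you can dump the decoded Strings as code units. This is a small sketch; `Utf8ReplacementDemo` and `codeUnits` are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ReplacementDemo {
    // Hypothetical helper: renders each char of s as U+XXXX, separated by spaces.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fir = new String(new byte[]{0, 1, -79}, StandardCharsets.UTF_8);
        String sec = new String(new byte[]{0, 1, -80}, StandardCharsets.UTF_8);

        System.out.println(codeUnits(fir));  // U+0000 U+0001 U+FFFD
        System.out.println(codeUnits(sec));  // U+0000 U+0001 U+FFFD
        System.out.println(fir.equals(sec)); // true
    }
}
```

Only the last, invalid byte becomes U+FFFD; the first two bytes decode normally.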

However, your byte arrays make perfect sense in ISO-8859-1, where every byte value maps to exactly one character (0xB1 is "±" and 0xB0 is "°"), and get converted to two different String values.
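One way to check the difference is to decode the bytes and encode them again. As a rough sketch (the class `Latin1RoundTrip` and the helper `roundTrips` are hypothetical names), ISO-8859-1 gives back the original bytes, while UTF-8 does not for this input:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Hypothetical helper: true if decoding and re-encoding with cs returns the same bytes.
    public static boolean roundTrips(byte[] bytes, Charset cs) {
        byte[] reencoded = new String(bytes, cs).getBytes(cs);
        return Arrays.equals(bytes, reencoded);
    }

    public static void main(String[] args) {
        byte[] valueFir = {0, 1, -79};
        byte[] valueSec = {0, 1, -80};

        String fir = new String(valueFir, StandardCharsets.ISO_8859_1);
        String sec = new String(valueSec, StandardCharsets.ISO_8859_1);

        System.out.println((int) fir.charAt(2)); // 177 (U+00B1)
        System.out.println((int) sec.charAt(2)); // 176 (U+00B0)

        System.out.println(roundTrips(valueFir, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(valueFir, StandardCharsets.UTF_8));      // false
    }
}
```

If the bytes are arbitrary binary data rather than text, it is usually safer to compare them with `Arrays.equals` directly instead of going through a String.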
