Java: Different byte[] values produce the same String in UTF-8
There are two different byte arrays. When I build a String from each of them, the two strings are equal if I decode with UTF-8, but they are different if I decode with ISO-8859-1.
import java.nio.charset.Charset;

// The arrays differ only in the last byte: -79 is 0xB1, -80 is 0xB0.
byte[] valueFir = new byte[]{0, 1, -79};
byte[] valueSec = new byte[]{0, 1, -80};
Charset CHARSET = Charset.forName("ISO-8859-1");
Charset UTF8SET = Charset.forName("UTF-8");
Charset[] list = new Charset[]{CHARSET, UTF8SET};
for (int i = 0; i < list.length; i++) {
    String fir = new String(valueFir, list[i]);
    String sec = new String(valueSec, list[i]);
    // Passes for ISO-8859-1 (i == 0), fails for UTF-8 (i == 1).
    Assert.assertNotEquals(fir, sec);
}
The first assertion passes, but the second one fails. What's the reason?
If you look at the Javadoc for the String constructor that you're using, it says:
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string.
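If silent replacement is not what you want, one option is to decode with a CharsetDecoder configured to report errors instead of replacing them. The following is only a sketch: the class name StrictUtf8Demo and the helper isValidUtf8 are made up for illustration and are not part of the original question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {
    // Hypothetical helper: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of inserting a replacement.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[]{0, 1, -79})); // false
        System.out.println(isValidUtf8(new byte[]{0, 1, 65}));  // true
    }
}
```

With a check like this, invalid input is detected up front rather than showing up later as two unexpectedly equal strings.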
Now in UTF-8, the bytes -79 and -80 (0xB1 and 0xB0) don't map to individual characters. Both are continuation bytes, which are only valid after a lead byte, so on their own they are malformed input and neither of your byte arrays is valid UTF-8. Because of that, the last byte of each array is replaced with the charset's default replacement string, which for UTF-8 is U+FFFD, and you just get the same String twice. Your assertNotEquals is then comparing that String to itself.
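To see this for yourself, you can print the UTF-16 code units of the decoded strings. This is a small sketch; the class ReplacementCharDemo and the helper toCodePoints are illustrative names, not something from the question.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Hypothetical helper: renders each char of the string as U+XXXX.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String fir = new String(new byte[]{0, 1, -79}, StandardCharsets.UTF_8);
        String sec = new String(new byte[]{0, 1, -80}, StandardCharsets.UTF_8);
        System.out.println(toCodePoints(fir)); // U+0000 U+0001 U+FFFD
        System.out.println(toCodePoints(sec)); // U+0000 U+0001 U+FFFD
        System.out.println(fir.equals(sec));   // true
    }
}
```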
However, your byte arrays make perfect sense in ISO-8859-1, where every byte value maps to exactly one character, so they get converted to two different String values.
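A practical consequence is that ISO-8859-1 decoding is lossless for arbitrary bytes, while UTF-8 decoding is not. The sketch below checks whether a byte array survives a decode/encode round trip; Latin1RoundTripDemo and roundTrips are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTripDemo {
    // Hypothetical helper: decode with cs, encode again, compare to the input.
    static boolean roundTrips(byte[] bytes, Charset cs) {
        byte[] back = new String(bytes, cs).getBytes(cs);
        return Arrays.equals(bytes, back);
    }

    public static void main(String[] args) {
        byte[] valueFir = {0, 1, -79};
        // In ISO-8859-1 the byte 0xB1 becomes the character U+00B1.
        String s = new String(valueFir, StandardCharsets.ISO_8859_1);
        System.out.println((int) s.charAt(2) == 0xB1);                        // true
        System.out.println(roundTrips(valueFir, StandardCharsets.ISO_8859_1)); // true
        // In UTF-8 the malformed byte was replaced, so the original is lost.
        System.out.println(roundTrips(valueFir, StandardCharsets.UTF_8));      // false
    }
}
```

If the bytes are really binary data rather than text, it is usually safer to compare them with Arrays.equals than to go through String at all.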