byte[] to String conversion and back to byte[] using UTF-8 encoding does not give the same byte array
To understand more about bytes, char, and String in Java, I took a sample byte[], converted it to a String, and then converted that String back to a byte[]. However, I found that the original byte[] and the new byte[] are not the same. Why? Any help is appreciated.
    import java.io.UnsupportedEncodingException;
    import java.util.Arrays;

    public class HelloWorld {
        public static void main(String[] args) throws UnsupportedEncodingException {
            byte[] originalStringBytes = {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};
            // Convert the bytes into a String, decoding them as UTF-8
            String convertedString = new String(originalStringBytes, "UTF-8");
            // Now get the bytes back from the String, encoding as UTF-8
            byte[] afterStringConversionBytes = convertedString.getBytes("UTF-8");
            // Compare the contents of the two arrays (not just their lengths)
            if (Arrays.equals(originalStringBytes, afterStringConversionBytes)) {
                System.out.println("SAME");
            } else {
                System.out.println("DIFFERENT");
            }
        }
    }
It printed "DIFFERENT" for me.
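A sequence of bytes has to follow strict rules to be valid UTF-8 encoded text. What you have in the array does not follow these rules, so it can't be converted into a String without losing information. When new String(bytes, "UTF-8") meets a malformed sequence, it does not throw; it substitutes the replacement character U+FFFD. Encoding the String again writes each U+FFFD as the three bytes EF BF BD, so the original bytes are gone and the resulting array is longer.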
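To make the loss visible, here is a minimal sketch that runs the same round trip on the array from the question and counts the replacement characters. The class name Utf8RoundTripDemo and its helper methods are made up for this illustration; only new String(byte[], Charset), String.getBytes(Charset), and Arrays.equals come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
class Utf8RoundTripDemo {
    // Decodes the bytes as UTF-8, then encodes the resulting String back to UTF-8.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Counts how many U+FFFD replacement characters the decoder produced.
    static long countReplacements(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.chars().filter(c -> c == '\uFFFD').count();
    }

    public static void main(String[] args) {
        byte[] original = {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};
        byte[] after = roundTrip(original);
        System.out.println("original length:        " + original.length);
        System.out.println("round-trip length:      " + after.length);
        System.out.println("replacement characters: " + countReplacements(original));
        System.out.println("arrays equal:           " + Arrays.equals(original, after));
    }
}
```

With the array from the question, six bytes should be rejected (-94, -18, -70, -125, -46, -23 are either stray continuation bytes or lead bytes not followed by a continuation byte), so the round trip should grow from 16 to 28 bytes.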
The rules are explained, for example, at https://en.wikipedia.org/wiki/UTF-8
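If you want Java to tell you whether a byte array obeys those rules, instead of silently replacing the bad parts, one option is a CharsetDecoder configured to report errors. The sketch below is only an illustration; the class Utf8Validator and the method isValidUtf8 are hypothetical names, while CharsetDecoder, CodingErrorAction.REPORT, and CharacterCodingException are standard java.nio.charset APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class Utf8Validator {
    // Returns true only if the whole array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] fromQuestion = {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};
        byte[] realText = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("array from question valid: " + isValidUtf8(fromQuestion));
        System.out.println("encoded text valid:        " + isValidUtf8(realText));
    }
}
```

Only arrays that pass a check like this can be expected to survive the byte[] to String to byte[] round trip unchanged.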