
byte[] to String conversion and back to byte[] using UTF-8 encoding does not give the same byte array

To understand more about bytes, char, and String in Java, I took a sample byte[], converted it to a String, and then converted that String back to a byte[]. However, the original byte[] and the new byte[] are not the same. Why? Any help is appreciated.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HelloWorld {

    public static void main(String[] args) {

        byte[] originalStringBytes = {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};
        // Decode the bytes as UTF-8 text
        String convertedString = new String(originalStringBytes, StandardCharsets.UTF_8);
        // Encode the string back into UTF-8 bytes
        byte[] afterStringConversionBytes = convertedString.getBytes(StandardCharsets.UTF_8);
        // Compare the contents of the two arrays, not just their lengths
        if (Arrays.equals(originalStringBytes, afterStringConversionBytes)) {
            System.out.println("SAME");
        } else {
            System.out.println("DIFFERENT");
        }

    }
}

It printed "DIFFERENT" for me.

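A sequence of bytes has to follow strict rules to be valid UTF-8 encoded text. The bytes in your array do not follow these rules, so they cannot be converted into a String without losing information. When Java decodes them, each malformed byte is replaced with the replacement character U+FFFD, and encoding that character back to UTF-8 produces the three bytes EF BF BD instead of the original byte.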
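To see where the information is lost, here is a minimal sketch that decodes the array from the question and counts the replacement characters (U+FFFD) the decoder substitutes for invalid bytes. The class and method names (`ReplacementDemo`, `roundTrip`, `countReplacements`) are made up for illustration and are not from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {

    // The byte array from the question.
    static final byte[] ORIGINAL =
            {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};

    // Decode the bytes as UTF-8, then encode the resulting String back to UTF-8.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Count how many U+FFFD replacement characters appear in the decoded text.
    static long countReplacements(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        byte[] after = roundTrip(ORIGINAL);
        System.out.println("original length:   " + ORIGINAL.length);
        System.out.println("round-trip length: " + after.length);
        System.out.println("U+FFFD characters: " + countReplacements(ORIGINAL));
        System.out.println("arrays equal:      " + Arrays.equals(ORIGINAL, after));
    }
}
```

Every U+FFFD in the decoded String stands for a byte that could not be interpreted, and each one is written back as three bytes, so the round-trip array ends up longer than the original and the original byte values cannot be recovered from it.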

The rules are explained, for example, at https://en.wikipedia.org/wiki/UTF-8
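If you want to find out whether a byte array is valid UTF-8 before converting it, one option is a strict decoder that reports errors instead of silently replacing bad bytes. The following is only a sketch: `Utf8Check` and `isValidUtf8` are hypothetical names, while `CharsetDecoder` and `CodingErrorAction.REPORT` are part of the standard `java.nio.charset` package.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the whole array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            // The decoder hit a byte sequence that breaks the UTF-8 rules.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] fromQuestion =
                {39, -94, 17, -18, 43, 32, 50, -70, 31, -125, -46, 10, -23, 32, -112, 63};
        // Bytes produced by encoding real text are valid by construction.
        byte[] realText = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        System.out.println("question bytes valid: " + isValidUtf8(fromQuestion));
        System.out.println("encoded text valid:   " + isValidUtf8(realText));
    }
}
```

Bytes that were produced by encoding a String as UTF-8 will pass this check and survive the round trip; arbitrary binary data generally will not.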


 