
Encoding and decoding a byte array to a string without data loss

I tried to convert a byte[] to a String as follows:

Map<String, String> biomap = new HashMap<String, String>();
biomap.put("L1", new String(Lf1, "ISO-8859-1"));

where Lf1 is a byte[] array, and I later convert this string back to a byte[]. The problem is that when I convert the byte array to a string, it comes out like this:

FMR  F P�d@� �0d@r (@� ......... etc

and

String SF1 = biomap.get("L1");
byte[] storedL1 = SF1.getBytes("ISO-8859-1");

When I convert it back to a byte array and compare the two arrays, the comparison returns false. In other words, the data has changed.

I want to get back exactly the same byte[] data after encoding it to a string and decoding it to a byte[] again.

First: ISO-8859-1 does not cause any data loss when an arbitrary byte array is converted to a string using this encoding. Consider the following program:

public class BytesToString {
    public static void main(String[] args) throws Exception {
        // array that will contain all the possible byte values
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) (i + Byte.MIN_VALUE);
        }

        // converting to string and back to bytes
        String str = new String(bytes, "ISO-8859-1");
        byte[] newBytes = str.getBytes("ISO-8859-1");

        if (newBytes.length != 256) {
            throw new IllegalStateException("Wrong length");
        }
        boolean mismatchFound = false;
        for (int i = 0; i < 256; i++) {
            if (newBytes[i] != bytes[i]) {
                System.out.println("Mismatch: " + bytes[i] + "->" + newBytes[i]);
                mismatchFound = true;
            }
        }
        System.out.println("Whether a mismatch was found: " + mismatchFound);
    }
}

It builds an array of bytes containing every possible byte value, converts it to a String using ISO-8859-1, and then converts it back to bytes using the same encoding.

This program outputs Whether a mismatch was found: false, so the bytes->String->bytes conversion via ISO-8859-1 yields the same data as at the start.
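Since the round trip itself is lossless, it may be worth checking how the two arrays are compared. The question does not show the comparison, so this is only a guess at a possible cause: in Java, == and byte[].equals() compare references, not contents. A minimal sketch follows; the class name, the helper roundTrip and the sample bytes are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Decodes the bytes as ISO-8859-1 and encodes the result back with the same charset.
    static byte[] roundTrip(byte[] original) {
        String text = new String(original, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample data, including bytes outside the ASCII range.
        byte[] original = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};
        byte[] restored = roundTrip(original);

        // Reference comparison: these are two distinct array objects.
        System.out.println(original == restored);
        // Also a reference comparison: arrays inherit equals() from Object.
        System.out.println(original.equals(restored));
        // Content comparison: this is the check that matters here.
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Only the Arrays.equals call compares the array contents element by element. Using the StandardCharsets constants instead of the charset name also avoids the checked UnsupportedEncodingException.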

But, as was pointed out in the comments, String is not a good container for binary data. Specifically, such a string will almost surely contain unprintable characters, so if you print it or pass it through HTML or some other text channel, you will run into problems (data loss, for example).

If you really need to convert a byte array to a string (and treat it as opaque), use base64 encoding:

String stringRepresentation = Base64.getEncoder().encodeToString(bytes);
byte[] decodedBytes = Base64.getDecoder().decode(stringRepresentation);

It takes more space, but the resulting string is safe to print.
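Applied to the map from the question, this could look roughly like the sketch below. The helper names encode and decode, the class name and the sample bytes are assumptions for illustration; java.util.Base64 is available from Java 8 onward:

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class Base64MapExample {
    // Turns arbitrary bytes into a printable ASCII string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from a string produced by encode().
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the Lf1 array from the question.
        byte[] lf1 = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};

        Map<String, String> biomap = new HashMap<>();
        biomap.put("L1", encode(lf1));

        byte[] storedL1 = decode(biomap.get("L1"));
        System.out.println(biomap.get("L1"));
        System.out.println(Arrays.equals(lf1, storedL1));
    }
}
```

Base64 emits four characters for every three input bytes, so the stored string is roughly a third larger than the raw data.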

There are special encodings, such as base64, for encoding binary data for text-only systems.

Converting a byte[] to a String is only guaranteed to work if the byte[] contains a valid sequence of bytes according to the chosen encoding. Unknown byte sequences might be replaced with the Unicode replacement character (as shown in your example).
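To illustrate the point about invalid sequences, here is a small sketch that pushes the same bytes through UTF-8 and through ISO-8859-1. The class name, the helper survivesRoundTrip and the sample bytes are hypothetical; the bytes 0xFF and 0xFE were chosen because they can never appear in well-formed UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecoding {
    // Returns true if decoding and re-encoding with the charset reproduces the input.
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        String text = new String(data, charset);
        return Arrays.equals(data, text.getBytes(charset));
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are not valid in UTF-8; 0x41 is the letter 'A'.
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x41};

        // The String constructor silently replaces malformed input with U+FFFD.
        String asUtf8 = new String(data, StandardCharsets.UTF_8);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);

        System.out.println(survivesRoundTrip(data, StandardCharsets.UTF_8));
        System.out.println(survivesRoundTrip(data, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the invalid bytes are swapped for the replacement character during decoding, so the original values cannot be recovered. ISO-8859-1 maps each of the 256 byte values to its own character, which is why it round-trips in memory.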
