
Java: String to byte array conversion

I am getting some unexpected results from what I thought was a simple test. After running the following:

byte[] bytes = {(byte)0x40, (byte)0xE2, (byte)0x56, (byte)0xFF, (byte)0xAD, (byte)0xDC};
String s = new String(bytes, Charset.forName("UTF-8"));
byte[] bytes2 = s.getBytes(Charset.forName("UTF-8"));

bytes2 is a 14-element array that looks nothing like the original (bytes). Is there a way to do this sort of conversion and retain the original decomposition to bytes?

Is there a way to do this sort of conversion and retain the original decomposition to bytes?

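Well that doesn't look like valid UTF-8 to me, so I'm not surprised it didn't round-trip. 0xE2 starts a three-byte sequence but is followed by 0x56, which is not a continuation byte, and 0xFF never occurs in UTF-8 at all. The decoder replaces each malformed sequence with U+FFFD, and U+FFFD encodes back to three bytes (EF BF BD), which is where the extra length comes from.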
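To check whether a particular array would survive the trip, something along these lines may help. It is only a sketch: the class name Utf8RoundTrip and the helpers isValidUtf8 and roundTrip are made up for illustration, and it assumes Java 7 or later for StandardCharsets. A strict CharsetDecoder reports malformed input instead of silently replacing it the way new String(bytes, charset) does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8RoundTrip {

    // Returns true only if the bytes are well-formed UTF-8.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes leniently, as the code in the question does, then re-encodes.
    public static byte[] roundTrip(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56,
                        (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
        System.out.println("valid UTF-8: " + isValidUtf8(bytes));
        System.out.println("length after round trip: " + roundTrip(bytes).length);
    }
}
```

For the array in the question the strict check fails, and the lenient round trip gives the 14 bytes the question describes.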

If you want to convert arbitrary binary data to text in a reversible way, use base64, e.g. via this public domain encoder/decoder.
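As a rough illustration of the base64 route, here is a minimal sketch. It assumes Java 8 or later, where java.util.Base64 is in the standard library, rather than the external encoder/decoder mentioned above. The class name Base64RoundTrip and the helpers toText and fromText are made up for the example.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of any library.
class Base64RoundTrip {

    // Encodes arbitrary bytes as an ASCII-only Base64 string.
    public static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from the Base64 string.
    public static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56,
                        (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
        String text = toText(bytes);
        byte[] restored = fromText(text);
        System.out.println(text);
        System.out.println("same bytes: " + Arrays.equals(bytes, restored));
    }
}
```

The string is longer than the input (four characters for every three bytes), but any byte sequence survives the trip unchanged.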

This should do:

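import java.nio.charset.StandardCharsets;

public class Main
{

    /*
     * This method converts a String to an array of bytes
     */
    public void convertStringToByteArray()
    {

        String stringToConvert = "This String is 76 characters long and will be converted to an array of bytes";

        // Name the charset explicitly; getBytes() with no argument uses the platform default
        byte[] theByteArray = stringToConvert.getBytes(StandardCharsets.UTF_8);

        // Prints 76: every character here is ASCII, so UTF-8 uses one byte per character
        System.out.println(theByteArray.length);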

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {    
        new Main().convertStringToByteArray();
    }
}

Two things:

  1. The byte sequence does not appear to be valid UTF-8:

      $ python
      >>> '\x40\xe2\x56\xff\xad\xdc'.decode('utf8')
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
          return codecs.utf_8_decode(input, errors, True)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 1: invalid continuation byte

  2. Even if it were valid UTF-8, decoding and then encoding can result in different bytes if the text gets normalized along the way, due to things like precomposed characters and other Unicode features.
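The second point can be seen with a small sketch like the one below. It is illustrative only: the class name NormalizationDemo and both helper methods are made up, and the NFC step is added here to stand in for any library or storage layer that normalizes text. As far as I know, a plain new String / getBytes pair in Java does not normalize on its own, so the difference only shows up once normalization is applied.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class NormalizationDemo {

    // Plain decode and re-encode, with no normalization.
    public static byte[] roundTrip(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Decode, normalize to NFC (composed form), then re-encode.
    public static byte[] roundTripWithNfc(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return Normalizer.normalize(s, Normalizer.Form.NFC)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Valid UTF-8 for 'e' followed by U+0301 (combining acute accent).
        byte[] decomposed = {(byte) 0x65, (byte) 0xCC, (byte) 0x81};

        System.out.println("plain round trip equal: "
                + Arrays.equals(decomposed, roundTrip(decomposed)));
        System.out.println("NFC round trip equal: "
                + Arrays.equals(decomposed, roundTripWithNfc(decomposed)));
    }
}
```

With NFC the two code points collapse into the single precomposed U+00E9, which encodes as two bytes (C3 A9) instead of the original three.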

If you want to encode arbitrary binary data in a string in a way where you are guaranteed to get the same bytes back when you decode them, your best bet is something like base64.


 