Converting char array into byte array and back again

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:

char[] password = "password".toCharArray();

byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);

byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
    passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8); 
    passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF); 
}

String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);

System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));

The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing a null terminator or something? What can I do to make the conversion and unconversion work?

Conversion between char and byte is character set encoding and decoding. I prefer to make it as clear as possible in code. It doesn't really mean extra code volume:

 Charset latin1Charset = Charset.forName("ISO-8859-1");
 CharBuffer charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // bytes to chars; charBuffer.toString() gives a String
 ByteBuffer byteBuffer = latin1Charset.encode(charBuffer);                 // chars to bytes; encode(String) also exists

Aside:

java.nio classes and java.io Reader/Writer classes use ByteBuffer and CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to use these classes directly. However, you can always do:

 byteArray = byteBuffer.array();  byteBuffer = ByteBuffer.wrap(byteArray);
 byteBuffer.get(byteArray);       byteBuffer.put(byteArray);
 charArray = charBuffer.array();  charBuffer = CharBuffer.wrap(charArray);
 charBuffer.get(charArray);       charBuffer.put(charArray);

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing - if you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
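The suspicion above can be checked with a small sketch. The class name Utf16RoundTrip and the helper toUtf16Be are made up for illustration, and ISO-8859-1 stands in for "some single-byte platform default" so that the result does not depend on the machine it runs on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf16RoundTrip {
    // Hypothetical helper: writes each char as two bytes, high byte first,
    // the same way the first attempt in the question does.
    public static byte[] toUtf16Be(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).asCharBuffer().put(chars);
        return bytes;
    }

    public static void main(String[] args) {
        char[] password = "password".toCharArray();
        byte[] bytes = toUtf16Be(password);

        // A single-byte charset turns every zero high byte into a U+0000 char.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        // The matching charset restores the original text.
        String right = new String(bytes, StandardCharsets.UTF_16BE);

        System.out.println(wrong.length());           // 16
        System.out.println((int) wrong.charAt(0));    // 0
        System.out.println(right.equals("password")); // true
    }
}
```

Most consoles render U+0000 as nothing at all, which would explain why the three printed strings look identical even though they are not equal.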

Original Answer

    public byte[] charsToBytes(char[] chars){
        Charset charset = Charset.forName("UTF-8");
        ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
        return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
    }

    public char[] bytesToChars(byte[] bytes){
        Charset charset = Charset.forName("UTF-8");
        CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
        return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
    }

Edited to use StandardCharsets

public byte[] charsToBytes(char[] chars)
{
    final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}

public char[] bytesToChars(byte[] bytes)
{
    final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
}

Here is a JavaDoc page for StandardCharsets. Note this on the JavaDoc page:

These charsets are guaranteed to be available on every implementation of the Java platform.
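Since the char array in the question holds a password, it may be worth clearing the temporary buffers that encode and decode allocate. The following is only a sketch of that idea: the class name PasswordBytes is invented here, and it assumes (as the answer's own code does) that the buffers returned by Charset.encode and Charset.decode are backed by accessible arrays.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PasswordBytes {
    // Encodes chars as UTF-8, then zeroes the encoder's temporary array.
    public static byte[] charsToBytes(char[] chars) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] result = Arrays.copyOfRange(buffer.array(),
                buffer.arrayOffset() + buffer.position(),
                buffer.arrayOffset() + buffer.limit());
        Arrays.fill(buffer.array(), (byte) 0); // wipe the temporary copy
        return result;
    }

    // Decodes UTF-8 bytes, then zeroes the decoder's temporary array.
    public static char[] bytesToChars(byte[] bytes) {
        CharBuffer buffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        char[] result = Arrays.copyOfRange(buffer.array(),
                buffer.arrayOffset() + buffer.position(),
                buffer.arrayOffset() + buffer.limit());
        Arrays.fill(buffer.array(), '\0'); // wipe the temporary copy
        return result;
    }

    public static void main(String[] args) {
        char[] password = {'s', 'e', 'c', 'r', 'e', 't'};
        byte[] encoded = charsToBytes(password);
        char[] decoded = bytesToChars(encoded);
        System.out.println(Arrays.equals(password, decoded)); // true

        // Wipe every copy once it is no longer needed.
        Arrays.fill(password, '\0');
        Arrays.fill(encoded, (byte) 0);
        Arrays.fill(decoded, '\0');
    }
}
```

The caller remains responsible for clearing the returned arrays; the sketch only cleans up the intermediate copies it creates itself.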

If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just does a UTF-16 conversion (big-endian by default; you can set the byte order with the order method), since Java Strings, and thus your char[], use this encoding internally.

Use Charset.forName(charsetName), and then its encode or decode method, or newEncoder / newDecoder.

When converting your byte[] to a String, you should also indicate the encoding (and it should be the same one).
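To illustrate why the decoding side has to name the same encoding, here is a small sketch that sets the byte order explicitly before creating the char view. The class name CharViewOrder and its encode helper are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class CharViewOrder {
    // Writes each char as two bytes in the requested byte order.
    public static byte[] encode(char[] chars, ByteOrder order) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).order(order).asCharBuffer().put(chars);
        return bytes;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b'};
        byte[] be = encode(chars, ByteOrder.BIG_ENDIAN);    // 00 61 00 62
        byte[] le = encode(chars, ByteOrder.LITTLE_ENDIAN); // 61 00 62 00

        // Each byte layout only decodes correctly with its matching charset.
        System.out.println(new String(be, StandardCharsets.UTF_16BE)); // ab
        System.out.println(new String(le, StandardCharsets.UTF_16LE)); // ab
    }
}
```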

What I would do is use a loop to convert to bytes and another to convert back to char.

char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
   bytes[i*2] = (byte) (chars[i] >> 8);
   bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++) 
   chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);

You should make use of getBytes() instead of toCharArray()

Replace the line

char[] password = "password".toCharArray();

with

byte[] password = "password".getBytes();

This is an extension to Peter Lawrey's answer. For the backward (bytes-to-chars) conversion to work correctly for the whole range of chars, the code should be as follows:

char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
   chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}

We need to "unsign" the bytes with & 0xff before using them. Otherwise half of all possible char values will not come back correctly. For instance, chars within the [0x80..0xff] range will be affected.
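The effect of the mask can be seen on a single char. This is a minimal sketch; the class UnsignBytes and the two helper methods are illustrative names, not part of any answer above.

```java
public class UnsignBytes {
    // Combines two bytes without masking: a negative low byte is
    // sign-extended and corrupts the high byte.
    public static char unmasked(byte hi, byte lo) {
        return (char) ((hi << 8) + lo);
    }

    // Combines two bytes after masking each one to its unsigned value.
    public static char masked(byte hi, byte lo) {
        return (char) (((hi & 0xff) << 8) + (lo & 0xff));
    }

    public static void main(String[] args) {
        // U+00E9 is stored as the bytes 0x00 0xE9; 0xE9 is negative as a Java byte.
        byte hi = (byte) 0x00;
        byte lo = (byte) 0xe9;
        System.out.println(Integer.toHexString(unmasked(hi, lo))); // ffe9
        System.out.println(Integer.toHexString(masked(hi, lo)));   // e9
    }
}
```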

When you call getBytes() on a String in Java, the result depends on the default encoding of your platform (e.g. StandardCharsets.UTF_8 or StandardCharsets.ISO_8859_1, etc.).

So whenever you call getBytes on a String object, make sure to specify an encoding, like:

String sample = "abc";
byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);

Let's check what happens in this code. In Java, the String named sample is stored as Unicode (UTF-16); every char in the String is a 2-byte code unit.

sample :  value: "abc"   in Memory(Hex):  00 61 00 62 00 63
        a -> 00 61
        b -> 00 62
        c -> 00 63

But when we call getBytes on a String, we get:

byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);
//result is : 61 62 63
//length: 3 bytes

byte[] b_byte = sample.getBytes(StandardCharsets.UTF_16BE);
//result is : 00 61 00 62 00 63
//length: 6 bytes
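The byte counts above can be reproduced with a short program. This is a sketch only; the class EncodedLengths and its helper are names invented for the example. It also includes plain UTF_16, which in Java prepends a two-byte byte-order mark when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Returns how many bytes the string occupies in the given charset.
    public static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "abc";
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));    // 3
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16BE)); // 6
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16));   // 8 (BOM + 6)
    }
}
```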

To get the original bytes of the String, we can read the chars of the String and extract the two bytes of each one. Below is the sample code:

public static byte[] charArray2ByteArray(char[] chars){
    // Two bytes per char; any extra bytes would decode as a trailing U+0000.
    byte[] result = new byte[chars.length*2];
    int i = 0;
    for(int j = 0; j<chars.length; j++){
        result[i++] = (byte)((chars[j] & 0xFF00) >> 8); // high byte first (big-endian)
        result[i++] = (byte)(chars[j] & 0x00FF);        // then the low byte
    }
    return result;
}

Usage:

String sample = "abc";
//First get the chars of the String; each char has two bytes (Java).
char[] sample_chars = sample.toCharArray();
//Get the bytes
byte[] result = charArray2ByteArray(sample_chars);

//Back to String.
//Make sure we use UTF_16BE, because we wrote the high byte of each
//char first. That is the same byte order
//that UTF-16BE uses.
String sample_back = new String(result, StandardCharsets.UTF_16BE);
