Seeking explanation: Long -> Byte Array -> String -> Byte Array -> Long

I'm seeking an explanation for an oddity I've seen in someone else's code. They were retrieving an "int64" value from an LDAP attribute through a third-party library, and this library returned a byte array. To get the value they were doing something like:

String s = new String(bytesFrom3rdParty);
BigInteger i = new BigInteger(s.getBytes());
System.out.println(i.toString());

With some long values this produced incorrect output. Two things stood out to me:

  1. Why go from byte array -> String -> bytes -> BigInteger?
  2. Why use a BigInteger for a 64-bit numeric value?

Anyway, I did a little experiment:

import java.nio.ByteBuffer;

public class LongStringRoundTrip {

    // Serialize a long into 8 big-endian bytes
    private static byte[] longToByteArray(long l) {
        return ByteBuffer.allocate(Long.SIZE / Byte.SIZE).putLong(l).array();
    }

    // Read the first 8 bytes of the array back as a big-endian long
    private static long byteArrayToLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {

        for (long l = 0L; l < 1000; l++) {
            byte[] origBytes = longToByteArray(l);
            // Decode the binary data as text with the platform default charset, then encode it again
            String s = new String(origBytes);
            byte[] stringBytes = s.getBytes();
            long origL = byteArrayToLong(origBytes);
            long stringL = byteArrayToLong(stringBytes);
            System.out.println(origL + " " + stringL);
        }

    }
}

As I suspected, skipping the conversion to String and back to a byte array fixes the issue. The output of the above is something like this (original value on the left, value after the String round trip on the right):

124 124
125 125
126 126
127 127
128 239
129 239
130 239
131 239
132 239

Then the right-hand value corrects itself again when l reaches 256:

254 239
255 239
256 256
257 257
258 258
259 259
260 260
261 261
262 262
263 263
264 264

So, a couple of questions:

  1. Why is the right-hand value wrong? I assume it has something to do with the conversion between a 64-bit long value and a 32-bit String value?
  2. Why doesn't the incorrect value change until l reaches 256?

byte[] can hold different things, for example:

  • a serialized String value (UTF-8 encoding, for example): "123" -> bytes representing the string; UTF-8 encodes each ASCII character, such as a digit, in a single byte, so that is 3 bytes
  • a serialized Long value in binary: 123 -> 8 bytes representing one number
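A minimal sketch of the two cases, using 123 as in the list above. The class and method names are made up for illustration, and the binary form uses ByteBuffer's default big-endian order:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TwoKindsOfBytes {

    // The number written as text: one UTF-8 byte per ASCII digit
    static byte[] asUtf8Text(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // The number written as a binary long: always 8 big-endian bytes
    static byte[] asBinaryLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asUtf8Text(123)));   // [49, 50, 51]
        System.out.println(Arrays.toString(asBinaryLong(123))); // [0, 0, 0, 0, 0, 0, 0, 123]
    }
}
```

The same number gives two completely different byte arrays, so code reading the bytes has to know which of the two it was given.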

So converting byte[] to String only makes sense when the byte[] actually contains a String; after that you parse the String into a number (in your case a BigInteger). Going back to bytes doesn't make much sense to me.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

String s = new String(bytesFrom3rdParty, StandardCharsets.UTF_8); // the bytes hold the text "123" in UTF-8
BigInteger i = new BigInteger(s); // parse the String "123" into a BigInteger
System.out.println(i); // prints 123

This works too:

import java.nio.charset.StandardCharsets;

String s = new String(bytesFrom3rdParty, StandardCharsets.UTF_8); // the bytes hold the text "123" in UTF-8
long i = Long.parseLong(s); // parse the String "123" into a long
System.out.println(i); // prints 123

What you are doing in your example is the second case: you are serializing a Long in binary form to byte[] (not as a UTF-8 string). Then you make a String out of that binary data and get its bytes again. The problem comes from the charset conversion: new String(byte[]) expects the bytes to be a valid encoding in the charset, so any byte sequence that is not valid gets replaced, which changes your binary representation into something that fits the charset encoding.

When you read it back and build a Long from it, the value breaks. Why at 128? Bytes 0 to 127 are the ASCII range, and (assuming your default charset is UTF-8, which the 239 in your output suggests) UTF-8 encodes each of those characters as that same single byte, so they survive the round trip unchanged. A lone byte of 128 or more is not valid UTF-8, so from 128 on the round trip breaks.
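A small check of that boundary, assuming UTF-8 as the charset (made explicit here rather than relying on the platform default; class and method names are illustrative). It pushes every single byte value through byte[] -> String -> byte[] and reports which ones come back unchanged:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteRoundTrip {

    // True if a single byte comes back unchanged from byte[] -> String -> byte[] in UTF-8
    static boolean survivesUtf8(int unsignedByte) {
        byte[] original = { (byte) unsignedByte };
        String decoded = new String(original, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(original, reencoded);
    }

    // Count how many of the 256 possible byte values survive the round trip
    static int countSurvivors() {
        int count = 0;
        for (int b = 0; b <= 255; b++) {
            if (survivesUtf8(b)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("127 survives: " + survivesUtf8(127)); // true
        System.out.println("128 survives: " + survivesUtf8(128)); // false
        System.out.println("survivors: " + countSurvivors());     // 128
    }
}
```

Exactly the 128 ASCII values survive; every byte from 128 to 255 is altered.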

  • a serialized String value should be parsed with Long.parseLong(String) or new BigInteger(String)
  • a binary serialized number should be read as binary, with ByteBuffer.getLong()
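A sketch of both correct paths side by side. The question doesn't say which format the LDAP library actually returns, so both are shown; the text case assumes UTF-8 and the binary case assumes 8 big-endian bytes (ByteBuffer's default order). Class and method names are illustrative:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeNumbers {

    // Case 1: the bytes are text such as "123" (assumed UTF-8), so decode, then parse
    static long fromTextBytes(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8));
    }

    // Same text parsed as a BigInteger, only needed if the value may not fit in 64 bits
    static BigInteger bigFromTextBytes(byte[] bytes) {
        return new BigInteger(new String(bytes, StandardCharsets.UTF_8));
    }

    // Case 2: the bytes are an 8-byte big-endian long, so read them as binary
    static long fromBinaryBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] text = "123".getBytes(StandardCharsets.UTF_8);
        byte[] binary = ByteBuffer.allocate(Long.BYTES).putLong(123L).array();
        System.out.println(fromTextBytes(text));     // 123
        System.out.println(bigFromTextBytes(text));  // 123
        System.out.println(fromBinaryBytes(binary)); // 123
    }
}
```

Only the path that matches how the bytes were written gives the right number; decoding binary as text or reading text as binary produces wrong values or exceptions.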

Let's make it a little simpler: byte[] -> String -> byte[] performs a decode followed by an encode. When you use new String(byte[] b), it will:

Constructs a new String by decoding the specified array of bytes using the platform's default charset.
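Because the decode step depends on the charset, the same round trip can be lossless or lossy. The sketch below (illustrative names) makes the charset explicit and compares UTF-8 with ISO-8859-1, which maps every byte value 0x00 to 0xFF to exactly one character. It is only meant to show why the round trip fails; storing binary data in a String is still best avoided:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // byte[] -> String (decode) -> byte[] (encode), with an explicit charset
    static byte[] roundTrip(byte[] bytes, Charset charset) {
        return new String(bytes, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] original = ByteBuffer.allocate(Long.BYTES).putLong(128L).array();
        // UTF-8: the lone 0x80 byte is invalid, so the round trip changes the data
        System.out.println(Arrays.toString(roundTrip(original, StandardCharsets.UTF_8)));
        // ISO-8859-1: every byte maps to exactly one char and back, so nothing is lost
        System.out.println(Arrays.toString(roundTrip(original, StandardCharsets.ISO_8859_1)));
    }
}
```

With UTF-8 the 8-byte array for 128 comes back as 10 bytes, [0, 0, 0, 0, 0, 0, 0, -17, -65, -67]; with ISO-8859-1 it comes back unchanged.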

What happens if the bytes are not valid in your platform's default charset?

The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.
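The String(byte[]) Javadoc points to CharsetDecoder for cases where more control over decoding is needed. A sketch (illustrative names, UTF-8 assumed) contrasting the String(byte[], Charset) constructor, which is documented to always replace malformed input with the charset's default replacement, with a decoder configured to report errors instead:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    // Lenient decoding: invalid input is silently replaced by U+FFFD
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: invalid input throws instead of being replaced
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] invalid = { -1 }; // 0xFF never appears in valid UTF-8
        System.out.println((int) decodeLenient(invalid).charAt(0)); // 65533
        try {
            decodeStrict(invalid);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected the input: " + e);
        }
    }
}
```

A strict decoder would have surfaced the problem in the original code immediately instead of producing a plausible-looking wrong number.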

So in your situation, when an invalid byte is passed, the decoder converts it to the character 65533 (U+FFFD), the Unicode replacement character. Encoded back as UTF-8, U+FFFD becomes the three bytes EF BF BD, which are -17, -65, -67 as signed Java bytes:

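import java.util.Arrays;

// 0xFF is never valid in UTF-8; the output below assumes a UTF-8 default charset
byte[] b = {-1};
System.out.println(Arrays.toString(new String(b).getBytes()));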

[-17, -65, -67]

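That is why the value doesn't change: every byte from 128 to 255 is mapped to the same replacement character. For l between 128 and 255, the re-encoded array is seven zero bytes followed by EF BF BD, and getLong() reads only the first 8 bytes, 00 00 00 00 00 00 00 EF, which is 239. At 256 the bytes are 00 00 00 00 00 00 01 00, all in the ASCII range, so the round trip works again. (It breaks again for values such as 384 = 0x0180, whose last byte is 0x80.)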
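To see the 239 directly, this sketch (illustrative names, with UTF-8 made explicit on the assumption that it is the platform default in the question) repeats the round trip for a few values and prints the re-encoded bytes and what getLong() reads from them:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Trace239 {

    // The question's round trip, with UTF-8 made explicit instead of the platform default
    static byte[] viaString(long l) {
        byte[] orig = ByteBuffer.allocate(Long.BYTES).putLong(l).array();
        return new String(orig, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // getLong() reads only the first 8 bytes and ignores any extra ones
    static long readBack(long l) {
        return ByteBuffer.wrap(viaString(l)).getLong();
    }

    public static void main(String[] args) {
        for (long l : new long[] { 127, 128, 255, 256 }) {
            System.out.println(l + " -> " + Arrays.toString(viaString(l)) + " -> " + readBack(l));
        }
    }
}
```

For 128 and 255 the array grows to 10 bytes ending in -17, -65, -67, and getLong() reads 0x00000000000000EF = 239; 127 and 256 round-trip unchanged.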

You might use BigInteger simply to get access to its constructor that takes a byte[] (new BigInteger(byte[]), which reads the bytes as a big-endian two's-complement number) and then take a long from it with longValue().
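A sketch of that approach (illustrative names): for an 8-byte array, new BigInteger(byte[]).longValue() gives the same result as ByteBuffer.getLong(), since both read the bytes as a big-endian two's-complement number:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class BigIntegerFromBytes {

    // Interpret the bytes as a big-endian two's-complement number, then narrow to long
    static long viaBigInteger(byte[] bytes) {
        return new BigInteger(bytes).longValue();
    }

    public static void main(String[] args) {
        for (long l : new long[] { 128L, -1L, Long.MAX_VALUE }) {
            byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(l).array();
            // Both columns after the original value should match it
            System.out.println(l + " " + viaBigInteger(bytes) + " " + ByteBuffer.wrap(bytes).getLong());
        }
    }
}
```

The two differ for arrays that are not 8 bytes long: getLong() reads only the first 8 bytes, while longValue() keeps the low-order 64 bits of the whole number. If silent truncation is a concern, longValueExact() (Java 8+) throws an ArithmeticException instead.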

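Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM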