Converting ByteArray to string and back produces different string
I have to store a huge list of booleans, and I chose to store them as a byte array converted to a string. But I can't understand why converting to a string and back produces different string values:
Support methods:
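// Renders the array most significant byte first, with a space between bytes
// so that the output matches the grouped form shown in the examples.
fun ByteArray.string(): String =
    this.reversed().joinToString(" ") { intToString(it, 4) }

// Renders one byte as 8 binary digits, split into groups of groupSize bits.
fun intToString(number: Byte, groupSize: Int): String {
    val result = StringBuilder()
    for (i in 7 downTo 0) {
        val mask = 1 shl i
        result.append(if (number.toInt() and mask != 0) "1" else "0")
        if (i % groupSize == 0)
            result.append(" ")
    }
    // Drop the trailing space appended after the last group.
    result.replace(result.length - 1, result.length, "")
    return result.toString()
}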
First example:
Given selected indices [0, 14], my code converts them to the bytes [1, 64]. .string() produces:
0100 0000 0000 0001
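For reference, the index-to-byte mapping implied by these examples can be sketched in Java as follows. The original packing code is not shown, so this is only an assumption about how it works: index i is taken to set bit i % 8 of byte i / 8. The names BitPacking and pack are hypothetical.

```java
import java.util.Arrays;

class BitPacking {
    // Hypothetical helper: sets bit (index % 8) of byte (index / 8) for each selected index.
    static byte[] pack(int[] indices, int sizeInBytes) {
        byte[] bytes = new byte[sizeInBytes];
        for (int index : indices) {
            bytes[index / 8] |= (byte) (1 << (index % 8));
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pack(new int[] {0, 14}, 2))); // [1, 64]
        System.out.println(Arrays.toString(pack(new int[] {0, 15}, 2))); // [1, -128]
    }
}
```

Under this assumption, index 15 sets the highest bit of the second byte, which is why that byte becomes -128 (0x80) as a signed value.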
Convert it to a string and back:
array.toString(Charsets.UTF_8).toByteArray(Charsets.UTF_8)
Result: [1, 64]. .string() produces:
0100 0000 0000 0001
Second example:
Given selected indices [0, 15], my code converts them to the bytes [1, -128]. .string() produces:
1000 0000 0000 0001
Which seems pretty legitimate. Now convert it to a string and back.
It produces an array of 4 bytes: [1, -17, -65, -67]. .string() produces:
1011 1101 1011 1111 1110 1111 0000 0001
Which doesn't look like the [0, 15] indices or [1, -128] to me :)
How can this happen? I suspect this last "1" in "1000 0000 0000 0001" may be causing the issue, but I still don't know the answer.
Thanks.
P.S. I added the java tag to the question because I think the answer is the same for both Kotlin and Java.
Here's an MCVE for your problem (in Java):
import java.nio.charset.*;
class Test {
    public static void main(String[] args) {
        byte[] array = { -128 };
        byte[] convertedArray = new String(array, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < convertedArray.length; i++) {
            System.out.println(convertedArray[i]);
        }
    }
}
Expected output:
-128
Actual output:
-17
-65
-67
This happens because the byte -128 (0x80) is not a valid UTF-8 sequence on its own, so decoding replaces it with the Unicode replacement character U+FFFD "�". Encoding U+FFFD back to UTF-8 yields the three bytes 0xEF 0xBF 0xBD, which print as -17, -65, -67.
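A small sketch of that substitution, using only standard JDK classes. The class name ReplacementDemo and the helper isValidUtf8 are made up for illustration. It shows the lone 0x80 byte turning into U+FFFD, and how a strict CharsetDecoder could be used to detect such input up front instead of having it silently replaced:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementDemo {
    // Hypothetical helper: returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] array = {1, -128};
        String decoded = new String(array, StandardCharsets.UTF_8);
        // The second char is the replacement character, not the original byte.
        System.out.println(Integer.toHexString(decoded.charAt(1))); // fffd
        System.out.println(Arrays.toString(decoded.getBytes(StandardCharsets.UTF_8))); // [1, -17, -65, -67]
        System.out.println(isValidUtf8(array)); // false
        System.out.println(isValidUtf8(new byte[] {1, 64})); // true
    }
}
```

The first example from the question survives the round trip only because both of its bytes happen to be valid single-byte UTF-8 (values below 0x80).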
You can instead encode and decode the string as ISO-8859-1 (also known as Latin-1), since every byte value is valid in that encoding. ISO-8859-1 has the convenient property that each byte value corresponds directly to the same Unicode code point, so 0x80 becomes U+0080, 0xFF becomes U+00FF, and so on.
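As a minimal sketch of this suggestion in Java (the class Latin1RoundTrip and its encode/decode helpers are hypothetical names, not part of any library), the round trip through ISO-8859-1 can be checked for all 256 byte values:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Maps each byte to the char with the same numeric value (0x00..0xFF).
    static String encode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Maps each char in the range U+0000..U+00FF back to a single byte.
    static byte[] decode(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(Arrays.equals(all, decode(encode(all)))); // true
        System.out.println(Arrays.toString(decode(encode(new byte[] {1, -128})))); // [1, -128]
    }
}
```

Note that the resulting string can contain control characters such as U+0000 or U+0080, so this only helps if whatever stores the string preserves them unchanged. If the storage expects printable text, an encoding such as java.util.Base64 may be a safer choice.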