Hex-string to UTF-8-string in Java
I have a sequence of hex bytes: 35 d8 de de de de 43 f2 71 84 4b f3 be 4d 4d 65 4a 17 41 bb 40 a5 85 c4 bd fd 7a 4e fb 24 27 4e
This is 32 bytes!
I do this:
    import java.nio.charset.StandardCharsets;

    public class Main {
        public static void main(String[] args) {
            String b = "35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e";
            byte[] bytes = fromHex(b);
            String st = new String(bytes, StandardCharsets.UTF_8);
            System.out.println(bytes.length); // 32
            System.out.println(st.length()); // 30
        }

        // Converts a hex string (two hex digits per byte) into a byte array.
        private static byte[] fromHex(String hex) {
            byte[] binary = new byte[hex.length() / 2];
            for (int i = 0; i < binary.length; i++) {
                binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            return binary;
        }
    }
And I get this output:
32
30
But I expected to get a 32-character UTF-8 string! Why do I get a 30-character string? How can I get 32 UTF-8 bytes?
Why do I get a 30 character string?
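There are byte sequences in that input where multiple bytes are converted to a single Unicode code point when decoding from UTF-8. For example, c4 bd is a valid two-byte UTF-8 sequence that decodes to one character (U+013D). Bytes that do not form valid UTF-8 are not kept as they are either: new String(bytes, StandardCharsets.UTF_8) replaces malformed input with U+FFFD, the replacement character.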
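To make the decoding behaviour concrete, here is a minimal sketch that decodes a few fragments of the question's input and then prints every code point of the full string. The class name `DecodeInspector` and the helper `countReplacements` are made up for illustration; `fromHex` mirrors the helper from the question. How many replacement characters a run of malformed bytes produces can depend on the decoder, so the sketch only annotates results that follow directly from UTF-8 itself or from the output reported in the question.

```java
import java.nio.charset.StandardCharsets;

public class DecodeInspector {
    // Same idea as the question's helper: two hex digits per byte.
    public static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    // Hypothetical helper: counts U+FFFD replacement characters in a string.
    public static long countReplacements(String s) {
        return s.chars().filter(c -> c == 0xFFFD).count();
    }

    public static void main(String[] args) {
        // c4 bd is one valid two-byte sequence, so two bytes become one char.
        String valid = new String(fromHex("c4bd"), StandardCharsets.UTF_8);
        System.out.println(valid.length());                        // 1
        System.out.println(Integer.toHexString(valid.charAt(0)));  // 13d

        // 0x84 is a continuation byte with no lead byte: malformed, so it
        // is replaced with U+FFFD instead of being preserved.
        String bad = new String(fromHex("84"), StandardCharsets.UTF_8);
        System.out.println(countReplacements(bad));                // 1

        // The full input from the question: dump each decoded code point.
        String st = new String(
                fromHex("35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e"),
                StandardCharsets.UTF_8);
        st.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        System.out.println(st.length());                           // 30, as in the question
    }
}
```

Every `U+FFFD` in the dump marks a place where the original byte values were lost during decoding.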
How can I get 32 UTF-8 bytes?
You can't. The decoded result is a 30-character string.
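And it's wrong anyway to say "UTF-8 bytes" at that point. Once decoded, they're not bytes any more: a Java String is a sequence of chars (UTF-16 code units), and the original byte values of the malformed parts are gone.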
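If the real goal is to carry the 32 bytes around unchanged, the sketch below shows why a UTF-8 String is the wrong container, and one possible workaround. The workaround is my own assumption, not something proposed above: keep the `byte[]` itself, or, if a String is unavoidable, decode with ISO-8859-1, where every byte maps to exactly one char. The class name `RoundTripDemo` and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Same idea as the question's helper: two hex digits per byte.
    public static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }

    // True if decoding as UTF-8 and encoding again returns the same bytes.
    public static boolean survivesUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return Arrays.equals(original, decoded.getBytes(StandardCharsets.UTF_8));
    }

    // True if decoding as ISO-8859-1 and encoding again returns the same bytes.
    public static boolean survivesLatin1(byte[] original) {
        String decoded = new String(original, StandardCharsets.ISO_8859_1);
        return Arrays.equals(original, decoded.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e");

        // Malformed bytes were replaced with U+FFFD, so the round trip is lossy.
        System.out.println(survivesUtf8(bytes));   // false

        // ISO-8859-1 maps each byte to one char, so nothing is lost.
        System.out.println(survivesLatin1(bytes)); // true
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).length()); // 32
    }
}
```

Note that the ISO-8859-1 string is only a lossless wrapper around the bytes; its characters are not meaningful text. For data like this, which does not look like text at all, keeping the hex string or the raw `byte[]` is usually the simpler choice.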