
Hex-string to UTF-8-string in Java

I have a sequence of hex bytes: 35 d8 de de de de 43 f2 71 84 4b f3 be 4d 4d 65 4a 17 41 bb 40 a5 85 c4 bd fd 7a 4e fb 24 27 4e

This is 32 bytes!

I do this:

import java.nio.charset.StandardCharsets;

public class HexToUtf8 {
    public static void main(String[] args) {
        String b = "35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e";
        byte[] bytes = fromHex(b);
        String st = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length);   // 32
        System.out.println(st.length());    // 30
    }

    // Converts a string of hex digit pairs into the bytes they represent.
    private static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }
}

And I get this output:

32
30

But I expected to get a 32-character UTF-8 string! Why do I get a 30-character string? How can I get 32 UTF-8 bytes?

Why do I get a 30-character string?

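The input contains byte sequences in which several bytes are converted to a single Unicode code point when decoded as UTF-8. For example, the pair c4 bd decodes to the single character U+013D. The data is also not well-formed UTF-8: new String(bytes, StandardCharsets.UTF_8) replaces each malformed sequence with the replacement character U+FFFD rather than failing, and one such sequence can cover more than one byte. Either way, the number of characters no longer matches the number of bytes.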
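To make the byte-versus-character mismatch concrete, here is a minimal sketch. The class name Utf8LengthDemo and its helper methods are hypothetical names invented for illustration; only the JDK calls (String, CharsetDecoder, CodingErrorAction) are real. It shows a two-byte sequence taken from the question collapsing into one char, and a strict decoder reporting input that new String(...) would silently patch over.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class Utf8LengthDemo {

    // Lenient decoding, as in the question: malformed input becomes U+FFFD.
    static int decodedLength(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    // Strict decoding: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] twoByteSequence = {(byte) 0xC4, (byte) 0xBD}; // valid UTF-8 for U+013D
        byte[] loneContinuation = {(byte) 0x84};             // invalid on its own

        System.out.println(decodedLength(twoByteSequence));  // 1
        System.out.println(isValidUtf8(twoByteSequence));    // true
        System.out.println(isValidUtf8(loneContinuation));   // false
    }
}
```

Running a strict check like isValidUtf8 on the 32 bytes from the question would be one way to confirm that they were never UTF-8 text in the first place.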

How can I get 32 UTF-8 bytes?

You can't. Decoded as UTF-8, this data is a 30-character string.

And it's wrong anyway to say "UTF-8 bytes" here. Once decoded, they're not bytes any more; they are characters.
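If the real goal is to carry all 32 bytes around in a String without losing any, one possible approach (an assumption about the asker's intent, not something stated in the question) is to use an encoding that maps every byte to exactly one char, such as ISO-8859-1, or a text encoding of binary data such as Base64. The sketch below uses a hypothetical class name, ByteRoundTripDemo, and a short sample taken from the start of the question's data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original question.
class ByteRoundTripDemo {

    // ISO-8859-1 maps each byte 0x00..0xFF to exactly one char, so this is lossless.
    static String toLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] fromLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Checks whether decoding as UTF-8 and re-encoding gives back the same bytes.
    static boolean utf8RoundTrips(byte[] bytes) {
        byte[] back = new String(bytes, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(bytes, back);
    }

    public static void main(String[] args) {
        // First bytes of the question's data: 35 d8 de 43
        byte[] sample = {0x35, (byte) 0xD8, (byte) 0xDE, 0x43};

        String latin1 = toLatin1(sample);
        System.out.println(latin1.length());                            // 4
        System.out.println(Arrays.equals(sample, fromLatin1(latin1)));  // true

        // Base64 is another lossless way to put arbitrary bytes into text.
        String base64 = Base64.getEncoder().encodeToString(sample);
        System.out.println(Arrays.equals(sample, Base64.getDecoder().decode(base64))); // true

        // UTF-8 is lossy for this input because the bytes are malformed.
        System.out.println(utf8RoundTrips(sample));                     // false
    }
}
```

The ISO-8859-1 string is not meaningful text; it only preserves the bytes. For data such as hashes or keys, keeping a byte[] or the original hex string is often the simpler choice.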

