Hex string to UTF-8 string in Java

Question

I have a sequence of hex bytes: 35 d8 de de de de 43 f2 71 84 4b f3 be 4d 4d 65 4a 17 41 bb 40 a5 85 c4 bd fd 7a 4e fb 24 27 4e

This is 32 bytes!

I do this:

import java.nio.charset.StandardCharsets;

public class HexToUtf8 {
    public static void main(String[] args) {
        String b = "35d8dededede43f271844bf3be4d4d654a1741bb40a585c4bdfd7a4efb24274e";
        byte[] bytes = fromHex(b);
        String st = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(bytes.length);   // 32
        System.out.println(st.length());    // 30
    }

    // Converts a hex string (two hex digits per byte) into the bytes it encodes.
    private static byte[] fromHex(String hex) {
        byte[] binary = new byte[hex.length() / 2];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return binary;
    }
}

And I get this output:

32
30

But I expect to get a 32-character UTF-8 string! Why do I get a 30-character string? How can I get 32 UTF-8 bytes?

Answer 1

Why do I get a 30-character string?

The input contains byte sequences in which multiple bytes are converted to a single Unicode code point when decoded as UTF-8. Each such sequence makes the resulting string shorter than the byte array.
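As a minimal sketch of that effect, the snippet below decodes only the `c4 bd` pair that appears in the question's input. The class name `MultiByteDemo` and the helper `decodedLength` are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

public class MultiByteDemo {
    // Hypothetical helper: decodes the bytes as UTF-8 and returns the number of Java chars.
    static int decodedLength(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        // The two-byte sequence c4 bd taken from the question's input.
        byte[] twoBytes = {(byte) 0xc4, (byte) 0xbd};
        String s = new String(twoBytes, StandardCharsets.UTF_8);

        System.out.println(twoBytes.length);       // 2
        System.out.println(decodedLength(twoBytes)); // 1
        System.out.println((int) s.charAt(0));     // 317, i.e. U+013D
    }
}
```

Two input bytes become one character, which is the kind of shrinkage that takes the count from 32 down to 30.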

How can I get 32 UTF-8 bytes?

You can't. Decoded as UTF-8, these bytes are a 30-character string.

In any case, "UTF-8 bytes" is the wrong term for the result. Once decoded, they are characters, not bytes.
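A related point that may be worth checking: `new String(bytes, charset)` silently replaces malformed input with the charset's replacement character, so the decoded string does not show whether the bytes were valid UTF-8 in the first place. The sketch below uses a strict `CharsetDecoder` to test that. The class name `StrictUtf8Check` and the method `isValidUtf8` are illustrative assumptions, not part of the original post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // The strict decoder rejects malformed input instead of replacing it.
            return false;
        }
    }

    public static void main(String[] args) {
        // First three bytes of the question's input: 35 d8 de.
        byte[] prefix = {0x35, (byte) 0xd8, (byte) 0xde};
        System.out.println(isValidUtf8(prefix)); // false

        // Bytes produced by encoding a real string are always valid.
        byte[] encoded = "5\u013d".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(encoded)); // true
    }
}
```

If the check fails, the data is probably arbitrary binary rather than text, and decoding it as UTF-8 loses information.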
