Parsing Unicode in a byte array

Question

I have a byte array with a series of characters. In one case I have:

[28] = 0x6e
[29] = 0x61
[30] = 0x6d
[31] = 0x65
[32] = 0x00
[33] = 0x00
[34] = 0x00
[35] = 0x4f
[36] = 0x08
[37] = 0x00
[38] = 0x07
[39] = 0x00
[40] = 0x00
[41] = 0x04
[42] = 0x13
[43] = 0xff
[44] = 0xff
[45] = 0x00
[46] = 0x00

Elements 28 to 31 hold the characters "name", with that section ending at element 32. Then I have another byte array:

[47] = 0x01
[48] = 0x03
[49] = 0x00
[50] = 0x00
[51] = 0x73
[52] = 0x65
[53] = 0xc3
[54] = 0xb1
[55] = 0x6f
[56] = 0x72
[57] = 0x00
[58] = 0x00
[59] = 0x00
[60] = 0x4f
[61] = 0x08
[62] = 0x00
[63] = 0x08
[64] = 0x00
[65] = 0x00
[66] = 0x04
[67] = 0x13
[68] = 0xff
[69] = 0xff
[70] = 0x00
[71] = 0x00

where I believe I have the string señor.

With the first array it's easy to find the name: it is the first 4 bytes, with 00 as a terminator. But how do I decipher what's in the second byte array?

Both arrays are vector<char>s.

Answer 1

The text is evidently UTF-8 encoded:

[53] = 0xc3
[54] = 0xb1

This two-byte sequence is the UTF-8 encoding of the character ñ (U+00F1). The bytes around it, 0x73 0x65 before and 0x6f 0x72 after, are the remaining four characters of señor, each encoded as a single ASCII byte.
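To make the byte layout concrete, here is a minimal sketch of decoding such a run by hand. The function name `decode_utf8_at` is made up for illustration, and the sketch assumes the string is NUL-terminated as in your dumps. It checks lead and continuation bytes but does not reject overlong forms or surrogates, so treat it as a teaching aid rather than a validating decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode the NUL-terminated UTF-8 string that starts
// at index `start` into a sequence of Unicode code points.
std::u32string decode_utf8_at(const std::vector<char>& bytes, std::size_t start) {
    assert(start <= bytes.size());
    std::u32string out;
    std::size_t i = start;
    while (i < bytes.size() && bytes[i] != 0) {
        // char may be signed, so convert to unsigned char before testing bits.
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // each continuation byte adds 6 bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With your second array, calling `decode_utf8_at(data, 51)` should walk over 0x73, 0x65, then combine 0xc3 0xb1 into U+00F1, then 0x6f and 0x72, and stop at the 0x00 at index 57.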

The C++ library does have some facilities for working with UTF-8, but I have always found those library classes somewhat awkward and inflexible. On most platforms, you have an excellent, flexible iconv library with a simple, easy API for converting between UTF-8 and other encodings.
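As a rough sketch of the iconv route, assuming a POSIX system where `<iconv.h>` is available (on some platforms you also need to link with `-liconv`), something along these lines should work. The wrapper name `convert_encoding` is my own invention, and the encoding names accepted by `iconv_open` vary between implementations, so check what yours supports.

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the POSIX iconv API: converts `input` from
// encoding `from` to encoding `to` and returns the converted bytes.
std::string convert_encoding(const std::string& input, const char* to, const char* from) {
    assert(to != nullptr && from != nullptr);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed (unsupported encoding?)");

    std::string in = input;  // iconv wants a non-const char*
    char* src = &in[0];
    std::size_t src_left = in.size();

    std::string out;
    char buf[256];
    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        const int err = errno;  // save before anything else can change it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the output buffer filled up, so loop again.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv conversion failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

For example, `convert_encoding(text, "ISO-8859-1", "UTF-8")` would turn the six UTF-8 bytes of señor into five Latin-1 bytes. Copy the bytes out of your `vector<char>` up to the terminating 0x00 into a `std::string` first.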

Answer 1: accepted, 2016-12-20 18:20:22
