Error when converting char[2] to unsigned short?

Question

Edit:

After reading the comments, thanks to @MM and @AnttiHaapala, I fixed my code but still got incorrect outputs...

New Code:

#include <iostream>
int main() {
    char * myChar;
    myChar = new char[2];
    myChar[1] = 0x00;
    myChar[0] = 0xE0;
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Output:

or if you reverse the order

Old Post:

So I have a two-byte value that I am reading from a file and would like to convert to an unsigned short so I can use the numerical value.

Example code:

#include <iostream>
int main() {
    char myChar[2];
    myChar[1] = 'à';
    myChar[0] = '\0';
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Output:

But à\0 or E0 00 should have a value of 224 as an unsigned two-byte value?

Also very interesting...

This code:

#include <iostream>
int main() {
    char * myChar;
    myChar = "\0à";
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Outputs:

Answer 1

NOTE: The original code has a complicating factor in that the source is UTF-8 encoded. Please check the edit history of this answer to see my comments on that. However, I think that is not the main issue you are asking about, so I have changed my answer to address only the edit. To avoid UTF-8 conversion issues, use '\xE0' instead of 'à'.
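To make the UTF-8 point concrete, here is a minimal sketch. It assumes the usual UTF-8 encoding of à (U+00E0) as the two bytes C3 A0, and it spells those bytes out with escapes so the result does not depend on how the file itself is saved. The names utf8_a_grave, utf8_length and single_byte are made up for illustration.

```cpp
#include <cstddef>

// "à" (U+00E0) written as its two UTF-8 bytes, independent of the
// encoding of this source file.
constexpr char utf8_a_grave[] = "\xC3\xA0";

// Number of bytes in the UTF-8 form, excluding the terminating '\0'.
constexpr std::size_t utf8_length() {
    return sizeof(utf8_a_grave) - 1;
}

// The single byte 0xE0, read back as an unsigned value.
constexpr unsigned single_byte() {
    return static_cast<unsigned char>('\xE0');
}
```

Here utf8_length() is 2, so the UTF-8 form cannot fit into one char, while single_byte() is 224, which is the value the question is after.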

Regarding the edited code:

char * myChar;
myChar = new char[2];
myChar[1] = 0x00;
myChar[0] = 0xE0;
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;

The range of char (on your system) is -128 through 127. This is common. You write myChar[0] = 224; (0xE0 is an int literal with value 224).

This is an out-of-range conversion, which causes implementation-defined behaviour. Most commonly, implementations will define this to adjust modulo 256 until the value is in range. So you end up with the same result as:

myChar[0] = -32;

Then the calculation (myChar[1] << 8) | myChar[0] is 0 | (-32), which is -32. Finally, you convert to unsigned short. This is another out-of-range conversion, because the range of unsigned short is [0, 65535] on your system.

However, out-of-range conversion to an unsigned type is well-defined to adjust modulo 65536 in this case, so the result is 65536 - 32 = 65504.
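The chain of conversions described above can be sketched as follows. This is only an illustration: it uses signed char explicitly so that it does not depend on whether plain char is signed, it uses std::uint16_t to stand in for a 16-bit unsigned short, and the helper names combine_signed and to_u16 are hypothetical.

```cpp
#include <cstdint>

// Combine two bytes held as signed char, as in the question's code.
// Both operands are promoted to int before << and |, so a negative low
// byte is sign-extended and sets every higher bit of the result.
// Only meant for a non-negative high byte (see the shift caveat above).
constexpr int combine_signed(signed char high, signed char low) {
    return (high << 8) | low;
}

// Converting a negative int to a 16-bit unsigned type wraps modulo 65536.
constexpr std::uint16_t to_u16(int value) {
    return static_cast<std::uint16_t>(value);
}
```

With these, combine_signed(0, -32) is -32 and to_u16(-32) is 65504, matching the arithmetic 65536 - 32.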

Reversing the order performs ((-32) << 8) | 0. Left-shifting a negative value causes undefined behaviour, although on your system it has manifested itself as doing -32 * 256, giving -8192. Converting that to unsigned short gives 65536 - 8192 = 57344.
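For the reversed case, a small sketch of the same arithmetic. It deliberately uses multiplication instead of the shift, so that it reproduces the observed numbers without relying on the shift of a negative value. The names shifted_high and reversed_as_u16 are made up for this example, and std::uint16_t again stands in for a 16-bit unsigned short.

```cpp
#include <cstdint>

// The value the reversed expression produced on the asker's system,
// written as a multiplication, which is well-defined for -32.
constexpr int shifted_high(signed char high) {
    return high * 256;
}

// The same value after conversion to a 16-bit unsigned type (modulo 65536).
constexpr std::uint16_t reversed_as_u16(signed char high) {
    return static_cast<std::uint16_t>(high * 256);
}
```

shifted_high(-32) is -8192 and reversed_as_u16(-32) is 57344, the two values quoted above.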

If you are trying to get 224 from the first example, the simplest way to do this is to use unsigned char instead of char. Then myChar[0] will hold the value 224 instead of the value -32.
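A minimal sketch of that fix, assuming 8-bit bytes and using std::uint16_t in place of unsigned short; combine_bytes is a hypothetical helper, not something from the question.

```cpp
#include <cstdint>

// Same combination as in the question, but with unsigned char inputs:
// 0xE0 is stored as 224 and promotes to the int 224, not -32, so no
// sign extension leaks into the upper bits.
constexpr std::uint16_t combine_bytes(unsigned char high, unsigned char low) {
    return static_cast<std::uint16_t>((high << 8) | low);
}
```

combine_bytes(0x00, 0xE0) gives 224. With the bytes swapped, combine_bytes(0xE0, 0x00) gives 57344, this time without any undefined shift, because the promoted value 224 is positive.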

Answer 2

Use unsigned types for bit-level manipulation.

For example, on a computer with 8-bit bytes, and where char is signed, myChar[0] = 0xE0 results in a negative value, which is sign-extended when it's used in an expression.

Conversely, to avoid problems, use signed types for numbers.
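The sign extension mentioned above can be made visible by looking at the bits after promotion. This sketch assumes 8-bit bytes; the function names are hypothetical, and std::uint32_t is used only to get a fixed-width view of the promoted value.

```cpp
#include <cstdint>

// Bits of a signed byte after promotion to int: a negative byte has all
// of its upper bits set (sign extension).
constexpr std::uint32_t bits_after_promotion(signed char byte) {
    return static_cast<std::uint32_t>(static_cast<int>(byte));
}

// Bits of the same byte when it is first reinterpreted as unsigned char:
// the upper bits stay zero.
constexpr std::uint32_t bits_after_unsigned(signed char byte) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(byte));
}
```

For the byte -32 (bit pattern E0), the first function yields 0xFFFFFFE0 and the second 0x000000E0. If the data has to stay in a char buffer, casting each byte to unsigned char before combining is one way to follow this answer's advice.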

Answer 3

When you store the character into myChar, you're storing it big-endian: the high byte first, then the low byte. When you read the individual bytes out, you are reading them as little-endian: low byte first, high byte second (shifted by 8, or multiplied by 256). This is why you get such a large value.

myShort = (myChar[0] * 256) + myChar[1];

will give you the expected answer.
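To compare the two byte orders side by side, here is a small sketch. Note one assumption that goes beyond the formula above: it takes unsigned char inputs, because with a signed char the sign problem described in Answer 1 still applies whichever order is used. The names from_big_endian and from_little_endian are made up for illustration.

```cpp
#include <cstdint>

// First byte is the high byte (big-endian order).
constexpr std::uint16_t from_big_endian(unsigned char first, unsigned char second) {
    return static_cast<std::uint16_t>(first * 256 + second);
}

// First byte is the low byte (little-endian order).
constexpr std::uint16_t from_little_endian(unsigned char first, unsigned char second) {
    return static_cast<std::uint16_t>(second * 256 + first);
}
```

For the byte sequence E0 00, from_big_endian(0xE0, 0x00) is 57344 and from_little_endian(0xE0, 0x00) is 224, so which one is "expected" depends on the byte order of the file being read.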

Answer 1: score 3, accepted, 2016-03-25 05:21:28
Answer 2: score 1, 2016-03-25 05:38:14
Answer 3: score 0, 2016-03-25 05:21:38