Error converting char[2] to unsigned short?

Edit:

After reading the comments, thanks to @MM and @AnttiHaapala, I fixed my code but still got incorrect output.

New code:

#include <iostream>
int main() {
    char * myChar;
    myChar = new char[2];
    myChar[1] = 0x00;
    myChar[0] = 0xE0;
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Output:

65504

or, if you reverse the order:

57344

Old post:

So I have a two-byte value that I am reading from a file and would like to convert it to an unsigned short so I can use the numerical value.

Example code:

#include <iostream>
int main() {
    char myChar[2];
    myChar[1] = 'à';
    myChar[0] = '\0';
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Output:

40960

But à\0, or E0 00, should have a value of 224 as an unsigned two-byte value?

Also very interesting...

This code:

#include <iostream>
int main() {
    char * myChar;
    myChar = "\0à";
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}

Outputs:

49920

NOTE: The original code has a complicating factor in that the source is UTF-8 encoded. Please check the edit history of this answer to see my comments on that. However, I think that is not the main issue you are asking about, so I have changed my answer to address only the edit. To avoid UTF-8 conversion issues, use '\xE0' instead of 'à'.
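
As a side note on the UTF-8 point: in UTF-8, à (U+00E0) is encoded as the two bytes C3 A0, not the single byte E0. The two outputs in the old post are consistent with that, presumably because myChar[1] ended up holding A0 in the first program and C3 in the second. The sketch below only checks that arithmetic; the names are made up for illustration, and it uses unsigned char so that the shift itself is well-defined.

```cpp
// Assumption: the source file is UTF-8, so à (U+00E0) occupies two bytes.
constexpr unsigned char utf8_lead  = 0xC3;  // first byte of à in UTF-8
constexpr unsigned char utf8_trail = 0xA0;  // second byte of à in UTF-8

// Hypothetical helper: place one byte in the high half of a 16-bit value,
// the way (myChar[1] << 8) does. The cast models the assignment to myShort.
constexpr unsigned short in_high_byte(unsigned char byte) {
    return static_cast<unsigned short>(byte << 8);
}
```

Here in_high_byte(utf8_trail) is 40960 and in_high_byte(utf8_lead) is 49920, matching the two outputs reported in the old post.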


Regarding the edited code:

char * myChar;
myChar = new char[2];
myChar[1] = 0x00;
myChar[0] = 0xE0;
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;

The range of char (on your system) is -128 through 127. This is common. You write myChar[0] = 224; (0xE0 is an int literal with value 224).

This is an out-of-range conversion, which causes implementation-defined behaviour. Most commonly, implementations define this to adjust modulo 256 until the value is in range. So you end up with the same result as:

myChar[0] = -32;

Then the calculation (myChar[1] << 8) | myChar[0] is 0 | (-32), which is -32. Finally, you convert to unsigned short. This is another out-of-range conversion, because the range of unsigned short is [0, 65535] on your system.

However, out-of-range conversion to an unsigned type is well-defined to adjust modulo 65536 in this case, so the result is 65536 - 32 = 65504.
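
The three steps above can be traced in a small sketch. It assumes the common case described in this answer: 8-bit char stored as signed, with out-of-range values wrapping modulo 256, and a 16-bit unsigned short. The function names are made up for illustration.

```cpp
// Step 1: storing an int literal into a signed 8-bit char.
// Assumption: the implementation wraps modulo 256 (the usual behaviour).
constexpr int stored_in_signed_char(int literal) {
    return static_cast<signed char>(literal);
}

// Step 2: both operands are promoted to int before << and |.
// Only call this with a non-negative `high`, so the shift is well-defined.
constexpr int or_promoted(int high, int low) {
    return (high << 8) | low;
}

// Step 3: conversion to unsigned short wraps modulo 65536
// (assuming a 16-bit unsigned short).
constexpr unsigned short to_unsigned_short(int value) {
    return static_cast<unsigned short>(value);
}
```

With these, stored_in_signed_char(0xE0) is -32, or_promoted(0, -32) is -32, and to_unsigned_short(-32) is 65504.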


Reversing the order performs ((-32) << 8) | 0. Left-shifting a negative value causes undefined behaviour, although on your system it has manifested itself as doing -32 * 256, giving -8192. Converting that to unsigned short gives 65536 - 8192 = 57344.
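
The arithmetic for the reversed case can be checked without performing the undefined shift. This is only a sketch with made-up names, assuming a 16-bit unsigned short; it uses multiplication by 256, which is the well-defined operation the shift happened to behave like.

```cpp
// Well-defined stand-in for what ((-32) << 8) did on the asker's system.
constexpr int times_256(int value) {
    return value * 256;
}

// Conversion to unsigned short wraps modulo 65536
// (assumption: unsigned short is 16 bits wide).
constexpr unsigned short wrap_to_unsigned_short(int value) {
    return static_cast<unsigned short>(value);
}
```

Here times_256(-32) is -8192, and wrap_to_unsigned_short(-8192) is 57344.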


If you are trying to get 224 from the first example, the simplest way to do this is to use unsigned char instead of char. Then myChar[0] will hold the value 224 instead of the value -32.
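
A minimal sketch of that change, assuming 8-bit bytes and a 16-bit unsigned short. The helper names are hypothetical; the point is only that unsigned char operands promote to non-negative int values, so neither the shift nor the OR can pick up stray sign bits.

```cpp
// An unsigned char holds 0xE0 as 224, with no wrap to a negative value.
constexpr int stored_in_unsigned_char(int literal) {
    return static_cast<unsigned char>(literal);
}

// Combine two bytes; both promote to int in the range 0..255.
constexpr unsigned short combine_bytes(unsigned char high, unsigned char low) {
    return static_cast<unsigned short>((high << 8) | low);
}
```

With the bytes from the edited question, combine_bytes(0x00, 0xE0) is 224, and the reversed order combine_bytes(0xE0, 0x00) is 57344.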

Use unsigned types for bit-level manipulation.

For example, on a computer with 8-bit bytes where char is signed, myChar[0] = 0xE0 results in a negative value, which is sign-extended when it is used in an expression.

Conversely, to avoid problems, use signed types for numbers.
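
To illustrate the sign extension, here is a sketch that assumes 8-bit char and 32-bit int; the function names are made up. The second function shows one way to keep a char buffer (for example, one filled by a file read) and still combine the bytes safely, by converting each byte to unsigned char at the point of use.

```cpp
// Shows the bit pattern of a signed char after promotion to int.
// Assumption: int is 32 bits wide.
constexpr unsigned int promoted_bits(signed char c) {
    return static_cast<unsigned int>(static_cast<int>(c));
}

// Combine two plain chars by first converting each to unsigned char,
// which discards the sign before the promotion to int happens.
constexpr unsigned short from_char_bytes(char high, char low) {
    return static_cast<unsigned short>(
        (static_cast<unsigned char>(high) << 8) | static_cast<unsigned char>(low));
}
```

Under those assumptions, promoted_bits(static_cast<signed char>(0xE0)) is 0xFFFFFFE0, which is the sign-extended form of E0, while from_char_bytes('\x00', '\xE0') is 224.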

When you store the character into myChar, you're storing it big-endian: the high byte first, then the low byte. When you read the individual bytes out, you are reading them as little-endian: low byte first, high byte second (shifted left by 8, i.e. multiplied by 256). This is why you get such a large value.

myShort = (myChar[0] * 256) + myChar[1];

will give you the expected answer.
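
The two byte orders can be compared side by side in a short sketch. It assumes the bytes are held as unsigned char, since with a signed char the sign issues described in the other answers still apply to either formula. The function names are hypothetical.

```cpp
// Big-endian: the first byte in memory is the high byte.
constexpr unsigned short read_big_endian(unsigned char first, unsigned char second) {
    return static_cast<unsigned short>((first << 8) | second);
}

// Little-endian: the first byte in memory is the low byte.
constexpr unsigned short read_little_endian(unsigned char first, unsigned char second) {
    return static_cast<unsigned short>((second << 8) | first);
}
```

For the byte sequence E0 00, read_little_endian(0xE0, 0x00) is 224 and read_big_endian(0xE0, 0x00) is 57344; for 00 E0, read_big_endian(0x00, 0xE0) is 224. Which one is right depends on the byte order the file format uses.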
