Error converting char[2] to unsigned short?
After reading the comments, and thanks to @MM and @AnttiHaapala, I fixed my code but still got incorrect output...
New Code:
#include <iostream>
int main() {
    char * myChar;
    myChar = new char[2];
    myChar[1] = 0x00;
    myChar[0] = 0xE0;
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}
Output:
65504
or, if you reverse the order:
57344
I have a two-byte value that I am reading from a file and would like to convert it to an unsigned short so that I can use the numerical value.
Example code:
#include <iostream>
int main() {
    char myChar[2];
    myChar[1] = 'à';
    myChar[0] = '\0';
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}
Output:
40960
But à\0, or E0 00, should have a value of 224 as an unsigned two-byte value?
Also very interesting...
This code:
#include <iostream>
int main() {
    char * myChar;
    myChar = "\0à";
    unsigned short myShort;
    myShort = ((myChar[1] << 8) | (myChar[0]));
    std::cout << myShort << std::endl;
    return 0;
}
Outputs:
49920
NOTE: The original code has a complicating factor in that the source is UTF-8 encoded. Please check the edit history of this answer to see my comments on that. However, I think that is not the main issue you are asking about, so I have changed my answer to address only the edit. To avoid UTF-8 conversion issues, use '\xE0' instead of 'à'.
Regarding the edited code:
char * myChar;
myChar = new char[2];
myChar[1] = 0x00;
myChar[0] = 0xE0;
unsigned short myShort;
myShort = ((myChar[1] << 8) | (myChar[0]));
std::cout << myShort << std::endl;
The range of char (on your system) is -128 through to 127. This is common. You write the equivalent of myChar[0] = 224; (0xE0 is an int literal with value 224).
This is an out-of-range conversion, which causes implementation-defined behaviour (before C++20; from C++20 on, the result is defined to wrap). Most commonly, implementations define this to adjust modulo 256 until the value is in range. So you end up with the same result as:
myChar[0] = -32;
Then the calculation (myChar[1] << 8) | myChar[0] is 0 | (-32), which is -32. Finally, you convert to unsigned short. This is another out-of-range conversion, because the range of unsigned short is [0, 65535] on your system.
However, out-of-range conversion to an unsigned type is well-defined: it adjusts modulo 65536 in this case, so the result is 65536 - 32 = 65504.
Reversing the order performs ((-32) << 8) | 0. Left-shifting a negative value causes undefined behaviour (before C++20), although on your system it has manifested itself as doing -32 * 256, giving -8192. Converting that to unsigned short gives 65536 - 8192 = 57344.
If you are trying to get 224 from the first example, the simplest way to do this is to use unsigned char instead of char. Then myChar[0] will hold the value 224 instead of the value -32.
Use unsigned types for bit-level manipulation.
For example, on a computer with 8-bit bytes where char is signed, myChar[0] = 0xE0 results in a negative value, which is sign-extended when it is used in an expression.
Conversely, to avoid problems, use signed types for numbers.
When you store the character into myChar, you're storing it big-endian: the high byte first, then the low byte. When you read the individual bytes out, you are reading them as little-endian: low byte first, high byte second (shifted by 8, or multiplied by 256). This is why you get such a large value.
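myShort = (static_cast<unsigned char>(myChar[0]) * 256) + static_cast<unsigned char>(myChar[1]);
will give you the expected answer, provided myChar[1] holds the single byte 0xE0. The casts to unsigned char keep a byte such as 0xE0 from being sign-extended to a negative value on systems where char is signed.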