
Issues about the signedness of char

According to the standard, whether char is signed or not is implementation-defined. This has caused me some trouble. Here are some examples:

1) Testing the most significant bit. If char is signed, I could simply compare the value against 0. If it is unsigned, I compare the value against 128 instead. Neither of these simple methods is generic enough to apply to both cases. To write portable code, it seems that I have to manipulate the bits directly, which is not neat.

2) Value assignment. Sometimes I need to write a bit pattern to a char value. If char is unsigned, this can be done easily using hexadecimal notation, e.g., char c = 0xff. But this method does not apply when char is signed. Take char c = 0xff for example: 0xff is beyond the maximum value a signed char can hold. In such cases, the standard says the resulting value of c is implementation-defined.

So, does anybody have good ideas about these two issues? With respect to the second one, I'm wondering whether char c = '\xff' is OK for both signed and unsigned char.

NOTE: It is sometimes necessary to write explicit bit patterns to characters. See the example at http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs .

1) Testing the MSB: (x | 0x7F) != 0x7F (or reinterpret_cast<unsigned char&>(x) & 0x80)

2) reinterpret_cast<unsigned char&>(x) = 0xFF;

Note that reinterpret_cast is entirely appropriate if you want to treat the memory the character occupies as a collection of bits, bypassing the specific bit patterns associated with any given value in the char type.
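As a rough illustration of the two suggestions above, here is a minimal sketch assuming an 8-bit char. The helper names (msb_set, msb_set_via_cast, set_all_bits) are made up for this example and do not come from the answer.

```cpp
// Hypothetical helpers; assumes CHAR_BIT == 8.

// x is promoted to int before the OR. The result stays exactly 0x7F unless
// bit 7 is set (a negative value also carries set higher bits after promotion).
constexpr bool msb_set(char x) {
    return (x | 0x7F) != 0x7F;
}

// Same test on the object's storage: unsigned char may alias any object.
inline bool msb_set_via_cast(char& x) {
    return (reinterpret_cast<unsigned char&>(x) & 0x80) != 0;
}

// Stores the bit pattern 0xFF in x without converting 0xFF to a signed type.
inline void set_all_bits(char& x) {
    reinterpret_cast<unsigned char&>(x) = 0xFF;
}
```

After set_all_bits(c), both msb_set(c) and msb_set_via_cast(c) should report true whether char is signed or unsigned on the platform.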

If you really care about the signedness, just declare the variable as signed char or unsigned char as needed. No platform-independent bit-twiddling tricks required.
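A small sketch of that approach, assuming an 8-bit byte. The names kAllOnes and top_bit_set are invented for illustration.

```cpp
// Hypothetical names; assumes CHAR_BIT == 8.

// 0xFF is always representable in unsigned char, so no signed conversion occurs.
constexpr unsigned char kAllOnes = 0xFF;

// With an unsigned byte, "top bit set" is simply "value is at least 0x80".
constexpr bool top_bit_set(unsigned char b) {
    return b >= 0x80;
}
```

A plain char can still be passed in: the conversion to unsigned char is defined modulo 2^N, so a negative char value maps to the corresponding byte value.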

Actually, you can do what you want without worrying about signedness.

Hexadecimal describes a bit pattern, not the integral value (see disclaimer).

So for 2, you said you can't assign bit patterns like this:

char c = 0xff

but you really can do that, signed or not.

For 1, you may not be able to do the "compare with 0" trick, but you still have several ways to check the most significant bit. One way is to shift right by 7, shifting in zeros on the left, and then check whether the result equals 1. This is independent of signedness.

As Tony D pointed out, (x | 0x7F) != 0x7F is a more portable way of doing it than shifting, because a right shift may not shift in zeros. Similarly, you could do (x & 0x80) == 0x80.
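One detail worth spelling out for the mask test: in C++, == binds more tightly than &, so the parentheses around the mask are needed. The sketch below (hypothetical function names, 8-bit char assumed) contrasts the two groupings.

```cpp
// Hypothetical helpers; assumes CHAR_BIT == 8.

// How the compiler groups `x & 0x80 == 0x80` without parentheses:
// the comparison runs first and yields true (1), so only bit 0 of x is tested.
constexpr bool grouped_as_parsed(char x) {
    return x & (0x80 == 0x80);
}

// The intended test: mask first, then compare.
constexpr bool msb_by_mask(char x) {
    return (x & 0x80) == 0x80;
}
```

For '\x41' (bit 0 set, bit 7 clear) the first function reports true and the second false; for '\x80' it is the other way around.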

Of course, you can also do what Brian suggested and just use an unsigned char.

Disclaimer: Tony pointed out that a literal such as 0xFF is actually an int, and that the conversion to char is implementation-defined when char is signed and cannot hold the value. However, no implementation is going to break the standard here. char c = 0xFF, whether char is signed or not, will fill the bits, trust me. It will be extremely difficult to find an implementation that doesn't do that.

You can OR and AND the given value with 0x7F and 0xFF, respectively, to detect and remove its signedness.

The easiest way to test the MSB is to make it the LSB: char c = foo(); if ((c>>(CHAR_BIT-1)) & 1) ...
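A self-contained version of that shift test might look like the following sketch. The function name msb_by_shift is an assumption for illustration; CHAR_BIT comes from <climits>.

```cpp
#include <climits>

// Hypothetical helper. c is promoted to int before the shift; masking with 1
// keeps only the bit that was at position CHAR_BIT - 1, so it does not matter
// what the implementation shifts in on the left for negative values.
constexpr bool msb_by_shift(char c) {
    return ((c >> (CHAR_BIT - 1)) & 1) != 0;
}
```

Because of the final & 1, the result depends only on the original top bit of the char, not on whether the right shift is arithmetic or logical.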

Setting a specific bit pattern is a bit more tricky. All-bits-one, for instance, may not necessarily be 0xff but could also be 0x7ff, or more realistically 0xffff. Regardless, ~char(0) is all-bits-one. Somewhat less obviously, so is char(-1). If char is signed, that's clear; if it is unsigned, this is still correct because unsigned types work modulo 2^N. Following that logic, on an 8-bit char, char(-128) sets just the eighth bit whether or not char is signed; on a wider char it also sets every higher bit.
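These claims can be checked at compile time. The sketch below assumes a two's-complement platform; bits_of is a hypothetical helper that views a char's value as an unsigned byte, and UCHAR_MAX from <climits> stands for all-bits-one at whatever width char has.

```cpp
#include <climits>

// Hypothetical helper: converts modulo 2^N, which on two's complement
// preserves the bit pattern of the char.
constexpr unsigned char bits_of(char c) {
    return static_cast<unsigned char>(c);
}

// char(-1) is all-bits-one whether char is signed or unsigned.
static_assert(bits_of(char(-1)) == UCHAR_MAX, "char(-1) is all-bits-one");

// ~char(0) is computed as an int (-1); converting it back to char gives the same pattern.
static_assert(bits_of(static_cast<char>(~char(0))) == UCHAR_MAX, "~char(0) is all-bits-one");

// On an 8-bit char, char(-128) is exactly the top bit.
static_assert(CHAR_BIT != 8 || bits_of(char(-128)) == 0x80, "char(-128) is 0x80 on 8-bit char");
```

If any of these assumptions did not hold on a given platform, the build would fail at the corresponding static_assert rather than misbehave at run time.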

