
Issues about the signedness of char

According to the standard, whether char is signed or not is implementation-defined. This has caused me some trouble. Following are some examples:

1) Testing the most significant bit. If char is signed, I could simply compare the value against 0. If it is unsigned, I compare the value against 128 instead. Neither of these simple methods is generic enough to apply to both cases. To write portable code, it seems that I have to manipulate the bits directly, which is not neat.

2) Value assignment. Sometimes, I need to write a bit pattern to a char value. If char is unsigned, this can be done easily using hexadecimal notation, e.g., char c = 0xff. But this method does not apply when char is signed. Take char c = 0xff for example: 0xff is beyond the maximum value a signed char can hold. In such cases, the standard says the resulting value of c is implementation-defined.

So, does anybody have good ideas about these two issues? With respect to the second one, I'm wondering whether char c = '\xff' is OK for both signed and unsigned char.

NOTE: It is sometimes needed to write explicit bit patterns to characters. See the example in http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs .

1) testing MSB: (x | 0x7F) != 0x7F (or reinterpret_cast<unsigned char&>(x) & 0x80 )

2) reinterpret_cast<unsigned char&>(x) = 0xFF;

Note that reinterpret_cast is entirely appropriate if you want to treat the memory the character occupies as a collection of bits, bypassing the specific bit patterns associated with any given value in the char type.
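As a minimal sketch of the two reinterpret_cast suggestions above: the helper names msb_set and from_bits are hypothetical, introduced only for illustration. The top-bit mask is built from CHAR_BIT rather than hard-coding 0x80, which is an assumption on my part so that the sketch does not depend on an 8-bit char.

```cpp
#include <climits>

// Hypothetical helper: true if the most significant bit of c is set,
// whether plain char is signed or unsigned on this platform.
inline bool msb_set(char c) {
    // View the byte as unsigned char, then mask its top bit.
    return (reinterpret_cast<unsigned char&>(c) & (1u << (CHAR_BIT - 1))) != 0;
}

// Hypothetical helper: builds a char holding exactly the given bit pattern
// by writing through an unsigned char view of the object.
inline char from_bits(unsigned char pattern) {
    char c = 0;
    reinterpret_cast<unsigned char&>(c) = pattern;
    return c;
}
```

For instance, msb_set(from_bits(0xFF)) is true and msb_set(from_bits(0x7F)) is false on either kind of platform.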

If you really care about the signed-ness, just declare the variable as signed char or unsigned char as needed. No platform-independent bit-twiddling tricks required.
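A short sketch of that advice, assuming an 8-bit char (the constants 0x80 and 0xFF are only meaningful under that assumption). The names all_ones and high_bit are hypothetical, used here for illustration.

```cpp
// With an explicitly unsigned type, 0xFF is in range,
// so the initialization involves no implementation-defined conversion.
constexpr unsigned char all_ones = 0xFF;

// Hypothetical helper: for an 8-bit unsigned char, the top bit is set
// exactly when the value is at least 128.
inline bool high_bit(unsigned char u) {
    return u >= 0x80;
}
```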

Actually you can do what you want without worrying about signedness.

Hexadecimal notation describes a bit pattern, not an integral value (see the disclaimer).

So for 2. you said you can't assign bit patterns like this

char c = 0xff

but you really can do that, signed or not.

For 1, you may not be able to do the "compare with 0" trick, but you still have several ways to check the most significant bit. One way is to shift right by 7, shifting in zeros on the left, and then check whether the result equals 1. This is independent of signedness.

As Tony D pointed out, (x | 0x7F) != 0x7F is a more portable way of doing it than shifting, because a right shift may not shift in zeros. Similarly, you could do (x & 0x80) == 0x80. The parentheses are required, since == binds more tightly than &.
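The two mask-based tests can be sketched as follows, assuming an 8-bit char. The function names are hypothetical. Note the parentheses around the & expression: without them, == would be evaluated first.

```cpp
// Hypothetical helper: OR in the low seven bits; anything left over
// means the top bit (or a sign-extended copy of it) was set.
inline bool msb_by_or(char x) {
    return (x | 0x7F) != 0x7F;
}

// Hypothetical helper: mask out bit 7 and compare.
// The parentheses matter because == binds more tightly than &.
inline bool msb_by_and(char x) {
    return (x & 0x80) == 0x80;
}
```

Both work on a plain char because x is promoted to int before the bitwise operation, and bit 7 of the promoted value is set in the signed and the unsigned case alike.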

Of course you can also do what Brian suggested and just use an unsigned char.

Disclaimer: Tony pointed out that 0xFF is actually an int, and that its conversion to char is implementation-defined when char is signed and cannot hold the value. In practice, however, no implementation is going to do anything surprising here. char c = 0xFF, whether signed or unsigned, will fill the bits, trust me. It will be extremely difficult to find an implementation that doesn't do that.

You can OR and AND the given value with 0x7F and 0xFF, respectively, to detect and remove its signedness.

The easiest way to test the MSB is to make it the LSB: char c = foo(); if ((c >> (CHAR_BIT - 1)) & 1) ...
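A self-contained sketch of that shift-based test, with the hypothetical name msb_via_shift standing in for the inline if. Right-shifting a negative value was implementation-defined before C++20, but after the final & 1 the result is the same whether the shift is arithmetic or logical.

```cpp
#include <climits>

// Hypothetical helper: move the top bit of c down to bit 0 and test it.
// c is promoted to int first; the & 1 discards any sign-extension bits.
inline bool msb_via_shift(char c) {
    return ((c >> (CHAR_BIT - 1)) & 1) != 0;
}
```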

Setting a specific bit pattern is a bit more tricky. All-bits-one, for instance, may not necessarily be 0xff but could also be 0x7ff, or more realistically 0xffff. Regardless, ~char(0) is all-bits-one. Somewhat less obvious, so is char(-1). If char is signed, that's clear; if it is unsigned, this is still correct because unsigned types work modulo 2^N. Following that logic, when char is 8 bits wide, char(-128) sets just the 8th bit whether or not char is signed. With a wider char it would also set every bit above the 8th.
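The two all-bits-one spellings can be sketched like this. The function names are hypothetical, and the sketch assumes a two's complement representation (which C++20 requires and which is universal in practice).

```cpp
#include <climits>

// Hypothetical helper: -1 converts to all-bits-one.
// Signed char: -1 is representable directly.
// Unsigned char: the conversion wraps modulo 2^N to UCHAR_MAX.
inline char all_bits_one() {
    return static_cast<char>(-1);
}

// Hypothetical helper: char(0) is promoted to int, ~ flips every bit
// (giving -1), and the cast narrows the result back to char.
inline char all_bits_one_alt() {
    return static_cast<char>(~char(0));
}
```

Neither spelling hard-codes the width of char, which is the point of preferring them over a literal such as 0xff.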
