
What is an unsigned char?

In C/C++, what is an unsigned char used for? How is it different from a regular char?

In C++, there are three distinct character types:

  • char
  • signed char
  • unsigned char

If you are using character types for text, use the unqualified char:

  • it is the type of character literals like 'a' or '0'.
  • it is the type that makes up C strings like "abcde".

It also works out as a number value, but it is implementation-defined whether that value is treated as signed or unsigned. Beware character comparisons through inequalities, although if you limit yourself to ASCII (0-127) you're just about safe.

If you are using character types as numbers, use:

  • signed char, which gives you at least the -127 to 127 range (-128 to 127 is common).
  • unsigned char, which gives you at least the 0 to 255 range.

"At least", because the C++ standard only gives the minimum range of values that each numeric type is required to cover. sizeof(char) is required to be 1 (i.e. one byte), but a byte could in theory be, for example, 32 bits. sizeof would still report its size as 1, meaning that you could have sizeof(char) == sizeof(long) == 1.

This is implementation dependent, as the C standard does NOT define the signedness of char. Depending on the platform, char may be signed or unsigned, so you need to explicitly ask for signed char or unsigned char if your implementation depends on it. Just use char if you intend to represent characters from strings, as this will match what your platform puts in the string.

The difference between signed char and unsigned char is as you'd expect. On most platforms, signed char will be an 8-bit two's complement number ranging from -128 to 127, and unsigned char will be an 8-bit unsigned integer (0 to 255). Note the standard does NOT require that char types have exactly 8 bits, only that sizeof(char) return 1. You can get the number of bits in a char with CHAR_BIT in limits.h. There are few if any platforms today where this will be something other than 8, though.

There is a nice summary of this issue here.

As others have mentioned since I posted this, you're better off using int8_t and uint8_t if you really want to represent small integers.

Because I feel it's really called for, I just want to state some rules of C and C++ (they are the same in this regard). First, all bits of unsigned char participate in determining the value of any unsigned char object. Second, unsigned char is explicitly stated to be unsigned.

Now, I had a discussion with someone about what happens when you convert the value -1 of type int to unsigned char. He refused the idea that the resulting unsigned char has all its bits set to 1, because he was worried about sign representation. But he doesn't have to be. It follows immediately from this rule that the conversion does what is intended:

If the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (6.3.1.3p2 in a C99 draft)

That's a mathematical description. C++ describes it in terms of modulo arithmetic, which yields the same rule. Anyway, what is not guaranteed is that all bits in the integer -1 are one before the conversion. So, what do we have so we can claim that the resulting unsigned char has all its CHAR_BIT bits turned to 1?

  1. All bits participate in determining its value; that is, no padding bits occur in the object.
  2. Adding UCHAR_MAX+1 to -1 just once yields a value in range, namely UCHAR_MAX.

That's enough, actually! So whenever you want to have an unsigned char with all its bits set to one, you do

unsigned char c = (unsigned char)-1;

It also follows that a conversion is not just a truncation of higher-order bits. The fortunate thing about two's complement is that the conversion is just a truncation there, but the same isn't necessarily true for other sign representations.

As for example usages of unsigned char:

unsigned char is often used in computer graphics, which very often (though not always) assigns a single byte to each colour component. It is common to see an RGB (or RGBA) colour represented as 24 (or 32) bits, with each component an unsigned char. Since unsigned char values fall in the range [0,255], the values are typically interpreted as:

  • 0 meaning a total lack of a given colour component.
  • 255 meaning 100% of a given colour component.

So you would end up with RGB red as (255,0,0) -> (100% red, 0% green, 0% blue).

Why not use a signed char? Arithmetic and bit shifting become problematic. As explained already, a signed char's range is essentially shifted by -128, so component values above 127 do not fit. A very simple and naive (mostly unused) method for converting RGB to grayscale is to average all three colour components, but this runs into problems when the values of the colour components are negative. Red (255, 0, 0) averages to (85, 85, 85) when using unsigned char arithmetic. However, if the same bytes were held in signed chars on a typical 8-bit two's complement platform, 255 would read back as -1, the components would be (-1, 0, 0), and the integer average would be (0, 0, 0), which is incorrect.

If you want to use a character as a small integer, the safest way to do it is with the int8_t and uint8_t types.

unsigned char takes only non-negative values, like 0 to 255, whereas signed char takes both positive and negative values, like -128 to +127.

signed char has range -128 to 127; unsigned char has range 0 to 255.

char will have the same range and representation as either signed char or unsigned char, depending on the compiler, but is a distinct type.

If you're using C-style strings, just use char. If you need to use chars for arithmetic (pretty rare), specify signed or unsigned explicitly for portability.

char and unsigned char aren't guaranteed to be 8-bit types on all platforms; they are guaranteed to be 8-bit or larger. Some platforms have 9-bit, 32-bit, or 64-bit bytes. However, the most common platforms today (Windows, Mac, Linux x86, etc.) have 8-bit bytes.

An unsigned char is an unsigned byte value (0 to 255). You may be thinking of char in terms of being a "character", but it is really a numerical value. A regular char is often signed, which leaves 128 non-negative values, and these values map to characters using the ASCII encoding. But in either case, what you are storing in memory is a byte value.

In terms of direct values, a regular char is used when the values are known to be between CHAR_MIN and CHAR_MAX, while an unsigned char provides double the range on the positive end. For example, if CHAR_BIT is 8, the range of a regular char is only guaranteed to be [0, 127] (because it can be signed or unsigned), while unsigned char will be [0, 255] and signed char will be at least [-127, 127].

In terms of what it's used for, the standards allow objects of POD (plain old data) types to be accessed directly as an array of unsigned char. This allows you to examine the representation and bit patterns of the object. unsigned char is the type singled out for this, because all of its bits are value bits and every bit pattern is a valid value; signed char, and char where it is signed, have historically come with weaker guarantees.

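If you like using types of a specific length and signedness, you're probably better off with uint8_t, int8_t, uint16_t, etc., simply because they do exactly what they say.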

unsigned char is the heart of all bit trickery. In almost all compilers for all platforms, an unsigned char is simply a byte: an unsigned integer of (usually) 8 bits that can be treated as a small integer or a pack of bits.

In addition, as someone else has said, the standard doesn't define the sign of a char, so you have 3 distinct char types: char, signed char, unsigned char.

unsigned char takes only non-negative values: 0 to 255, while signed char takes both positive and negative values: -128 to +127.

Some googling found this, where people had a discussion about this.

An unsigned char is basically a single byte. So, you would use this if you need one byte of data (for example, maybe you want to use it to set flags on and off to be passed to a function, as is often done in the Windows API).

An unsigned char uses the bit that is reserved for the sign of a signed char as an extra value bit. This changes the range to [0, 255] as opposed to [-128, 127].

Generally unsigned chars are used when you don't want a sign. This makes a difference when doing things like shifting bits (a right shift of a negative value typically extends the sign) and in other cases where you treat a char as a byte rather than as a number.

Quoted from the book "The C Programming Language":

The qualifier signed or unsigned may be applied to char or any integer. unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number of bits in the type. So, for instance, if chars are 8 bits, unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127 (in a two's complement machine.) Whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.

signed char and unsigned char both occupy 1 byte, but they have different ranges.

   Type        |      range
-------------------------------
signed char    |  -128 to +127
unsigned char  |     0 to 255

With signed char, if we consider char letter = 'A', then 'A' is stored as the binary value 65 in ASCII/Unicode. If 65 can be stored, -65 can also be stored. There are no negative values in ASCII/Unicode, so there is no need to worry about negative values.

Example

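#include <stdio.h>

int main(void)
{
    /* 255 does not fit in an 8-bit signed char; the result is
       implementation-defined (typically -1 on two's complement). */
    signed char char1 = 255;
    signed char char2 = -128;
    unsigned char char3 = 255;
    /* -128 is converted modulo UCHAR_MAX + 1, giving 128 for 8-bit chars. */
    unsigned char char4 = -128;

    printf("Signed char(255) : %d\n",char1);
    printf("Unsigned char(255) : %d\n",char3);

    printf("\nSigned char(-128) : %d\n",char2);
    printf("Unsigned char(-128) : %d\n",char4);

    return 0;
}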

Output (typical for 8-bit two's complement chars):

Signed char(255) : -1
Unsigned char(255) : 255

Signed char(-128) : -128
Unsigned char(-128) : 128
