
Type conversion - unsigned to signed int/char

I tried to execute the program below:

#include <stdio.h>

int main() {
    signed char a = -5;
    unsigned char b = -5;
    int c = -5;
    unsigned int d = -5;

    if (a == b)
        printf("\r\n char is SAME!!!");
    else
        printf("\r\n char is DIFF!!!");

    if (c == d)
        printf("\r\n int is SAME!!!");
    else
        printf("\r\n int is DIFF!!!");

    return 0;
}

For this program, I am getting the output:

char is DIFF!!!
int is SAME!!!

Why are we getting different outputs for the two comparisons?
Shouldn't the output be as below?

char is SAME!!!
int is SAME!!!

A codepad link.

This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).

In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, its type gets promoted to int, which is signed (provided int can represent all of its values, which is the usual case). This happens regardless of whether the type was signed or unsigned.

In the case of the signed char, the sign is preserved and it is promoted to an int containing the value -5. In the case of the unsigned char, it contains the value 251 (0xFB). It is promoted to an int containing that same value. You end up with

if( (int)-5 == (int)251 )

In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (size) but different signedness, the signed operand is converted to the type of the unsigned one. You end up with

if( (unsigned int)-5 == (unsigned int)-5)

Cool question!

The int comparison works, because both ints contain exactly the same bits, so they are essentially the same. But what about the chars?

Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if(a==b), but what the compiler actually turns that into is:

if((int)a==(int)b) 

(int)a is -5, but (int)b is 251. Those are definitely not the same.

EDIT: As @Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. If a char were, say, 16 bits long and int 32 bits long, (int)b would be 65531.

REDIT: There's a whole bunch of comments discussing the nature of the answer if a byte is not 8 bits long. The only difference in that case is that (int)b is not 251 but a different positive number, which still isn't -5. This is not really relevant to the question, which is still very cool.

Welcome to integer promotion. If I may quote from the website:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

C can be really confusing when you do comparisons such as these. I recently puzzled some of my non-C programming friends with the following tease:

#include <stdio.h>
#include <string.h>

int main()
{
    char* string = "One looooooooooong string";

    printf("%zu\n", strlen(string));   /* strlen returns size_t, so %zu rather than %d */

    if (strlen(string) < -1) printf("This cannot be happening :(");

    return 0;
}

Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!

What happens underneath, however, is that -1 is converted to the unsigned type size_t that strlen returns, which yields the largest value of that type: 4294967295 when size_t is 32 bits wide. And naturally 25 is smaller than 4294967295.

If we, however, explicitly cast the size_t value returned by strlen to a signed integer:

if ((int)(strlen(string)) < -1)

Then it will compare 25 against -1 and all will be well with the world.

A good compiler should warn you about the comparison between an unsigned and a signed integer, and yet it is still so easy to miss (especially if you don't enable warnings).

This is especially confusing for Java programmers, as all primitive integer types there (apart from char) are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

The hex representation of -5 is:

  • 8-bit, two's complement signed char: 0xfb
  • 32-bit, two's complement signed int: 0xfffffffb

When you convert a signed number to an unsigned number, or vice versa, the compiler does ... precisely nothing. What is there to do? Converting a signed value to an unsigned type is well defined: the value is reduced modulo 2^N, which on a two's complement machine leaves the bits untouched. Going the other way, a value that doesn't fit gives an implementation-defined result, and the most efficient implementation-defined behaviour is to do nothing.

So, the hex representation of (unsigned <type>)-5 is:

  • 8-bit, unsigned char: 0xfb
  • 32-bit, unsigned int: 0xfffffffb

Look familiar? They're bit-for-bit the same as the signed versions.

When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)

So, what happens when we convert char to int?

  • 8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
    • Well, that makes sense because it matches the representations of -5 above!
    • It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
  • 8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
    • This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.

So, a == b really does 0xfffffffb == 0x000000fb => no match!

And, c == d really does 0xfffffffb == 0xfffffffb => match!

My point is: didn't you get a warning at compile time, "comparing signed and unsigned expression"?

The compiler is trying to inform you that it is entitled to do crazy stuff! :) I would add, crazy stuff will happen with big values, close to the capacity of the primitive type. And

 unsigned int d = -5;

is definitely assigning a big value to d. Because conversion to an unsigned type wraps modulo UINT_MAX + 1, it is equivalent to:

 unsigned int d = UINT_MAX - 4; /* since (unsigned int)-1 is UINT_MAX */

Edit:

However, it is interesting to notice that only the second comparison gives a warning. This means that the compiler, applying the conversion rules, is confident that there won't be errors in the comparison between unsigned char and char (during the comparison both are converted to a type that can safely represent all their possible values). And it is right on this point. Then it informs you that this won't be the case for unsigned int and int: during the comparison, one of the two will be converted to a type that cannot fully represent it.

For completeness, I also checked it for short: the compiler behaves the same way as for chars and, as expected, there are no errors at runtime.


Related to this topic, I recently asked this question (C++ oriented, though).
