
What is the algorithm that a compiler would use while casting signed variables to larger variable types, C language?

The answer might be compiler dependent, but:

What is the expected output of the lines below?

signed char a = -5;
printf("%x \n", (signed short) a); 
printf("%x \n", (unsigned short) a);

Would a compiler fill the most significant bits with zeros (0) or ones (1) while casting signed char to a larger variable? How and when?


PS There are other issues too. I tried to run the code below on an online compiler for testing. The outputs were not what I expected, so I added the verbose casts, but that did not help. Why is the output of printf("%x \n", (signed char)b); 4 bytes long instead of 1?

#include <stdio.h>

int main(void)
{
    unsigned char a = (unsigned char)5;
    signed char b = (signed char)-5;
    
    unsigned short c;
    signed short d;
    
    c = (unsigned short)b;
    d = (signed short)b;
    
    printf("%x ||| %x ||| %x ||| %x\n", (unsigned char)a, (signed char)b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, (signed char)c, (signed char)d);

    return 0;
}


Output:

5 ||| fffffffb ||| fffb ||| fffffffb
5 ||| -5 ||| 65531 ||| -5   
5 ||| -5 ||| -5 ||| -5

In C, arguments to variadic functions (like printf) which are of lower rank than int are converted to int. (Not unsigned int, unless the argument is unsigned and the same width as int.)

Converting a signed short or signed char to signed int does not change the value. If you start with -5, you end up with -5.

But if you convert a negative signed value to an unsigned type (using an explicit cast, for example), the conversion is done modulo one more than the maximum value of the unsigned type. For example, the maximum value of an unsigned short is 65535 (on many implementations), so converting -5 to unsigned short results in -5 modulo 65536, which is 65531. (C's % operator does not produce mathematical modular reduction.) When that value is then implicitly converted to an int, it is still 65531, so that's what's printed with %x (fffb).

Note that it is technically incorrect to apply the format %x to a signed int. %x requires that the corresponding argument be an unsigned int. Currently, C does not guarantee what the result of interpreting a signed value as unsigned will be, but that will soon change. (It's not a conversion. At runtime, types no longer exist, and values are just bit patterns.)

The exact rules for converting between signed and unsigned types are listed in section 6.3.1.3 of the C11 standard:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

As for what the above means for this code:

signed char a = -5;
printf("%x \n", (signed short) a); 
printf("%x \n", (unsigned short) a);

There are a few things going on here.

For the first printf, you first have a conversion from signed char to signed short. By clause 1 above, since the value -5 can be stored in both, the value is unchanged by the cast. Then, because this value is passed to a variadic function, it is promoted to type int, and again by clause 1 the value is unchanged.

Then the resulting int value is printed with the %x format specifier, which expects an unsigned int. This is technically undefined behavior for a mismatched format specifier, although most implementations will allow for implicit signed/unsigned reinterpretation. So assuming two's complement representation, the representation of the int value -5 will be printed, and assuming a 32-bit int this will be fffffffb.

For the second printf, the conversion from signed char to unsigned short will happen according to clause 2 above, since the value -5 can't be stored in an unsigned short. Assuming a 16-bit short, this gives you the value 65536 - 5 = 65531. Assuming two's complement representation, this is equivalent to sign-extending the representation from fb to fffb. This unsigned short value is then promoted to int when it is passed to printf, and by clause 1 the value is unchanged. Then the %x format specifier prints this as fffb.

Conversions between integer types are value preserving when the value being converted is representable in the destination type. signed short can represent all values representable by signed char, so this...

signed char a = -5;
printf("%hd\n", (signed short) a);

... would be expected to output a line containing "-5".

Your code, however, has undefined behavior. The conversion specifier %x requires the corresponding argument to have type unsigned int, whereas you are passing a signed short (converted to int according to the default argument promotions).

Provided that your implementation uses two's complement representation for signed integers (and I feel safe in asserting that it does), the conversions will have sign-extended the representation of the original signed char to the width of a signed short, and then sign-extended that to the width of a (signed) int. Thus, one reasonably likely manifestation of the UB in your...

 printf("%x \n", (signed short) a);

... would be to print

fffffffb

The other case is a bit different. Integer conversions where the target type is unsigned and cannot represent the source value are well defined. The source value is converted to the destination type by reducing it modulo the number of representable values in the target type. Thus, if your unsigned short has 16 value bits, then the result of converting -5 to unsigned short is -5 modulo 65536, which is 65531.

Thus,

printf("%hu\n", (unsigned short) a);

would be expected to print a line containing "65531".

Again, the %x conversion specifier does not match the type of the corresponding argument ((unsigned short) a, converted to int via the default argument promotions), so your printf has undefined behavior. However, the conversion of a 16-bit unsigned short to a 32-bit int on a two's complement system will involve zero-extending the representation of the source, so one reasonably likely manifestation of the UB in your...

 printf("%x \n", (unsigned short) a);

... would be to print

fffb

.
