
Printf function formatter

Having the following simple C++ code:

#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;

    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}

the output is like this:

1: 4294967170
2: 130
3: -126
4: +130

Can someone please explain the results of lines 1 and 3?

I'm using the Linux gcc compiler with all default settings.

A char is 8 bits, so it can represent 2^8 = 256 unique values. An unsigned char represents 0 to 255, and a signed char represents -128 to 127 (it could represent absolutely anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range (the maximum is 127), and the value overflows and wraps to -126 when it is interpreted as a signed char. The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32 bits with the sign bit as the MSB; the value 130 easily fits into the low 8 bits, but the compiler then has to chop off 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of those 8 bits actually represents -128. Uh oh! You now have 1000 0010 in memory, which when interpreted as a signed char is -128 + 2 = -126. My linter on my platform screams about this.
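For what it's worth, here is a minimal sketch of that bit pattern (my addition, not part of the original answer). Printing the raw byte in hex shows 0x82, i.e. binary 1000 0010, which reads as -126 when treated as signed:

#include <stdio.h>

int main() {
    char c1 = 130;                                /* out of range; wraps to -126 on this platform */
    printf("bits : 0x%x\n", (unsigned char)c1);   /* 0x82 == binary 1000 0010 */
    printf("value: %d\n", c1);                    /* -126 == -128 + 2 */
    return 0;
}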


I make that important point about interpretation because in memory both values are identical. You can confirm this by casting the value in the printf statements, i.e. printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
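For example, a small check along those lines (my addition, reusing c1 and c2 from the question):

printf("3: %+d\n", (unsigned char)c1);   /* prints +130: same byte, read as unsigned */
printf("4: %+d\n", (signed char)c2);     /* prints -126 on this platform: same byte, read as signed */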

The reason you see the large value in your first printf statement is that you are treating a signed char as an unsigned int, after the char has already overflowed. The machine interprets the char as -126 first, and then converts that to unsigned int, which cannot represent a negative value, so the value wraps around modulo 2^32.

2^32 - 126 = 4294967170. Bingo.
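You can reproduce that arithmetic with well-defined conversions (my sketch, not part of the original answer); converting -126 to unsigned int is defined to wrap modulo 2^32:

#include <stdio.h>

int main() {
    unsigned int u = (unsigned int)-126;      /* well-defined: -126 reduced modulo 2^32 */
    printf("%u\n", u);                        /* 4294967170 */
    printf("%llu\n", (1ULL << 32) - 126);     /* 4294967170 */
    return 0;
}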

In printf statement 2, all the machine has to do is add 24 zero bits to reach 32 bits and then interpret the value as an int. In statement 1, you've told it that you have a signed value, so it first turns that into a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps (sketched in code after the list):

  1. The signed char is promoted to signed int, because you want to work with ints. The char (is probably copied and) has 24 bits added. Because we're looking at a signed value, the promotion sign-extends it (the upper 24 bits are filled with ones), so the memory here looks quite different.
  2. The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and gives it a weight of +2^31 instead of the -2^31 it had during the promotion.
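Here is a minimal sketch of those two steps using explicit, well-defined casts (my addition, not the answerer's code):

#include <stdio.h>

int main() {
    char c1 = -126;                    /* the wrapped value from char c1 = 130; */

    /* Step 1: integer promotion sign-extends the char to a 32-bit int. */
    int promoted = c1;
    printf("promoted   : %d (0x%08x)\n", promoted, (unsigned int)promoted);   /* -126 (0xffffff82) */

    /* Step 2: the same 32 bits read as unsigned give the MSB a weight of
       +2^31 instead of -2^31, so the value becomes 2^32 - 126. */
    unsigned int reread = (unsigned int)promoted;
    printf("as unsigned: %u\n", reread);                                      /* 4294967170 */
    return 0;
}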

An interesting bit of trivia: you can suppress the clang-tidy linter warning if you write char c1 = 130u;, but you still get the same garbage based on the above logic (i.e. the implicit conversion throws away the upper 24 bits, and the sign bit was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really want to follow it) 😉.

(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)

char c1 = 130;

Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).

In

printf("1: %+u\n", c1);

c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
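If you want to check that congruence without the undefined behaviour (my addition, reusing c1 from the question), make the conversion explicit:

printf("%u\n", (unsigned int)c1);        /* 4294967170: the conversion to unsigned int is well-defined */
printf("%llu\n", 4294967170ULL + 126);   /* 4294967296 == 2^32, confirming the congruence */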

In

printf("3: %+d\n", c1);

The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).

In cases 1 and 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u, which requires unsigned int.

On exotic systems (which your target is not) where unsigned char is as large as int, it would be promoted to unsigned int instead, in which case statement 4 would have UB, since %d requires an int.
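One way to avoid the mismatch entirely (my suggestion, not part of this answer) is to cast each argument to the type the specifier expects, or to use the hh length modifier:

printf("1: %u\n",   (unsigned int)c1);   /* convert explicitly to what %u expects */
printf("2: %hhu\n", c2);                 /* hh: the argument was an unsigned char before promotion */
printf("3: %d\n",   c1);                 /* char promotes to int, which %d expects */
printf("4: %+d\n",  c2);                 /* also fine: unsigned char promotes to int */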


The explanation for 3 depends a lot on implementation-specific details. The result depends on whether char is signed or not, and on its representable range.

If 130 were a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears not to be the case, so we can assume that char is a signed type on the target system.

Initialising a signed integer with an unrepresentable value (such as a char with 130 in this case) results in an implementation-defined value.

On systems with 2's complement representation for signed numbers, which is the ubiquitous representation these days, the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
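A quick worked check of that congruence (my sketch, assuming the usual wrap-around behaviour):

#include <stdio.h>

int main() {
    printf("%d\n", 130 - 256);   /* -126: the unique value in [-128, 127] congruent to 130 mod 256 */
    printf("%d\n", (char)130);   /* -126 on this implementation */
    return 0;
}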
