Printf function formatter
Having the following simple C++ code:

```cpp
#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;
    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}
```
the output is:

```text
1: 4294967170
2: 130
3: -126
4: +130
```
Can someone please explain the results on lines 1 and 3? I'm using the Linux gcc compiler with all default settings.
A `char` is 8 bits. This means it can represent 2^8 = 256 unique values. An `unsigned char` represents 0 to 255, and a `signed char` represents -128 to 127 (the standard permits other ranges, but this is the typical platform implementation). Thus, assigning 130 to a `char` is out of range by 2, and the value wraps around to -126 when it is interpreted as a signed char.

The compiler sees 130 as an integer and makes an implicit conversion from `int` to `char`. On most platforms an `int` is 32 bits and the sign bit is the MSB; the value 130 easily fits into the low 8 bits, but the compiler then has to chop off 24 bits to squeeze it into a `char`. When this happens, and you've told the compiler you want a signed char, the MSB of those 8 bits actually represents -128. Uh oh! You now have `1000 0010` in memory, which when interpreted as a signed char is -128 + 2 = -126. My linter on my platform screams about this.
I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the `printf` statements, i.e. `printf("3: %+d\n", (unsigned char)c1);`, and you'll see 130 again.
The reason you see the large value in your first `printf` statement is that you are casting a `signed char` to an `unsigned int`, where the `char` has already overflowed. The machine interprets the `char` as -126 first, and then converts it to `unsigned int`, which cannot represent that negative value, so the value wraps modulo 2^32: you get 2^32 minus 126.

2^32 - 126 = 4294967170 ... bingo.
In `printf` statement 2, all the machine has to do is pad with 24 zero bits to reach 32 bits, and then interpret the value as an `int`. In statement 1, you've told it that you have a signed value, so it first sign-extends that to a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps: the signed promotion to 32 bits, then the unsigned reinterpretation.
An interesting bit of trivia: you can suppress the clang-tidy linter warning if you write `char c1 = 130u;`, but you still get the same garbage based on the above logic (i.e. the implicit conversion throws away the top 24 bits, and the sign bit was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really wanna follow it) 😉.
(This answer assumes that, on your machine, `char` ranges from -128 to 127, `unsigned char` ranges from 0 to 255, and `unsigned int` ranges from 0 to 4294967295, which happens to be the case.)
`char c1 = 130;`

Here, 130 is outside the range of numbers representable by `char`. The value of `c1` is implementation-defined. In your case, the number happens to "wrap around," initializing `c1` to `static_cast<char>(-126)`.
In `printf("1: %+u\n", c1);`, `c1` is promoted to `int`, resulting in -126. Then it is interpreted by the `%u` specifier as `unsigned int`. This is undefined behavior. This time the resulting number happens to be the unique number representable by `unsigned int` that is congruent to -126 modulo 4294967296, which is 4294967170.
In `printf("3: %+d\n", c1);`, the `int` value -126 is interpreted by the `%d` specifier as an `int` directly, and outputs -126 as expected.
In cases 1 and 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems `char` and `unsigned char` are smaller than `int`, so they promote to `int` when passed as variadic arguments. `int` doesn't match the format specifier `%u`, which requires `unsigned int`.
On exotic systems (which your target is not) where `unsigned char` is as large as `int`, it would be promoted to `unsigned int` instead, in which case 4 would have UB since `%d` requires an `int`.
The explanation for 3 depends a lot on implementation-specified details. The result depends on whether `char` is signed or not, and on its representable range.
If 130 were a representable value of `char`, such as when it is an unsigned type, then 130 would be the correct output. That appears not to be the case, so we can assume that `char` is a signed type on the target system.
Initialising a signed integer with an unrepresentable value (such as `char` with 130 in this case) results in an implementation-defined value.
On systems with two's complement representation for signed numbers, which is the ubiquitous representation these days, the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of `char`.