
Why can I printf with the wrong specifier and still get output?

My question involves the memory layout and mechanics behind the C printf() function. Say I have the following code:

#include <stdio.h>

int main()
{
    short m_short;
    int m_int;

    m_int = -5339876;

    m_short = m_int;
    printf("%x\n", m_int);
    printf("%x\n", m_short);

    return 0;
}

On GCC 7.5.0 this program outputs:

ffae851c
ffff851c

My question is, where is the ffff actually coming from in the second hex number? If I'm correct, those fs should be outside the bounds of the short, but printf is getting them from somewhere.

When I use the correct specifier, %hx, the output is as expected:

ffae851c
851c

As far as I understand, the compiler simply truncates the top half of the number when assigning to the short, as the second output shows. So in the first output, do the four leading fs come from the program reading memory that it shouldn't? Or does the C compiler, behind the scenes, still reserve a full sign-extended int even for a short, with the high half being undefined behavior if it is used?

Note: I am asking for research purposes; in a real-world application, I would never try to abuse the language.

When a char or short (including signed and unsigned versions) is used as a function argument where there is no specific type (as with the ... arguments to printf(format, ...)) [1], it is automatically promoted to an int (assuming it is not already as wide as an int [2]).

So printf("%x\n", m_short); has an int argument. What is the value of that argument? In the assignment m_short = m_int;, you attempted to assign it the value −5339876 (represented with bytes 0xffae851c). However, −5339876 will not fit in this 16-bit short. In assignments, a conversion is automatically performed, and, when the value being converted to a signed integer type does not fit, the result is implementation-defined. It appears your implementation, as many do, uses two's complement and simply takes the low bits of the integer. Thus, it puts the bytes 0x851c in m_short, representing the value −31460.

Recall that this is being promoted back to int for use as the argument to printf. In this case, it fits in an int, so the result is still −31460. In a two's complement int, that is represented with the bytes 0xffff851c.

Now we know what is being passed to printf: an int with bytes 0xffff851c representing the value −31460. However, you are printing it with %x, which is supposed to receive an unsigned int. With this mismatch, the behavior is not defined by the C standard. However, it is a relatively minor mismatch, and many C implementations let it slide. (GCC and Clang do not warn even with -Wall.)

Let's suppose your C implementation does not treat printf as a special known function and simply generates code for the call as you have written it, and that you later link this program with a C library. In this case, the compiler must pass the argument according to the specification of the Application Binary Interface (ABI) for your platform. (The ABI specifies, among other things, how arguments are passed to functions.) To conform to the ABI, the C compiler will put the address of the format string in one place and the bits of the int in another, and then it will call printf.

The printf routine will read the format string, see %x, and look for the corresponding argument, which should be an unsigned int. In every C implementation and ABI I know of, an int and an unsigned int are passed in the same place. It may be a processor register or a place on the stack. Let's say it is in register r13. So the compiler designed your calling routine to put the int with bytes 0xffff851c in r13, and the printf routine looked for an unsigned int in r13 and found bytes 0xffff851c.

So the result is that printf interprets the bytes 0xffff851c as if they were an unsigned int, formats them with %x, and prints “ffff851c”.

Essentially, you got away with this because (a) a short is promoted to an int, which is the same size as the unsigned int that printf was expecting, and (b) most C implementations are not strict about mismatching integer types of the same width with printf. If you had instead tried printing an int using %ld, you might have gotten different results, such as “garbage” bits in the high bits of the printed value. Or you might have a case where the argument you passed is supposed to be in a completely different place from the argument printf expected, so none of the bits are correct. In some architectures, passing arguments incorrectly could corrupt the stack and break the program in a variety of ways.

Footnotes

[1] This automatic promotion happens in many other expressions too.

[2] There are some technical details regarding these automatic integer promotions that need not concern us at the moment.
