
Use of format specifiers for conversions

I am unable to deduce the internal happenings inside the machine when we print data using format specifiers.

I was trying to understand the concept of signed and unsigned integers and found the following:

unsigned int b=-12;  
printf("%d\n",b);     //prints -12
printf("%u\n\n",b);   //prints 4294967284

I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.

So, since b is unsigned, b technically stores 4294967284. But still the format specifier %d causes the binary value of b to be printed as its signed version, i.e., -12.

However,

printf("%f\n",2);    //prints 0.000000
printf("%f\n",100);   //prints 0.000000
printf("%d\n",3.2);    //prints 2147483639

printf("%d\n",3.1);    //prints 2147483637

I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.

Why does this not happen, and what exactly takes place at the machine level?

Mismatching the format specifier and the argument type (like using the floating-point specifier "%f" to print an int value) leads to undefined behavior.

Remember that 2 is an integer value, and vararg functions (like printf) don't really know the types of their arguments. The printf function has to rely on the format specifier and assume the argument is of the specified type.


To better understand how you get the results you get, to understand "the internal happenings", we first must make two assumptions:

  • The system uses 32 bits for the int type
  • The system uses 64 bits for the double type

Now what happens with

printf("%f\n",2);    //prints 0.000000

is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might make the value a trap value, which can cause a crash.

Similarly with

printf("%d\n",3.2);    //prints 2147483639

the printf function fetches the next argument as a 32-bit int value, losing half of the bits in the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. On common systems int has no trap representations, so no crash happens; an unexpected value is simply printed.

what exactly takes place at machine level ?

The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS APIs, whereas "machine level" would refer to the generated assembly code. The behavior you experience is mostly related to details of the C language rather than the machine.

At the machine level there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.

The integer constant 12 is of type int . When we write -12 we apply the unary - operator on that. The result is still of type int but now of value -12 .

Then you attempt to store this negative number in an unsigned int . This triggers an implicit conversion to unsigned int , which should be carried out according to the C standard:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type

The maximum value of a 32-bit unsigned int is 2^32 - 1, which equals 4294967295. "One more than the maximum" gives 4294967296. If we calculate -12 + 4294967296 we get 4294967284. This is in range of an unsigned int and is the result you see later.

Now as it happens, the printf family of functions is very unsafe. If you provide a format specifier that doesn't match the type of the argument, the program invokes undefined behavior: it might crash, display the wrong result, and so on.

So when you use %d or %i, which are reserved for signed int, but pass an unsigned int whose value does not fit in an int, anything can happen. "Anything" includes printf reading the passed bits as if they belonged to a signed int and printing the result. That's what happened when you used %d.

When you pass values of types that completely mismatch the format specifier, the program just prints gibberish, because you are still invoking undefined behavior.

I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.

The reason why the printf family can't do anything intelligent like assuming that 2 should be converted to 2.0 , is because they are variadic (variable argument) functions. Meaning they can take any number of arguments. In order to make that possible, the parameters are essentially passed as raw binary through something called va_list, and all type information is lost. The printf implementation is therefore left with no type information but the format string you gave it. This is why variadic functions are so unsafe to use.

A regular function with a prototype has more type safety: if you declare void foo (float f) and pass the integer constant 2 (type int), the compiler will implicitly convert the int to float, and perhaps also give a conversion warning.

The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:

  • The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
  • The bits for an int argument and a double argument would be passed in different places—possibly a general register for the int argument and a special floating-point register for the double argument, so when you give printf one and tell it to format the other, it does not get the bits for the double to use for the int ; it gets completely unrelated bits that were left lying around by previous operations.

Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.

First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments . When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument .

Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits of the int it expects, it will get the bits of the unsigned you passed.

Then printf will interpret those bits as if they encoded the value for an int , and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int . 1

This explains why you see "-12" when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two's complement. Other encoding systems include one's complement and sign-and-magnitude, in which these bits would represent −11 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)

That is the first of two things that can happen, and it is common when the type you pass is wrong but similar to the correct type in size and nature, because it is then passed in the same place as the correct type. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, in many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way; a completely different set of bits is used.

This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.

As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), it is allowed to transform your program into anything . Over the years, compilers have grown increasingly aggressive about their optimizations, and they continue to grow. This means, aside from the simple behaviors described above, when you pass incorrect arguments to printf , the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you may not rely on them.

Footnote

1 Note that this is not a conversion . A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits—an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.
