
Why exactly does this short program produce this output?

I've had a slow weekend, so just for interest I started working through KN King's 'C Programming: A Modern Approach' book today, and started to work through the exercises in the second chapter. One of the exercises is this:

Write a program that declares several int and float variables - without initialising them - and then prints their values.

My little solution to this is below, including the output. It isn't really a problem as such, I'm just quite curious as to why it does what it does, especially since I'm not so well-informed on lower-level languages.

I had a quick look for some other pre-made solutions on GitHub hoping they'd be commented or something, but it's such a simple problem that there really was nothing. KN King's own site suggests that the pattern of the output depends on, quote, "many factors", but doesn't divulge any more. This is reflected in my output being different from King's.

#include <stdio.h>

int main()
{
    int num1, num2, num3;    
    float flo1, flo2, flo3;

    printf("Our integers are %d, %d, %d\n", num1, num2, num3); 
    printf("Our floats are %g, %g, %g\n", flo1, flo2, flo3);

    return 0;
}   

The output is below:

C:\C\Intro\exercises>a
Our integers are 0, 16, 0
Our floats are 2.8026e-045, 0, 1.73639e-038

Again, not so much a problem, just curious what this is doing, probably at the hardware level.

Strictly speaking, your code has undefined behaviour, meaning it could do pretty much whatever it pleases.

In practice, your variables live on the stack but are not initialised. This likely means they pick up whatever values the stack happens to contain at the locations where the compiler places them. Those values are most likely left over from routines that were called earlier in your process's lifetime, i.e. during its startup.

First, let's consider how a very simple compiler might handle this code. When it sees int num1, num2, num3; inside a function, it may make space for these on the stack. A stack is commonly how compilers implement objects with automatic storage duration (notably variables defined inside functions that are not static or local to a thread). Whenever a new function is called, the compiler writes code to make space on the stack for its local variables and other information. Similarly, space is also allocated for float flo1, flo2, flo3;.

Then, when the compiler sees printf("Our integers are %d, %d, %d\n", num1, num2, num3);, it generates code to load the values of num1, num2, and num3 and to pass them to printf. The values are loaded from the memory that was allocated for these objects. What is in that memory? Well, this source code does not assign any values to those objects, so the data in that memory is whatever data was there when the main routine started.

What was in that memory? Commonly, when an operating system provides general memory to a process, it clears the memory (sets all bytes in it to zero) so that it does not reveal any data of whatever program used the memory previously. So why do the printf statements not print zeros?

main is not actually the start of your program. Before main can be executed, something has to set up the C environment. Running a C program requires that any data used by library routines you might call (such as printf) be initialized. Also, when the main routine returns, it has to have something to return to, something that will take the return value and pass it to the system as a process exit status. That code is also responsible for closing open files and doing some other clean-up work. Commonly, when you link a C program, an extra “start” routine is linked into your executable file. When the operating system starts your program, it calls this “start” routine first, and the start routine sets up the C environment and then calls main.

So, when you print num1, num2, num3, flo1, flo2, and flo3, the memory allocated for them has already been used by the “start” routine, and it contains whatever data the “start” routine happened to leave lying around.
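The leftover-data idea can be sketched without actually reading an uninitialized variable. This is only a model under stated assumptions: fake_stack, fake_start and read_int_slot are hypothetical names made up for illustration, the byte array merely stands in for the real stack, and the values 0, 16, 0 are borrowed from the output in the question rather than from any real start routine.

```c
#include <stddef.h>
#include <string.h>

#define FAKE_STACK_SIZE 64

/* Stands in for the memory region the real stack occupies. Static storage
   starts out zeroed, much like fresh memory handed over by the OS. */
static unsigned char fake_stack[FAKE_STACK_SIZE];

/* Plays the role of the "start" routine: it uses the region for its own
   scratch values and does not clean up afterwards. */
void fake_start(void)
{
    const int scratch[3] = { 0, 16, 0 };   /* arbitrary leftovers */
    memcpy(fake_stack, scratch, sizeof scratch);
}

/* Plays the role of main reading an int "local" it never assigned: it
   copies out whatever bytes already sit in that slot. Only the three
   slots written by fake_start are meant to be read here. */
int read_int_slot(size_t index)
{
    int value;
    memcpy(&value, fake_stack + index * sizeof(int), sizeof value);
    return value;
}
```

Before fake_start() runs, every slot reads as 0. Afterwards, read_int_slot(1) returns 16 even though the "main" side of the model never stored anything there.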

That is one explanation for why you see various values printed by this source code.
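The odd-looking float values fit this picture too. Assuming float is a 32-bit IEEE-754 value (common, but not guaranteed by the C standard), the 2.8026e-045 in the question is what you get when the leftover bytes happen to hold the small integer 2 and are read as a float. The helper name bits_to_float below is made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* This sketch only makes sense when float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Reinterpret a 32-bit pattern as a float by copying its bytes.
   memcpy is used because it is a well-defined way to do this in C. */
float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under that assumption, printf("%g\n", bits_to_float(2u)); prints 2.8026e-45. Older Microsoft C runtimes write the exponent with three digits, which gives the 2.8026e-045 seen in the question.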

On the other hand, let's consider a more sophisticated compiler. A more sophisticated compiler analyzes the code and can see that the variables are used without being initialized. It will warn the user about this, and it also knows that this violates various rules in C. In particular, the C standard does not define what happens when you use an object with automatic storage duration that has been neither initialized nor (for technical/esoteric reasons) had its address taken.
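For contrast, here is a minimal sketch of the well-defined variant: every variable is given a value before it is read, so there is nothing for the compiler to warn about and the output is fully determined. The function name format_report and the choice to write into a buffer instead of printing directly are assumptions made to keep the sketch easy to check. With GCC or Clang, compiling the original program with -Wall typically produces the uninitialized-variable warning mentioned above.

```c
#include <stdio.h>

/* Same report as the exercise program, but with every variable initialized.
   Writes the text into buffer and returns what snprintf returns: the number
   of characters the full text needs, not counting the terminating null. */
int format_report(char *buffer, size_t size)
{
    int num1 = 0, num2 = 0, num3 = 0;
    float flo1 = 0.0f, flo2 = 0.0f, flo3 = 0.0f;

    return snprintf(buffer, size,
                    "Our integers are %d, %d, %d\n"
                    "Our floats are %g, %g, %g\n",
                    num1, num2, num3, flo1, flo2, flo3);
}
```

Calling it with a buffer of 128 bytes and passing the result to fputs shows all six values as 0 on every run and every platform.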

To assist with optimization, sophisticated compilers have special ways of dealing with undefined behavior. For example, if the compiler sees code such as:

if (some test)
    FunctionA();
else
{
    Some undefined behavior here…
    FunctionB();
}

the compiler can optimize this by “choosing” how to define the undefined behavior. It can define the behavior to alter the program as if it had been written:

if (some test)
    FunctionA();
else
{
    FunctionA();
}

because that is a valid instance of undefined behavior. Then optimization can proceed to simplify that to:

FunctionA();

Sometimes cases like this arise in code because a programmer was writing for portability to various environments, and it happens that some test indeed cannot be false in a particular compiler, and this optimization produces correct and simple code. Cases like this can also arise where a compiler has been transforming code in other ways and the code above arises not because it was literally written that way in the source code but was generated by the compiler during its internal transformations. For example, a compiler might split a loop into separate code for the first iteration, the general middle iterations, and the last iteration, and some test might be always true in the last iteration, even though it was not always true in the context where the programmer wrote it.

What this means is that, when you use undefined behavior (that is not only undefined according to the C standard but also not defined by the C implementation), it may be transformed in ways you do not expect.
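A small concrete instance, closer to the question's own situation, might look like the sketch below. The function pick is hypothetical, and it deliberately contains a path with undefined behavior in order to illustrate the point.

```c
/* Hypothetical example: x is assigned only when flag is nonzero. */
int pick(int flag)
{
    int x;        /* deliberately left uninitialized */

    if (flag)
        x = 1;

    return x;     /* undefined behavior when flag == 0 */
}
```

Calling pick(1) is well defined and returns 1. For pick(0) the C standard imposes no requirement, so an optimizing compiler is permitted to drop the test entirely and compile the whole function as if it were simply return 1;, on the reasoning that this is a valid outcome for the undefined path. Whether a particular compiler does so depends on its version and optimization settings.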

I tested this code with a version of LLVM and Clang, and the compiler optimized it by not allocating any memory for the variables and not loading them from memory to pass to printf. Instead, it just called printf without any preparation for those arguments. On the platform I am using, those arguments are passed in registers. So the result is that printf prints whatever values happen to be in those registers. As with the memory, this will be whatever data earlier software happened to leave in those registers.
