
Difference between exit() and return in main() function in C

I've looked through the questions "What is the difference between exit and return?" and "return statement vs exit() in main()" to find the answer, but in vain.

The problem with the first link is that the answer assumes a return from any function. I want to know the exact difference between the two when used in the main() function. Even if the difference is small, I'd like to know what it is. Which is preferred, and why? Is there any performance gain in using return over exit() (or exit() over return) with all compiler optimizations turned off?

The problem with the second link is that I'm not interested in what happens in C++. I want an answer specifically pertaining to C.

EDIT: On someone's recommendation, I compared the assembly output of the following programs:

Note: Using gcc -S <myprogram>.c

Program mainf.c:

int main(void){
 return 0;
}

Assembly output:

    .file   "mainf.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

Program mainf1.c:

#include <stdlib.h>

int main(void){
 exit(0);
}

Assembly output:

    .file   "mainf1.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, %edi
    call    exit
    .cfi_endproc
.LFE2:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

I'm not well versed in assembly, but I can see some differences between the two programs, with the exit() version being shorter than the return version. What's the difference?

Disclaimer: This answer does not quote the C Standards.

TL;DR

Both methods jump into GLibC code, and to know exactly what that code is doing, or which one is faster or more efficient, you'll need to read it. If you want to know more about GLibC, you should check the sources for GCC and GLibC. There are links at the end for those.


Syscalls, wrappers and GLibC

First: there's a difference between exit(3) and _exit(2). The first is a GLibC library function that does its own cleanup and then ends up calling the second, which is a system call. The one we use in our program, and which requires the inclusion of stdlib.h, is exit(3): the GLibC function, not the system call.

Now, programs are not just your simple instructions. They contain heavy loads of GLibC's own instructions. These GLibC functions serve several purposes related to loading and providing the library functionality you use. For that to work, GLibC must be "inside" your program.

So, how is GLibC inside your program? Well, it puts itself there through your compiler (most likely gcc), which links in some static code and some hooks into the dynamic library.


The 'return 0;' method

I suppose you know what stack frames are, so I won't explain them. The cool thing to notice is that main() itself has its own stack frame. And that stack frame returns somewhere, and it must return... But to where?

Let's compile the following:

int main(void)
{
        return 0;
}

And compile and debug it with:

$ gcc -o main main.c

$ gdb main

(gdb) disass main
Dump of assembler code for function main:
0x00000000004005e8 <+0>:     push   %rbp
0x00000000004005e9 <+1>:     mov    %rsp,%rbp
0x00000000004005ec <+4>:     mov    $0x0,%eax
0x00000000004005f1 <+9>:     pop    %rbp
0x00000000004005f2 <+10>:    retq
End of assembler dump.

(gdb) break main
(gdb) run 
Breakpoint 1, 0x00000000004005ec in main ()  
(gdb) stepi
...

Now, stepi is the fun part. It executes one instruction at a time, so it's perfect for following function calls. After you run stepi for the first time, just hold your finger on ENTER until you get tired.

What you must observe is the sequence in which functions are called with this method. You see, ret is a "jumping" instruction (edit: after David Hoelzer's comment, I see that calling ret a simple jump is an over-generalization): after we pop rbp, ret itself will pop the return address from the stack and jump to it. So, if GLibC built that stack frame, retq makes our return 0; C statement jump right into GLibC's own code! How clever!

The order of function calls I got started roughly like this:

__libc_start_main
exit
__run_exit_handlers
_dl_fini
rtld_lock_default_lock_recursive
_dl_fini
_dl_sort_fini

The 'exit(0);' method

Compiling this:

#include <stdlib.h>
int main(void)
{
        exit(0);
}

And compiling and debugging...

$ gcc -o exit exit.c

$ gdb exit
(gdb) disass main
Dump of assembler code for function main:
0x0000000000400628 <+0>:     push   %rbp
0x0000000000400629 <+1>:     mov    %rsp,%rbp
0x000000000040062c <+4>:     mov    $0x0,%edi
0x0000000000400631 <+9>:     callq  0x4004d0 <exit@plt>
End of assembler dump.
(gdb) break main
(gdb) run
Breakpoint 1, 0x000000000040062c in main ()
(gdb) stepi
...

And the function sequence I got was:

exit@plt
??
_dl_runtime_resolve
_dl_fixup
_dl_lookup_symbol_x
do_lookup_x
check_match
_dl_name_match
strcmp

List object's Symbols

There's a cool tool for printing the symbols defined within a binary: nm. I suggest you take a look at it, as it will give you an idea of how much "crap" is added to a simple program like the ones above.

To use it in the simplest form:

$ nm main
$ nm exit

That will print a list of symbols in the file. Note that this list does not include references these functions will make. So if a given function in this list calls another function, the other probably won't be in the list.


Conclusion

It depends heavily on the way GLibC chooses to handle a simple stack frame return from main and how it implements the exit wrapper. In the end, the _exit(2) system call will get called and you'll exit your process.

Finally, to really answer your question: both methods jump into GLibC code, and to know exactly what that code is doing you'll need to read it. If you want to know more about GLibC, you should check the sources for GCC and GLibC.


References

  • GLibC Source Repository: look in stdlib/exit.c and stdlib/exit.h for the implementations.
  • Linux Kernel Exit Definition: look in kernel/exit.c for the _exit(2) system call implementation, and include/syscalls.h for the preprocessor magic behind it.
  • GCC Sources: I do not know the gcc (compiler, not suite) sources, and would appreciate it if anyone could point out where the runtime sequence is defined.

Functionally, from the main() function there is really no difference in C. For example, even if you register a handler function with the atexit() library call, both a return and a call to exit() from main will invoke that handler.

The exit() call, however, has the flexibility that you can use it to cause a program to exit with a return code from any point within the code.

There are technical differences, however. If you compile the following to assembly:

int main()
{
  return 1;
}

the final portion of that code will be:

movl $1, %eax
movl $0, -4(%rbp)
popq %rbp
retq

On the other hand, the following code compiled to assembly:

#include<stdlib.h>
int main()
{
  exit(1);
}

will be identical in all respects except that it ends as follows:

subq $16, %rsp
movl $1, %edi
movl $0, -4(%rbp)
callq _exit

Aside from the 1 being put into EDI rather than EAX, as required by the calling convention on the platform where I compiled this code, you'll note two differences. (The callee shown as _exit is most likely the C exit() function with the platform's leading-underscore symbol prefix, not the _exit system call.) First, a stack adjustment takes place to prepare for the function call. Second, rather than terminating with a retq, we are now calling into the system library, which handles the final return code and terminates the process.

There is practically no difference between calling exit or executing return from main as long as main returns a type that is compatible with int.

From the C11 Standard:

5.1.2.2.3 Program termination

1 If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.

exit() is a standard library function (it eventually makes a system call to end the process), while return is a statement of the language.

exit terminates current process, return returns from a function call.

In the main() function, they both accomplish the same thing:

int main() {
    // code
    return 0;
}

#include <stdlib.h>  // declares exit()

int main() {
    // code
    exit(0);
}

While in a function:

void f() {
    // code
    return; // return to where it was called from.
}

#include <stdlib.h>  // declares exit()

void f() {
    // code
    exit(0); // terminates program
}

One major difference between using return and calling exit() in the main() program is that if you call exit(), the local variables in main() still exist and are valid, whereas if you return, they are not.

This matters if you've done anything such as:

#include <stdio.h>
#include <stdlib.h>

static void function_using_stdout(void)
{
    char space[512];
    char *base = space;
    for (int j = 0; j < 10; j++)
    {
        base += sprintf(base, "Hysterical raisins #%d (continued) ", j+1);
        printf("%d..%d: %.24s\n", j*24, j*24+23, space + j * 24);
    }
    printf("Catastrophic elegance\n");
}

int main(int argc, char **argv)
{
    char buffer[64];  // Deliberately rather small
    setvbuf(stdout, buffer, _IOFBF, sizeof(buffer));
    atexit(function_using_stdout);
    for (int i = 0; i < 3; i++)
        function_using_stdout();
    printf("All done - exiting now\n");
    if (argc > 1)
        return 1;
    else
        exit(2);
}

because now the function called (via atexit()) from the startup code that called main() doesn't have a valid buffer for standard output. Whether it crashes, merely gets thoroughly confused, prints garbage, or appears to work is open to debate.

I called the program hysteresis. When run with no arguments, it used exit() and worked correctly/sanely (the local space variable in function_using_stdout() was not sharing space with the I/O buffer for stdout):

$ ./hysteresis 
'hysteresis' is up to date.
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
All done - exiting now
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
$

When called with at least one argument, things went haywire (the local space variable in function_using_stdout() was probably sharing space with the I/O buffer for stdout, unless that was being used by the code that executes the functions registered with atexit()):

$ ./hysteresis aleph
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
0..23: Hysterical raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: sins #2 (continued) Hyst
72..95: erical raisins #3 (conti
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
Al) Hysterical raisins #2 (continued) l raisins #1 (c
24..47: ontinued) Hysterical rai
48..71: l rai
48..71: nued) Hyst
72..95: 71: nued) Hyst
72..95: 7
96..119: nued) Hysterical raisins
120..143:  #4 (continued) Hysteric
144..167: al raisins #5 (continued
168..191: ) Hysterical raisins #6 
192..215: (continued) Hysterical r
216..239: aisins #7 (continued) Hy
Catastrophic elegance
$

Most of the time, this sort of thing isn't a problem. However, when it matters, it really does matter. And, note, it isn't visible as a problem until the program is exiting — which can make it tricky to debug.


 