
Why does GCC insert seemingly non-essential instructions before a printf call?

I'm trying to learn x86 on my own and I decided to dissect a simple c program and see what GCC outputs. The program is this:

#include <stdio.h>
int main() {
  printf("%s","Hello World");
  return 0;
}

I compiled the code with -S and then stripped out things that I found unnecessary and reduced the assembly code to this.

.pfArg:
.string "%s"
.text

.Hello:
.string "Hello World"
.text

.globl main
.type   main, @function

main:
pushq   %rbp        # push what was in base pointer onto stack
movq    %rsp, %rbp  # move stack pointer to base pointer
subq    $16, %rsp   # subtract 16 from sp and store in stack pointer

# prepare arguments for printf
movl    $.Hello, %esi   # put & of "Hello World" into %esi
movq    $.pfArg, %rdi   # put & of "%s" into %rdi
call    printf
leave
ret

Now almost everything in the code above makes sense to me except the first two instructions under main. However, this is what I get without stripping things out.

.LC0:
    .string "%s"

.LC1:
    .string "Hello World"
    .text

.globl main
    .type   main, @function

main:

.LFB0:
    pushq   %rbp        # push what was in base pointer onto stack
    movq    %rsp, %rbp  # move stack pointer to base pointer

  # prepare arguments for printf
    movl    $.LC0, %eax # put arg into %eax
    movl    $.LC1, %esi # put second arg into %esi
    movq    %rax, %rdi  # move value in %rax to %rdi ???? ( why not just put $.LC0 into %rdi directly )
    movl    $0, %eax    # clear out %eax ???? ( why do we need to clear it out )
    call    printf      
    movl    $0, %eax    # return 0
    leave
    ret

.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

There are 2 instructions that I've marked with ???? that I don't understand.

The first instruction is moving what is in %rax into %rdi to prepare for the printf call. That's all fine, except we just moved $.LC0 (which is the string "%s") into %eax. This seems unnecessary. Why didn't we just move $.LC0 into %rdi in the first place instead of moving it into %eax and then into %rdi?

The second instruction is clearing out %eax, which I understand to hold the return value of a function. But if the function is going to clobber it anyway, why does GCC care to clear it out?

A couple rules of thumb:

  1. Don't bother looking at unoptimized output if you're concerned about efficient code.
  2. Always measure, never assume, that your "improvements" at the assembly language level boost performance.

Even in optimized code, you may see seemingly unnecessary instructions such as "xor %eax,%eax" when there is no functional need to clobber a register. These instructions play a special role by informing the pipeline that no data dependency for that register exists beyond that point. In a modern out-of-order processor, the core's pipeline speculatively executes many instructions ahead of the current EIP. Explicitly cutting data dependencies in this manner helps the speculation mechanism and can boost performance, especially in tight loops.

In other cases, the compiler may appear to take a roundabout approach when in fact it's trying to match the work at hand to the parallel execution units available in the target core's pipeline. More instructions dispatched in parallel often complete faster than fewer instructions serialized.

If you really care to squeeze out every last drop of performance, use an rdtsc instruction before and after a block of code to measure the number of clocks expended. Be a bit careful, since rdtsc isn't strictly ordered with respect to surrounding instructions, but in practice it's plenty accurate for measuring anything in the range of thousands of clocks.

Are you viewing the optimized output, or unoptimized (which is basically a naive translation of C code into assembler)? That makes a huge difference as the optimizer is usually pretty good about applying the same kinds of rules as you describe.

The first instruction is moving what is in %rax into %rdi to prepare for the printf call. That's all fine, except we just moved $.LC0 (which is the string "%s") into %eax. This seems unnecessary. Why didn't we just move $.LC0 into %rdi in the first place instead of moving it into %eax and then into %rdi?

That's probably because you're compiling with no optimisations. When I compile your example with GCC 4.2.1 on Mac OS X v10.6.8, I get the following output:

.globl _main
_main:
LFB3:
    pushq   %rbp
LCFI0:
    movq    %rsp, %rbp
LCFI1:
    leaq    LC0(%rip), %rsi
    leaq    LC1(%rip), %rdi
    movl    $0, %eax
    call    _printf
    movl    $0, %eax
    leave
    ret

As you can see, the arguments were directly stored into %rsi and %rdi.

The second instruction is clearing out %eax, which I understand to hold the return value of a function. But if the function is going to clobber it anyway, why does GCC care to clear it out?

Because the x86_64 ABI specifies that if a function takes variable arguments, then AL (which is part of %eax) is expected to hold an upper bound on the number of vector registers used for the arguments to that function call. Since you're not passing floating-point arguments when calling printf(), no vector registers are used, so AL (%eax) is zeroed out. I give more examples in an answer to another question here.

Because GCC is a compiler, and compilers are dumb.

You can make GCC smarter by using -O2. It starts to use optimization tricks and reduces the redundant instructions.
