
How does GCC optimize out an unused variable incremented inside a loop?

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly, adding 1 2000000000 times should become "add 2000000000 once"). So:

Compiling with gcc test.c and then running time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

Compiling with gcc -O2 test.c and then running time on a.out gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s  

Then I generated the assembly for both with gcc -S. The first one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

.L3 does the additions; .L2 compares -4(%rbp) with 1999999999 and jumps back to .L3 if i < 2000000000.

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I have little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.

Can someone briefly explain:

  1. The gcc -S -O2 output.
  2. Whether the loop is optimized as I expected (one addition instead of many)?

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop, so it removed the entire loop!

This is called Dead Code Elimination.

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT: I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.


I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT: I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

Compilers have a few tools at their disposal to make code more efficient or more "efficient":

  1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read, but the results of the reads may be ignored). If the results of the computations that fed it weren't used either, the code that performs those can be omitted as well. If such omission makes the code for both paths of a conditional branch identical, the condition may be regarded as unused and omitted too. This will have no effect on the behavior (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".

  2. If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behavior other than execution time unless the code invokes "Critical Undefined Behaviors".

  3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard allows the compiler to omit any code that would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.

Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.
