How does GCC optimize out an unused variable incremented inside a loop?
I wrote this simple C program:
int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}
I wanted to see how the gcc compiler optimizes this loop (clearly, adding 1 2000000000 times should become "add 2000000000 one time"). So:
gcc test.c and then time on a.out gives:
real 0m7.717s
user 0m7.710s
sys 0m0.000s
gcc -O2 test.c and then time on a.out gives:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Then I generated the assembly for both with gcc -S. The first one seems quite clear:
.file "test.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl $0, -8(%rbp)
movl $0, -4(%rbp)
jmp .L2
.L3:
addl $1, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $1999999999, -4(%rbp)
jle .L3
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
.L3 does the additions; .L2 compares -4(%rbp) with 1999999999 and jumps back to .L3 if i < 2000000000.
Now the optimized one:
.file "test.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
I can't understand at all what's going on there! I have little knowledge of assembly, but I expected something like
addl $2000000000, -8(%rbp)
I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.
Can someone briefly explain what is going on in the optimized output?
The compiler is even smarter than that. :)
In fact, it realizes that you aren't using the result of the loop, so it removed the entire loop!
This is called Dead Code Elimination.
A better test is to print the result:
#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
    // Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}
EDIT: I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.
I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:
EDIT: I had the wrong assembly output. Here's the correct one.
; 57 : int main(){
$LN8:
sub rsp, 40 ; 00000028H
; 58 :
; 59 :
; 60 : int i; int count = 0;
; 61 : for(i = 0; i < 2000000000; i++){
; 62 : count = count + 1;
; 63 : }
; 64 :
; 65 : // Print result to prevent Dead Code Elimination
; 66 : printf("%d\n",count);
lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
mov edx, 2000000000 ; 77359400H
call QWORD PTR __imp_printf
; 67 :
; 68 :
; 69 :
; 70 :
; 71 : return 0;
xor eax, eax
; 72 : }
add rsp, 40 ; 00000028H
ret 0
So yes, Visual Studio does this optimization. I'd assume GCC probably does too.
And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):
.file "test.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $2000000000, 8(%esp)
movl $.LC0, 4(%esp)
movl $1, (%esp)
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
Compilers have a few tools at their disposal to make code more efficient or more "efficient":
1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read, but the results of the reads may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behavior (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L of the C Standard would call "Critical Undefined Behaviors".
2. If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behavior other than execution time unless the code invokes "Critical Undefined Behaviors".
3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard allows the compiler to omit any code that would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite makes it dangerous.
Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.