Why didn't GCC optimize this tail call?
I have code that works with linked lists. I use tail calls. Unfortunately, GCC does not optimise the calls.
Here is the C code of the function that recursively calculates the length of the linked list:
size_t ll_length(const ll_t* list) {
    return ll_length_rec(list, 0);
}
size_t ll_length_rec(const ll_t* list, size_t size_so_far)
{
    if (list) {
        return ll_length_rec(list->next, size_so_far + 1);
    } else {
        return size_so_far;
    }
}
And here is the assembler code:
.globl _ll_length_rec
_ll_length_rec:
LFB8:
.loc 1 47 0
pushq %rbp
LCFI6:
movq %rsp, %rbp
LCFI7:
subq $32, %rsp
LCFI8:
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
.loc 1 48 0
cmpq $0, -8(%rbp)
je L8
.loc 1 49 0
movq -16(%rbp), %rsi
incq %rsi
movq -8(%rbp), %rax
movq 8(%rax), %rdi
call _ll_length_rec # < THIS SHOULD BE OPTIMIZED
movq %rax, -24(%rbp)
jmp L10
If GCC had optimized it, there would be no call in the asm. I compile it with:
gcc -S -fnested-functions -foptimize-sibling-calls \
-03 -g -Wall -o llist llist.c
And the GCC version is:
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
If I add -O3 to your compilation line, it no longer seems to generate the offending call, while without it I get the unoptimised call. I don't know all the gcc options off the top of my head, but is -03 a typo for -O3, or is it intentional?
Ltmp2:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
jmp LBB1_1
.align 4, 0x90
LBB1_3:
addq $2, %rsi
Ltmp3:
movq (%rax), %rdi
Ltmp4:
LBB1_1:
Ltmp5:
testq %rdi, %rdi
je LBB1_5
Ltmp6:
movq (%rdi), %rax
testq %rax, %rax
jne LBB1_3
incq %rsi
LBB1_5:
movq %rsi, %rax
Ltmp7:
Ltmp8:
popq %rbp
ret
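To make the transformation concrete, here is a rough C sketch of what the optimized assembly amounts to: once the tail call is eliminated, the recursion becomes a jump back to the test, which is just a loop. The ll_t definition and the name ll_length_iter are assumptions for illustration, since the question does not show the struct, and the real output above is additionally unrolled by two.

```c
#include <stddef.h>

/* Hypothetical node type; the question does not show ll_t's definition. */
typedef struct ll {
    struct ll *next;
} ll_t;

/* Tail-recursive form, as in the question. */
size_t ll_length_rec(const ll_t *list, size_t size_so_far)
{
    if (list) {
        return ll_length_rec(list->next, size_so_far + 1);
    } else {
        return size_so_far;
    }
}

/* Roughly what the compiler produces once the tail call is eliminated:
   the arguments are updated in place and control jumps back to the test.
   ll_length_iter is an illustrative name, not part of the original code. */
size_t ll_length_iter(const ll_t *list, size_t size_so_far)
{
    while (list) {
        list = list->next;
        size_so_far += 1;
    }
    return size_so_far;
}
```

Both functions return the same value for any list; only the recursive one can grow the stack when the tail call is left as a real call.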
Most likely because neither of your functions is declared static, which means that the symbols must be visible to the linker in case any other compilation units need them at link time. Try compiling with the -fwhole-program flag and see what happens.
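For reference, a sketch of what the static suggestion looks like in source form. Marking the helper static gives it internal linkage, so no other compilation unit can refer to it. The ll_t definition here is an assumption, as the question does not show it, and whether this changes the generated code depends on the GCC version and optimization level, so it is worth inspecting the -S output rather than taking it on faith.

```c
#include <stddef.h>

/* Hypothetical node type; the question does not show ll_t's definition. */
typedef struct ll {
    struct ll *next;
} ll_t;

/* Internal linkage: the helper is not visible outside this file. */
static size_t ll_length_rec(const ll_t *list, size_t size_so_far);

/* The public entry point keeps external linkage. */
size_t ll_length(const ll_t *list)
{
    return ll_length_rec(list, 0);
}

static size_t ll_length_rec(const ll_t *list, size_t size_so_far)
{
    if (list) {
        /* Still a tail call: nothing happens after the recursive call returns. */
        return ll_length_rec(list->next, size_so_far + 1);
    }
    return size_so_far;
}
```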
It probably depends on the GCC version and the specific build. This is what I get from GCC 3.4.4 on Windows at -O2 and above:
.globl _ll_length_rec
.def _ll_length_rec; .scl 2; .type 32; .endef
_ll_length_rec:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
jmp L3
.p2align 4,,7
L6:
movl (%edx), %edx
incl %eax
L3:
testl %edx, %edx
jne L6
popl %ebp
ret