Tail call recursion

Question

I'm implementing a function as follows:

void Add(list* node)
{
    if(this->next == NULL)
        this->next = node;
    else
        this->next->Add(node);
}

It seems Add is going to be tail-called at every step of the recursion.
I could also implement it as:

void Add(list *node)
{
    list *curr = this;
    while(curr->next != NULL) curr = curr->next;
    curr->next = node;
}

This will not use recursion at all.
Which version is better (in stack size or speed)?
Please don't give "Why don't you use STL/Boost/whatever?" comments/answers.

Answer 1

They will probably be the same performance-wise, since the compiler will probably optimise them into the exact same code.

However, if you compile with Debug settings, the compiler will not optimise for tail recursion, so if the list is long enough, you can get a stack overflow. There is also the (very small) possibility that a bad compiler won't apply tail-call optimisation to the recursive version. There is no such risk with the iterative version.

Pick whichever one is clearer and easier for you to maintain, taking the possibility of non-optimisation into account.
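To make the comparison concrete, here is a minimal sketch that places both variants side by side in one struct. The names AddRecursive, AddIterative and Length are introduced here purely for illustration and are not from the question; the default member initialiser on next is also an assumption, added so the nodes start out unlinked.

```cpp
#include <cassert>
#include <cstddef>

// Minimal singly linked node, mirroring the struct in the question.
struct list {
    list *next = nullptr;

    // Recursive version: the recursive call is the last action (tail position).
    void AddRecursive(list *node) {
        assert(node != nullptr);
        if (next == nullptr)
            next = node;
        else
            next->AddRecursive(node);
    }

    // Iterative version: stack usage stays constant at any optimisation level.
    void AddIterative(list *node) {
        assert(node != nullptr);
        list *curr = this;
        while (curr->next != nullptr) curr = curr->next;
        curr->next = node;
    }
};

// Hypothetical helper: counts the nodes reachable from head.
std::size_t Length(const list *head) {
    std::size_t n = 0;
    for (; head != nullptr; head = head->next) ++n;
    return n;
}
```

Both methods append at the tail and produce the same list. Only the recursive one relies on the compiler turning the tail call into a jump in order to keep stack usage constant.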

Answer 2

I tried it out, making three files to test your code:

node.hh:

struct list {
  list *next;
  void Add(list *);
};

tail.cc:

#include "node.hh"

void list::Add(list* node)
{
    if(!this->next)
        this->next = node;
    else
        this->next->Add(node);
}

loop.cc:

#include "node.hh"

void list::Add(list *node)
{
    list *curr = this;
    while(curr->next) curr = curr->next;
    curr->next = node;
}

Compiled both files with G++ 4.3 for IA32, with -O3 and -S to give assembly output rather than object files.

Results:

tail.s:

_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %eax
        .p2align 4,,7
        .p2align 3
.L2:
        movl    %eax, %edx
        movl    (%eax), %eax
        testl   %eax, %eax
        jne     .L2
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc

loop.s:

_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %edx
        jmp     .L3
        .p2align 4,,7
        .p2align 3
.L6:
        movl    %eax, %edx
.L3:
        movl    (%edx), %eax
        testl   %eax, %eax
        jne     .L6
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc

Conclusion: The output is similar enough (the core loop/recursion becomes movl, movl, testl, jne in both) that it really isn't worth worrying about. There's one fewer unconditional jump in the recursive version, although I wouldn't want to bet either way on which is faster, if the difference is even measurable at all. Pick whichever is most natural for expressing the algorithm at hand. Even if you later decide that was a bad choice, it's not too hard to switch.

Adding -g to the compilation doesn't change the actual implementation with g++ either, although there is the added complication that breakpoints no longer behave as you would expect: in my tests with GDB, a breakpoint on the tail call line gets hit at most once (and not at all for a 1-element list), regardless of how deep the recursion actually goes.

Timings:

Out of curiosity I ran some timings with the same variant of g++. I used:

#include <cstring>
#include "node.hh"

static const unsigned int size = 2048;
static const unsigned int count = 10000;

int main() {
   list nodes[size];
   for (unsigned int i = 0; i < count; ++i) {
      std::memset(nodes, 0, sizeof(nodes));
      for (unsigned int j = 1; j < size; ++j) {
        nodes[0].Add(&nodes[j]);
      }
   }
}

This was run 200 times for each of the loop and the tail call versions. The results with this compiler on this platform were fairly conclusive. Tail had a mean of 40.52 seconds whereas loop had a mean of 66.93 seconds. (The standard deviations were 0.45 and 0.47 respectively.)
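The answer does not say how the runs were timed. One possible way to take such a measurement from inside the program is with std::chrono::steady_clock, sketched below. The helper name TimeAppends and its parameters are assumptions made for this illustration, and the Add shown is the loop variant, defined inline only so the sketch is self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct list {
    list *next = nullptr;

    // Loop variant of Add, as in loop.cc.
    void Add(list *node) {
        list *curr = this;
        while (curr->next != nullptr) curr = curr->next;
        curr->next = node;
    }
};

// Hypothetical helper: rebuilds the list `count` times by appending every
// node to nodes[0], and returns the elapsed wall-clock time in seconds.
double TimeAppends(std::vector<list> &nodes, std::size_t count) {
    assert(!nodes.empty());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        for (list &n : nodes) n.next = nullptr;  // reset links between rounds
        for (std::size_t j = 1; j < nodes.size(); ++j) {
            nodes[0].Add(&nodes[j]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling TimeAppends on a std::vector<list> of 2048 nodes with a count of 10000 would match the sizes used in the test program above. Absolute numbers will differ from those reported here, since they depend on the machine and compiler.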

[Figure: box plot of the results]

So I certainly wouldn't be scared of using tail call recursion if it seems the nicer way of expressing the algorithm, but I probably wouldn't go out of my way to use it either, since I suspect these timing observations would most likely vary across platforms and compiler versions.

2 answers

Answer 1
7 Accepted 2011-09-02 22:35:42

Answer 2
5 2011-09-02 22:55:20
