
C tail call optimization

Original post: 2010-08-18 16:24:37. Tags: c, standards, tail-recursion, tail-call-optimization

I often hear people say that C doesn't perform tail call elimination. Even though it's not guaranteed by the standard, isn't it performed in practice by any decent implementation anyhow? Assuming you're only targeting mature, well-implemented compilers and don't care about absolute maximum portability to primitive compilers written for obscure platforms, is it reasonable to rely on tail call elimination in C?

Also, what was the rationale for leaving tail call optimization out of the standard?

8 answers

Statements like "C doesn't perform tail call elimination" make no sense. As you correctly noted yourself, things like this depend entirely on the implementation. And yes, any decent implementation can easily turn tail recursion into [an equivalent of] a loop. Of course, C compilers do not normally give any guarantees about which optimizations will or will not be applied to any particular piece of code. You have to compile it and see for yourself.
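To make the "tail recursion into a loop" point concrete, here is a minimal sketch. The function names sum_to_rec and sum_to_loop are made up for illustration; the second function is roughly what an optimizing compiler may generate for the first, but nothing in the standard obliges it to.

```c
#include <stdint.h>

/* Tail-recursive form: the recursive call is the very last action,
   so the current frame is no longer needed once the call is made. */
uint64_t sum_to_rec(uint64_t n, uint64_t acc)
{
    if (n == 0)
        return acc;
    return sum_to_rec(n - 1, acc + n);   /* call in tail position */
}

/* Hand-written loop equivalent: the "call" becomes an update of the
   parameters followed by a jump back to the top. */
uint64_t sum_to_loop(uint64_t n, uint64_t acc)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Both return the same value for the same arguments, for example sum_to_rec(10, 0) and sum_to_loop(10, 0) both give 55. Only the loop version is guaranteed to use constant stack space regardless of compiler or flags.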

Although modern compilers MAY do tail call optimization if you turn on optimizations, your debug builds will probably run without it so that you can get stack traces, step in and out of code, and wonderful things like that. In this situation, tail call optimization is not desired.

Since tail call optimization is not always desirable, it doesn't make sense to mandate it for compiler writers.

I think that tail call optimizations need to be guaranteed only where a lot of recursion is anticipated or required; that is, in languages that encourage or enforce a functional programming style. (With these kinds of languages, you may find that for or while loops are either strongly discouraged, perceived as inelegant, or even completely absent from the language, so you would resort to recursion for all these reasons, and probably more.)

The C programming language (IMHO) clearly was not designed with functional programming in mind. There are all kinds of loop constructs that are generally used in preference to recursion: for, do .. while, while. In such a language, it wouldn't make much sense to prescribe tail call optimization in a standard, because it's not strictly required to guarantee working programs.

Contrast this again with a functional programming language that doesn't have while loops. This means you will need recursion, which in turn means that the language must make sure that, with many iterations, stack overflows won't become a problem; thus the official standard for such a language might choose to prescribe tail call optimization.

PS: Note a slight flaw in my argument for tail call optimization. Towards the end, I mention stack overflows. But who says that function calls always require a stack? On some platforms, function calls might be implemented in a totally different way, and stack overflows would never even be a problem. This would be yet another argument against prescribing tail call optimization in a standard. (But don't get me wrong, I can see the merits of such optimizations, even without a stack!)

To answer your last question: the standard should definitely not make any statements about optimization. There may be environments where it is more or less difficult to implement.

The language standard defines how the language behaves, not how compilers are required to be implemented. Optimization isn't mandated because it isn't always wanted. Compilers provide options so that users can enable optimizations if they want them, and can likewise turn them off. Compiler optimization can affect the ability to debug code (it becomes harder to match C to assembly in a line-by-line fashion), so it makes sense to only perform optimization at the user's request.

There are situations where tail call optimisation would potentially break the ABI, or at least be very difficult to implement in a semantics-preserving way. Think of position independent code in shared libraries, for instance: some platforms allow programs to link dynamically against libraries in order to save main memory when various different applications all depend on the same functionality. In such cases, the library is loaded once and mapped into each program's virtual memory as if it were the only application on the system. On UNIX and also on some other systems, this is achieved by using position independent code for libraries, so that addressing is relative to an offset rather than absolute within a fixed address space. On many platforms, however, position independent code must not be tail call optimised. The problem is that the offsets for navigating through the program have to be kept in registers; on 32-bit Intel, %ebx is used, which is a callee-saved register; other platforms follow the same idea. Unlike functions using normal calls, those deploying tail calls have to restore the callee-saved registers before branching off to the subroutine, not when they return themselves. Normally that is no problem, because at that point the topmost calling function does not care about the value stored in %ebx, but position independent code depends on this value at each and every jump, call or branch instruction.

Other problems could be pending clean-ups in object-oriented languages (C++), meaning that the last call in a function isn't actually the last call: the clean-ups are. Hence, the compiler usually does not optimise when this is the case.
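The same effect can be shown in plain C, where the clean-up is written by hand instead of being an implicit destructor call. This is only an analogue of the C++ case, and the names count_bytes and length_of_copy are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for "the last call in the function". */
static size_t count_bytes(const char *s)
{
    return strlen(s);
}

size_t length_of_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return 0;
    memcpy(copy, s, n);

    size_t len = count_bytes(copy);  /* looks like the final call ... */
    free(copy);                      /* ... but the clean-up still has to run */
    return len;                      /* so count_bytes() is not in tail position */
}
```

Because free(copy) must run after count_bytes() returns, the caller's frame has to stay alive across the call, and the call cannot simply be turned into a jump.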

Also, setjmp and longjmp are problematic, of course, since they effectively mean a function can finish execution more than once before it actually finishes. That is difficult or impossible to optimise at compile time!
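A small sketch of the "finishes more than once" behaviour, using only the standard setjmp.h interface. The names saved_context, bail_out and run_with_recovery are invented for the example.

```c
#include <setjmp.h>

static jmp_buf saved_context;

/* Never returns normally: control reappears at the setjmp() below. */
static void bail_out(void)
{
    longjmp(saved_context, 1);
}

int run_with_recovery(void)
{
    if (setjmp(saved_context) == 0) {
        /* First return from setjmp(): the direct call. */
        bail_out();
        return 0;   /* not reached, because bail_out() jumps away */
    }
    /* Second return from the same setjmp(), triggered by longjmp(). */
    return 1;
}
```

The frame of run_with_recovery has to remain valid while bail_out runs, because longjmp re-enters it. That is one reason a compiler has to be careful about reusing the frame of a function that calls setjmp.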

There are probably more technical reasons one can think of. These are just some considerations.

For those who like proof by construction, here is godbolt doing a nice tail call optimisation and inline: https://godbolt.org/z/DMleUN

However, if you crank the optimization to -O3 (or no doubt if you wait a few years or use a different compiler), the optimisation totally removes the loop/recursion.

Here is an example that optimizes down to a single instruction even with -O2: https://godbolt.org/z/CNzWex

It is common for compilers to recognize situations where a function won't need to do anything after calling another, and replace that call with a jump. Many cases where that can be done safely are easy to recognize, and such cases qualify as "safe low-hanging fruit". Even on compilers that can perform such optimization, however, it may not always be obvious when it should or will be performed. Various factors may make the cost of a tail call greater than that of a normal call, and these factors may not always be predictable. For example, if a function ends with return foo(1,2,3,a,b,c,4,5,6); it may be practical to copy a, b, and c into registers, clean up the stack and then prepare the arguments for passing, but there may not be enough registers available to handle foo(a,b,c,d,e,f,g,h,i) likewise.
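As a rough illustration of "won't need to do anything after calling another", compare the two callers below. The function names are made up, and whether a given compiler actually emits a jump for the first one still depends on the compiler and its flags.

```c
static int twice(int x)
{
    return 2 * x;
}

/* Tail position: the result of twice() is returned unchanged, so
   nothing remains to be done in this function after the call. */
int in_tail_position(int x)
{
    return twice(x + 1);
}

/* Not tail position: the addition happens after twice() returns,
   so this function's frame is still needed across the call. */
int not_in_tail_position(int x)
{
    return twice(x) + 1;
}
```

With x = 3 the first returns 8 and the second returns 7. Only the first is a candidate for having its call replaced by a jump.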

If a language had a special "tail call" syntax that required compilers to make a tail call if at all possible, and to refuse compilation otherwise, code could safely assume such calls could be nested arbitrarily deep. When using ordinary call syntax, however, there's no general way to know whether a compiler would be able to perform a tail call more cheaply than an "ordinary" one.
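Something close to this exists as a compiler extension rather than as standard C: Clang documents a musttail statement attribute that makes the compiler either emit a tail call or reject the code, and recent GCC releases accept the same spelling as far as I know. The sketch below assumes such a compiler. The MUSTTAIL macro and the count_down function are made up for the example, and on compilers without the attribute the macro expands to nothing, so the call is then an ordinary one with no guarantee.

```c
#include <stdint.h>

/* Use the musttail extension when the compiler reports it; otherwise
   fall back to a plain return (assumption: no guarantee in that case). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

uint64_t count_down(uint64_t n, uint64_t acc)
{
    if (n == 0)
        return acc;
    /* With the attribute, the compiler must compile this as a tail
       call or refuse to compile the function at all. */
    MUSTTAIL return count_down(n - 1, acc + 1);
}
```

Calling count_down(1000, 0) returns 1000 either way. The difference is that, with the attribute in effect, very large values of n should not grow the stack, whereas the fallback depends on the optimizer.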