How does the C++ compiler optimize stack allocation?

Question

I found this post and wrote some tests like this:

I expected the compiler to perform TCO (tail call optimization) on foo3: destroy sp first, then invoke func with a simple jump that does not create a new stack frame. But that is not happening. The program enters func with a call at line 47 of the assembly code and cleans up the sp object after that. The optimization does not happen even when I empty ~Simple().

So, how can I trigger TCO in this case?

Answer 1

First, note that the example has a double-free bug. If the move constructor is called, sp.buffer is not set to nullptr as it must be, so two pointers to the buffer exist and both are later deleted. A simpler version that manages the pointer correctly is:

struct Simple {
  std::unique_ptr<int[]> buffer {new int[1000]};
};
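For comparison, here is a minimal sketch of what a hand-written fix might look like if the raw pointer were kept. The question's original Simple is not shown, so ManualSimple and moved_from_is_empty are hypothetical names used only for illustration. The key step is that the move constructor nulls out the source pointer.

```cpp
#include <utility>

// Hypothetical hand-written owner; not the question's original code.
struct ManualSimple {
  int* buffer = new int[1000];

  ManualSimple() = default;
  // Steal the pointer and null out the source so only one object owns it.
  ManualSimple(ManualSimple&& other) noexcept
      : buffer(std::exchange(other.buffer, nullptr)) {}
  ManualSimple(const ManualSimple&) = delete;
  ManualSimple& operator=(const ManualSimple&) = delete;
  ManualSimple& operator=(ManualSimple&&) = delete;
  ~ManualSimple() { delete[] buffer; }  // delete[] on nullptr is a no-op
};

// Returns true if, after a move, only the destination owns the buffer.
bool moved_from_is_empty() {
  ManualSimple a;
  int* original = a.buffer;
  ManualSimple b(std::move(a));
  return a.buffer == nullptr && b.buffer == original;
}
```

The std::unique_ptr version above gets the same behavior without any of this boilerplate.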

With that fix, let's inline almost everything and see what foo3 really does in all its glory:

using func_t = std::function<int(Simple&&)>&&;
int foo3(func_t func) {
  int* buffer1 = new int[1000]; // the unused local
  int* buffer2 = new int[1000]; // the call argument
  if (!func) {
    delete[] buffer2;
    delete[] buffer1;
    throw std::bad_function_call();
  }
  int retval;
  try {
    // Pseudocode: buffer2 stands in for the temporary Simple that owns it.
    retval = func(buffer2);     // <-- the call
  } catch (...) {
    delete[] buffer2;
    delete[] buffer1;
    throw;
  }
  delete[] buffer2;
  delete[] buffer1;
  return retval;                // <-- the return
}

The case of buffer1 is straightforward. It is an unused local, and its only side effects are allocation and deallocation, which compilers are allowed to skip. A sufficiently smart compiler could remove the unused local entirely. clang++ 5.0 seems to accomplish this, but g++ 7.2 does not.
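To check how a particular compiler treats such an unused allocation, it may help to isolate it. The following is a minimal sketch; unused_allocation is a made-up name, not part of the question. Compiling it with optimization enabled and reading the generated assembly shows whether the calls to operator new[] and operator delete[] survive.

```cpp
// Hypothetical reduction of the buffer1 case: the allocation is never used.
int unused_allocation() {
  int* buffer1 = new int[1000];  // never read or written
  delete[] buffer1;              // the only other side effect
  return 42;
}
```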

More interesting is buffer2. func takes a non-const rvalue reference, so it can modify the argument. For example, it may move from it. But it may not. The temporary may still own a buffer after the call, and foo3 must delete it. Because that cleanup has to run after func returns, the call is not a tail call.

As observed, we get closer to a tail call by simply leaking the buffer:
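A small sketch can make the "may or may not move" point concrete. The two callees and the helper caller_still_owns are hypothetical names introduced for illustration; the caller cannot know in advance which kind of callee it is given, so it has to keep its cleanup code after the call.

```cpp
#include <memory>
#include <utility>

struct Simple {
  std::unique_ptr<int[]> buffer{new int[1000]};
};

// Hypothetical callee that takes ownership by moving from its argument.
int takes_ownership(Simple&& sp) {
  Simple stolen(std::move(sp));
  return stolen.buffer ? 1 : 0;
}

// Hypothetical callee that only looks at its argument.
int only_inspects(Simple&& sp) {
  return sp.buffer ? 1 : 0;
}

// Reports whether the caller's object still owns its buffer after the call.
bool caller_still_owns(int (*func)(Simple&&)) {
  Simple sp;
  func(std::move(sp));
  return static_cast<bool>(sp.buffer);
}
```

With takes_ownership the caller is left with an empty object, and with only_inspects it still has a buffer to free, which is exactly the work that must happen after the call returns.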

struct Simple {
    int* buffer = new int[1000];
};

That's cheating a bit, because a big part of the question is about tail call optimization in the face of nontrivial destructors. But let's entertain it. As observed, this alone does not result in a tail call.

To start with, note that passing by reference is a fancy form of passing by pointer. The object still must exist somewhere, and that is on the caller's stack. Needing to keep the caller's stack frame alive and nonempty during the call rules out tail call optimization.

To enable a tail call, we want to pass func's argument in registers, so it doesn't have to live in foo3's stack frame. This suggests we should pass by value:
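The pointer-like nature of a reference can be observed directly. In this sketch (address_seen_by_callee and reference_points_into_caller are hypothetical names, and Simple is reduced to a bare pointer member), the callee sees the address of the caller's own object, which therefore has to stay alive for the duration of the call.

```cpp
struct Simple {
  int* buffer = nullptr;  // reduced for illustration; no allocation needed here
};

// The callee receives a reference, which refers to the caller's object.
const void* address_seen_by_callee(Simple&& sp) { return &sp; }

// Returns true if the callee saw the address of the caller's local object.
bool reference_points_into_caller() {
  Simple local;
  return address_seen_by_callee(static_cast<Simple&&>(local)) == &local;
}
```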

int foo2(Simple); // etc.

The SysV ABI dictates that, to be passed in a register, the type needs to be trivially copyable, movable, and destructible. Being a struct wrapping an int*, we have that covered. Fun fact: we can't use a std::unique_ptr with a no-op deleter here, because that is not trivially destructible.

Even so, we still don't see a tail call. I do not see a reason preventing it, but I'm not an expert. Replacing the std::function with a function pointer does result in a tail call. The std::function has one extra argument in the call and has a conditional throw. Is it possible that those make it too difficult to optimize?

Anyway, with a function pointer, g++ 7.2 and clang++ 5.0 do the tail call:
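These properties can be checked at compile time with the standard type traits, which serve here as a rough proxy for the ABI's own rule. The sketch below assumes a hypothetical NoopDeleter to reproduce the "fun fact"; the names NoopDeleter and NoopPtr are not from the question.

```cpp
#include <memory>
#include <type_traits>

struct Simple {
  int* buffer = new int[1000];
};

// Hypothetical deleter that does nothing, to model the "fun fact" above.
struct NoopDeleter {
  void operator()(int*) const noexcept {}
};
using NoopPtr = std::unique_ptr<int[], NoopDeleter>;

// The leaky struct satisfies the triviality requirements.
static_assert(std::is_trivially_copyable<Simple>::value,
              "Simple should be trivially copyable");
static_assert(std::is_trivially_destructible<Simple>::value,
              "Simple should be trivially destructible");

// unique_ptr has a user-provided destructor even when the deleter is a no-op.
static_assert(!std::is_trivially_destructible<NoopPtr>::value,
              "unique_ptr is never trivially destructible");
```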
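The "conditional throw" refers to the check std::function performs before dispatching: calling an empty std::function throws std::bad_function_call. A minimal sketch of that behavior (empty_function_throws is a hypothetical helper name):

```cpp
#include <functional>

// Returns true if invoking an empty std::function throws bad_function_call.
bool empty_function_throws() {
  std::function<int(int)> empty;  // holds no target
  try {
    empty(0);
  } catch (const std::bad_function_call&) {
    return true;
  }
  return false;
}
```

A plain function pointer has no such check, which is one of the differences between the two call paths.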

struct Simple {
  int* buffer = new int[1000];
};

int foo2(Simple sp) {
  // std::rand() can exceed 999, so keep the index inside the buffer.
  return sp.buffer[std::rand() % 1000];
}

using func_t = int (*)(Simple);
int foo3(func_t func) {
  return func(Simple());
}

But this is leaky. Can we do better? There is ownership underlying this type, and we want to pass it from foo3 to func. But types with nontrivial destructors cannot be passed in registers. This means an RAII type like std::unique_ptr will not get us there. Using a concept from the GSL, we can at least express the ownership:

template<class T> using owner = T;
struct Simple {
  owner<int*> buffer = new int[1000];
};

Then we can hope that static analysis tools, now or in the future, can detect that foo2 is accepting ownership but never deleting buffer.
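One possible way to close the leak while keeping the type trivially destructible is for the callee to honor the ownership it accepts and delete the buffer itself. This is only a sketch under that assumption, not something tested against the compilers mentioned above. The buffer is value-initialized here so that the read is well defined, and the call in foo3 remains in tail position in the source; whether a given compiler actually emits a jump should be confirmed in the generated assembly.

```cpp
#include <cstdlib>

template <class T> using owner = T;

struct Simple {
  owner<int*> buffer = new int[1000]();  // value-initialized to zeros
};

// Accepts ownership of sp.buffer and releases it before returning.
int foo2(Simple sp) {
  int value = sp.buffer[std::rand() % 1000];
  delete[] sp.buffer;
  return value;
}

using func_t = int (*)(Simple);
int foo3(func_t func) {
  return func(Simple());  // nothing left for foo3 to clean up afterwards
}
```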

Answer 1
1 2017-10-29 22:37:37
