g++ C++11 constexpr evaluation performance

Question

g++ (4.7.2) and similar versions seem to evaluate constexpr surprisingly fast at compile time. On my machines, in fact, compile-time evaluation is much faster than running the compiled program.

Is there a reasonable explanation for that behavior? Are there optimization techniques involved that are only applicable at compile time and that can be executed more quickly than actual compiled code? If so, which?

Here's my test program and the observed results.

#include <iostream>

constexpr int mc91(int n)
 {

     return (n > 100)? n-10 : mc91(mc91(n+11));

 }

constexpr double foo(double n)
{
   return (n>2)? (0.9999)*((unsigned int)(foo(n-1)+foo(n-2))%100):1;
}

constexpr unsigned ack( unsigned m, unsigned n )
{
    return m == 0
        ? n + 1
        : n == 0
        ? ack( m - 1, 1 )
        : ack( m - 1, ack( m, n - 1 ) );
}

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

int main(void)
{
   constexpr unsigned int compiletime_ack=ack(3,14);
   constexpr int compiletime_91=slow91(49);
   static_assert( compiletime_ack == 131069, "Must be evaluated at compile-time" );
   static_assert( compiletime_91  == 91,     "Must be evaluated at compile-time" );
   std::cout << compiletime_ack << std::endl;
   std::cout << compiletime_91  << std::endl;
   std::cout << ack(3,14) << std::endl;
   std::cout << slow91(49) << std::endl;
   return 0;
}

Compile time:

time g++ constexpr.cpp -std=c++11 -fconstexpr-depth=10000000 -O3 

real    0m0.645s
user    0m0.600s
sys     0m0.032s

Run time:

time ./a.out 

131069
91
131069
91

real    0m43.708s
user    0m43.567s
sys     0m0.008s

Here mc91 is the usual McCarthy 91 function (as can be found on Wikipedia), and foo is just a useless function returning real values between about 1 and 100, with the run-time complexity of a naive recursive Fibonacci.

Both the slow calculation of 91 and the Ackermann function are evaluated with the same arguments by the compiler and by the compiled program.

Surprisingly, the program would even run faster by just generating code and running it through the compiler than by executing the code itself.

Answer 1

At compile time, redundant (identical) constexpr calls can be memoized, while run-time recursive behavior does not provide this.

If you change every recursive function, such as...

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

... to a form that isn't constexpr, but does remember past calculations at run time:

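#include <unordered_map>
#include <boost/optional.hpp>

std::unordered_map< int, boost::optional<unsigned> > results4;
//     parameter(s) ^^^           result ^^^^^^^^

unsigned slow91(int n) {
     // operator[] default-constructs an empty optional on first use
     boost::optional<unsigned> &ret = results4[n];
     if ( !ret )
     {
         ret = mc91(mc91(foo(n))%100);
     }
     return *ret;
}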

You will get less surprising results.

Compile time:

time g++ test.cpp -std=c++11 -O3

real    0m1.708s
user    0m1.496s
sys     0m0.176s

Run time:

time ./a.out

131069
91
131069
91

real    0m0.097s
user    0m0.064s
sys     0m0.032s
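
The same idea can be applied to foo, which is where slow91 spends most of its time. The following is only a minimal sketch, assuming foo is called with whole-number arguments as in the question; the names foo_plain, foo_memo and cache are hypothetical and do not appear in the original code.

```cpp
#include <unordered_map>

// Plain recursion, same formula as foo in the question (int argument assumed).
double foo_plain(int n) {
    return (n > 2)
        ? 0.9999 * ((unsigned int)(foo_plain(n - 1) + foo_plain(n - 2)) % 100)
        : 1;
}

// Memoized variant: each distinct argument is computed at most once.
double foo_memo(int n) {
    static std::unordered_map<int, double> cache;  // hypothetical cache, argument -> result
    if (n <= 2) return 1;

    auto it = cache.find(n);
    if (it != cache.end()) return it->second;      // result already known, reuse it

    double result =
        0.9999 * ((unsigned int)(foo_memo(n - 1) + foo_memo(n - 2)) % 100);
    cache.emplace(n, result);
    return result;
}
```

Both functions return the same values; foo_memo just avoids recomputing subproblems it has already solved, which is roughly what the memoizing constexpr evaluator gets for free.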

Answer 2

Memoization

This is a very interesting "discovery", but the answer is probably simpler than you think.

Something declared constexpr can be evaluated at compile time if all values involved are known at compile time (and if the variable where the value is supposed to end up is declared constexpr as well). With that said, imagine the following pseudo-code:

f(x)   = g(x)
g(x)   = x + h(x,x)
h(x,y) = x + y

Since every value is known at compile time, the compiler can rewrite the above into the equivalent form below:

f(x) = x + x + x

To put it in words, every function call has been removed and replaced with the expression itself. What is also applicable is a method called memoization, where the results of previously calculated expressions are stored away so you only need to do the hard work once.

If you know that g(5) = 15, why calculate it again? Instead, just replace g(5) with 15 every time it is needed. This is possible since a function declared as constexpr isn't allowed to have side effects.
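
The pseudo-code above can be written as C++11 constexpr functions. This is just an illustrative sketch: the int parameter type is an assumption, and the static_assert lines are only there to show that the compiler has to know the values while compiling.

```cpp
// h(x,y) = x + y
constexpr int h(int x, int y) { return x + y; }

// g(x) = x + h(x,x)
constexpr int g(int x) { return x + h(x, x); }

// f(x) = g(x)
constexpr int f(int x) { return g(x); }

// A static_assert condition must be a constant expression, so these lines
// only compile if f(5) and g(5) are evaluated by the compiler itself.
static_assert(g(5) == 15, "g(5) is 5 + (5 + 5)");
static_assert(f(5) == 15, "f(5) is the same value as g(5)");
```

If either assertion were false, the build would fail, so no code needs to run to check the result.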

Runtime

At run time this is not happening (since we didn't tell the code to behave this way). The little guy running through your code will need to jump from f to g to h, then jump back from h to g before he jumps from g to f, all while he stores the return value of each function and passes it along to the next one.

Even if this guy is very, very tiny and doesn't need to jump very far, he still doesn't like jumping back and forth all the time. It takes a lot for him to do this, and with that, it takes time.
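
To get a feel for how much jumping the OP's program does, the number of calls made by a naive evaluation of foo(n) can be counted without actually making them. This is a rough sketch; foo_call_count is a hypothetical helper, not part of the original code. It relies only on the shape of foo's recursion, assuming a whole-number argument: one call for n <= 2, otherwise one call plus the calls needed for n-1 and n-2.

```cpp
// Number of invocations of foo triggered by one naive (non-memoized) call foo(n).
//   calls(n) = 1                              for n <= 2
//   calls(n) = 1 + calls(n-1) + calls(n-2)    otherwise
unsigned long long foo_call_count(int n) {
    unsigned long long prev = 1;  // calls(i-2), starting at calls(1)
    unsigned long long cur  = 1;  // calls(i-1), starting at calls(2)
    for (int i = 3; i <= n; ++i) {
        unsigned long long next = 1 + cur + prev;
        prev = cur;
        cur  = next;
    }
    return cur;
}
```

For n = 49 this gives 15557484097, i.e. about 1.6 × 10^10 calls at run time, whereas an evaluator that remembers results only has to deal with 49 distinct arguments.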

But in the OP's example, is it really calculated at compile time?

Yes, and for those who don't believe that the compiler actually calculates this and puts the results as constants in the finished binary, I will supply the relevant assembly instructions from the OP's code below (output of g++ -S -Wall -pedantic -fconstexpr-depth=1000000 -std=c++11):

main:
.LFB1200:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movl  $131069, -4(%rbp)
  movl  $91, -8(%rbp)
  movl  $131069, %esi               # one of the values from constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEj
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi
  call  _ZNSolsEPFRSoS_E
  movl  $91, %esi                   # the other value from our constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEi
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi

  # ...
  # a lot of jumping is taking place down here
  # see the full output at http://codepad.org/Q8D7c41y


Answer 1
14 2013-04-25 18:29:25

Answer 2
9 2013-04-25 18:49:29
