g++ C++11 constexpr evaluation performance

Question

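g++ (4.7.2) and similar versions seem to evaluate constexpr surprisingly fast at compile time. On my machine it is in fact much faster than the compiled program at run time.

Is there a reasonable explanation for this behavior? Are there optimization techniques that apply only at compile time and can execute faster than actual compiled code? If so, which ones?

Here are my test program and the observed results.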

#include <iostream>

constexpr int mc91(int n)
 {

     return (n > 100)? n-10 : mc91(mc91(n+11));

 }

constexpr double foo(double n)
{
   return (n>2)? (0.9999)*((unsigned int)(foo(n-1)+foo(n-2))%100):1;
}

constexpr unsigned ack( unsigned m, unsigned n )
{
    return m == 0
        ? n + 1
        : n == 0
        ? ack( m - 1, 1 )
        : ack( m - 1, ack( m, n - 1 ) );
}

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

int main(void)
{
   constexpr unsigned int compiletime_ack=ack(3,14);
   constexpr int compiletime_91=slow91(49);
   static_assert( compiletime_ack == 131069, "Must be evaluated at compile-time" );
   static_assert( compiletime_91  == 91,     "Must be evaluated at compile-time" );
   std::cout << compiletime_ack << std::endl;
   std::cout << compiletime_91  << std::endl;
   std::cout << ack(3,14) << std::endl;
   std::cout << slow91(49) << std::endl;
   return 0;
}

Compile time:

time g++ constexpr.cpp -std=c++11 -fconstexpr-depth=10000000 -O3 

real    0m0.645s
user    0m0.600s
sys     0m0.032s

Run time:

time ./a.out 

131069
91
131069
91

real    0m43.708s
user    0m43.567s
sys     0m0.008s

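Here mc91 is the usual McCarthy 91 function (as found on Wikipedia), and foo is just a useless function that returns real values between roughly 1 and 100 and has the run-time complexity of a naive Fibonacci.

Both the slow calculation of 91 and the Ackermann function are evaluated with the same arguments by the compiler and by the compiled program.

Surprisingly, the program would even run faster if it just generated code and ran it through the compiler than it does by executing the code itself.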

Answer 1

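At compile time, redundant (identical) constexpr calls can be memoized, whereas recursion at run time does not get this for free.

If you change every recursive function such as...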

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

...into a form that is not constexpr but does remember past calculations at run time:

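#include <unordered_map>
#include <boost/optional.hpp>

// Cache for slow91: the key is the parameter, the value is the result
// (an empty optional until it has been computed).
std::unordered_map< int, boost::optional<unsigned> > results4;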

unsigned slow91(int n) {
     boost::optional<unsigned> &ret = results4[n];
     if ( !ret )
     {
         ret = mc91(mc91(foo(n))%100);
     }
     return *ret;
}

You will get less surprising results.

Compile time:

time g++ test.cpp -std=c++11 -O3

real    0m1.708s
user    0m1.496s
sys     0m0.176s

Run time:

time ./a.out

131069
91
131069
91

real    0m0.097s
user    0m0.064s
sys     0m0.032s

Answer 2

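Memoization

This is a very interesting "discovery", but the answer is probably simpler than you think.

Something declared constexpr can be evaluated at compile time if all the values involved are known at compile time (and if the variable where the value will end up is declared constexpr as well). With that said, imagine the following pseudo-code: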

f(x)   = g(x)
g(x)   = x + h(x,x)
h(x,y) = x + y

Since every value is known at compile time, the compiler can rewrite the above into the equivalent form below:

f(x) = x + x + x

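To put it in words: every function call has been removed and replaced with the expression it stands for. Also applicable is a technique called memoization, where the results of previously calculated expressions are stored away so that the hard work only has to be done once.

If you know that g(5) = 15, why calculate it again? Instead, just replace g(5) with 15 every time it is required. This is possible because a function declared constexpr is not allowed to have side effects.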
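
The pseudo-code above can be written as real C++11, as a small sketch; the int parameter type is an assumption made for illustration. The static_assert lines only compile if the compiler can reduce the calls to constants:

```cpp
// C++11 constexpr versions of the pseudo-code functions f, g and h.
constexpr int h(int x, int y) { return x + y; }
constexpr int g(int x)        { return x + h(x, x); }
constexpr int f(int x)        { return g(x); }

// Both checks are resolved by the compiler; no call happens at run time.
static_assert(g(5) == 15,        "g(5) folds to the constant 15");
static_assert(f(5) == 5 + 5 + 5, "f(x) is equivalent to x + x + x");
```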

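Run time

At run time this does not happen (since we did not tell the code to behave this way). The little guy running through your code has to jump from f to g to h, then jump back from h to g before he jumps from g back to f, all while storing the return value of each function and passing it along to the next one.

Even if this guy is very, very tiny and does not need to jump very far, he still does not like jumping back and forth all the time. It takes a lot of effort for him to do this, and with that, it takes time.

**But in the OP's example, is it really calculated at compile time?**

Yes. For those who do not believe that the compiler actually calculates this and puts the results into the finished binary as constants, here are the relevant assembly instructions generated from the OP's code (output of g++ -S -Wall -pedantic -fconstexpr-depth=1000000 -std=c++11):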
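
To get a feel for how much jumping is involved, the sketch below counts the calls made by a function with the same body as foo from the question. It is an illustration under stated assumptions: the int parameter, the CallStats struct and the name foo_counted are hypothetical and not part of the original code.

```cpp
#include <set>

// Hypothetical bookkeeping: total calls made and the distinct arguments seen.
struct CallStats {
    unsigned long calls = 0;
    std::set<int> distinct;
};

// Same recursion as foo in the question, instrumented to record every call.
double foo_counted(int n, CallStats& stats) {
    ++stats.calls;
    stats.distinct.insert(n);
    return (n > 2)
        ? 0.9999 * ((unsigned int)(foo_counted(n - 1, stats) +
                                   foo_counted(n - 2, stats)) % 100)
        : 1;
}
```

For n = 20 this records thousands of calls but only 20 distinct arguments. That gap is the redundant work that memoization removes.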

main:
.LFB1200:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movl  $131069, -4(%rbp)
  movl  $91, -8(%rbp)
  movl  $131069, %esi               # one of the values from constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEj
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi
  call  _ZNSolsEPFRSoS_E
  movl  $91, %esi                   # the other value from our constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEi
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi

  # ...
  # a lot of jumping is taking place down here
  # see the full output at http://codepad.org/Q8D7c41y
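
Besides reading the assembly, the compiler itself can be made to prove that a value is a compile-time constant, because a template argument must be a constant expression. A minimal sketch, reusing ack from the question with small arguments so that the default constexpr depth is enough (the typedef name Ack33 is hypothetical):

```cpp
#include <type_traits>

// ack as defined in the question.
constexpr unsigned ack(unsigned m, unsigned n)
{
    return m == 0
        ? n + 1
        : n == 0
        ? ack(m - 1, 1)
        : ack(m - 1, ack(m, n - 1));
}

// This only compiles if ack(3, 3) is evaluated by the compiler.
typedef std::integral_constant<unsigned, ack(3, 3)> Ack33;

static_assert(Ack33::value == 61, "ack(3, 3) evaluated at compile time");
```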

Answer 1
14 2013-04-25 18:29:25

Answer 2
9 2013-04-25 18:49:29
