Benchmarking a pure C++ function

Question

How do I prevent GCC/Clang from inlining and optimizing out multiple invocations of a pure function?

I am trying to benchmark code of this form:

```
int __attribute__ ((noinline)) my_loop(int const* array, int len) {
  // Use array to compute result.
}
```

My benchmark code looks something like this:

```
int main() {
  const int number = 2048;
  // My own aligned_malloc implementation.
  int* input = (int*)aligned_malloc(sizeof(int) * number, 32);
  // Fill the array with some random numbers.
  make_random(input, number);
  const int num_runs = 10000000;
  for (int i = 0; i < num_runs; i++) {
    const int result = my_loop(input, number);  // Call pure function.
  }
  // Since the program exits I don't free input.
}
```

As expected, Clang seems to be able to turn this into a no-op at -O2 (perhaps even at -O1).

A few things I tried in order to actually benchmark my implementation:

Accumulate the intermediate results in an integer and print the total at the end:
```
const int num_runs = 10000000;
uint64_t total = 0;
for (int i = 0; i < num_runs; i++) {
  total += my_loop(input, number);  // Call pure function.
}
printf("Total is %llu\n", (unsigned long long)total);
```
Sadly, this doesn't seem to work. Clang, at least, is smart enough to realize that this is a pure function and transforms the benchmark into something like this:
```
int result = my_loop(input, number);  // called only once
uint64_t total = num_runs * result;
printf("Total is %llu\n", (unsigned long long)total);
```
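To see why the accumulation trick is not enough, here are the two forms side by side, as a sketch that assumes a hypothetical `my_loop` which simply sums the array. Both functions return the same value for any input, and the calls have no observable side effects, so under the as-if rule a compiler is permitted to replace the first form with the second.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical stand-in for the question's my_loop: a pure function that
// only reads its input and has no side effects.
int my_loop(int const* array, int len) {
  return std::accumulate(array, array + len, 0);
}

// What the source asks for: num_runs separate calls.
std::uint64_t total_as_written(int const* input, int number, int num_runs) {
  std::uint64_t total = 0;
  for (int i = 0; i < num_runs; i++) {
    total += my_loop(input, number);
  }
  return total;
}

// What the optimizer may emit instead: input never changes and my_loop has
// no side effects, so the call is loop-invariant. It is hoisted out of the
// loop and the repeated addition collapses into one multiplication.
std::uint64_t total_as_optimized(int const* input, int number, int num_runs) {
  const int result = my_loop(input, number);
  return static_cast<std::uint64_t>(num_runs) * result;
}
```

Any benchmark harness that only consumes the final total leaves the compiler free to pick the second form.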
Set an atomic variable using release semantics at the end of every loop iteration:
```
const int num_runs = 10000000;
std::atomic<uint64_t> result_atomic(0);
for (int i = 0; i < num_runs; i++) {
  int result = my_loop(input, number);  // Call pure function.
  // Tried std::memory_order_release too.
  result_atomic.store(result, std::memory_order_seq_cst);
}
printf("Result is %llu\n", (unsigned long long)result_atomic.load());
```
My hope was that, since atomics introduce a happens-before relationship, Clang would be forced to execute my code. Sadly, it still applied the optimization above: it evaluated the function once and updated the atomic in one shot instead of running num_runs iterations of the function.
Set a volatile int at the end of every loop iteration, in addition to summing the total:
```
const int num_runs = 10000000;
uint64_t total = 0;
volatile int trigger = 0;
for (int i = 0; i < num_runs; i++) {
  total += my_loop(input, number);  // Call pure function.
  trigger = 1;
}
// If I take this printf out, Clang optimizes the code away again.
printf("Total is %llu\n", (unsigned long long)total);
```
This seems to do the trick, and my benchmarks seem to work. However, it is not ideal, for a couple of reasons:
Per my understanding of the C++11 memory model, volatile stores do not establish a happens-before relationship, so I can't be sure that some compiler will not decide to apply the same num_runs * result_of_1_run optimization.
This method also seems undesirable because I now pay the overhead (however tiny) of writing a volatile int on every iteration of my loop.
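The first concern is justified, although the relevant rule is the as-if rule rather than happens-before: the compiler must perform every `trigger = 1` store, but nothing ties that store to `my_loop`, so it may still call `my_loop` only once. One portable, standard-C++ variant (a sketch, not the only option) moves the volatile access onto the input instead: the array pointer is re-read through a volatile object on every iteration, so the compiler cannot assume that two calls receive the same argument. `my_loop` is again a hypothetical stand-in that sums the array, and `run_benchmark` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical stand-in for the question's pure function.
int my_loop(int const* array, int len) {
  return std::accumulate(array, array + len, 0);
}

// The pointer itself is volatile (the pointee is just const int). Every read
// of opaque_input is an observable access whose value the compiler may not
// assume, so the calls cannot be hoisted out of the loop or merged.
std::uint64_t run_benchmark(int const* input, int number, int num_runs) {
  int const* volatile opaque_input = input;
  std::uint64_t total = 0;
  for (int i = 0; i < num_runs; i++) {
    total += my_loop(opaque_input, number);  // one volatile load per call
  }
  // The caller must still use (e.g. print) the total; otherwise the calls
  // can be removed even though the volatile loads remain.
  return total;
}
```

The cost is one volatile load per iteration, similar to the volatile store in the question, but this time the volatile access feeds directly into the call being measured.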

Is there a canonical way of preventing Clang/GCC from optimizing this result away? Maybe with a pragma or something? Bonus points if the method works across compilers.

Answer 1

You can insert instructions directly into the assembly. I sometimes use a macro for splitting up the assembly, e.g. separating loads from calculations and branching:

```
#define GCC_SPLIT_BLOCK(str)  __asm__( "//\n\t// " str "\n\t//\n" );
```

Then, in the source, insert

```
GCC_SPLIT_BLOCK("Keep this please")
```

before and after your function calls.
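An asm statement with no operands, like the macro above, does not tell the compiler that `input` or the result is involved, so it is not guaranteed to keep a loop-invariant call from being hoisted. A more targeted GCC/Clang technique, similar in spirit to Google Benchmark's `benchmark::DoNotOptimize` and the `escape`/`clobber` helpers from Chandler Carruth's CppCon 2015 talk, uses empty extended-asm statements that have operands. The sketch below assumes GCC or Clang (MSVC does not accept this syntax on x64). `make_opaque`, `keep`, and `run_benchmark` are hypothetical names, and `my_loop` is a stand-in that sums the array.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical stand-in for the question's pure function.
__attribute__((noinline)) int my_loop(int const* array, int len) {
  return std::accumulate(array, array + len, 0);
}

// Emits no instructions. The "+r" constraint says the asm may read and
// modify value, so the compiler must treat it as unknown afterwards.
template <class T>
inline void make_opaque(T& value) {
  asm volatile("" : "+r"(value));
}

// Emits no instructions. The "r" input says the asm reads value, so the
// compiler must actually compute it before this point.
template <class T>
inline void keep(T const& value) {
  asm volatile("" : : "r"(value));
}

std::uint64_t run_benchmark(int const* input, int number, int num_runs) {
  std::uint64_t total = 0;
  for (int i = 0; i < num_runs; i++) {
    make_opaque(input);  // pointer may have "changed": call is not invariant
    const int result = my_loop(input, number);
    keep(result);        // each iteration's result is consumed
    total += result;
  }
  return total;
}
```

Because neither helper emits code, the loop body measures only `my_loop` plus the addition. These are constraint-based hints to GCC/Clang rather than guarantees from the C++ standard, so checking the generated assembly remains worthwhile.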

Answer 1: accepted, 2015-07-23 17:24:48
