
C++ Optimize if/else statement

I wrote the following code to reproduce, in simplified form, what happens in my application.

In this simplification, the if and else branches execute the same code but write to different memory locations. So I replaced them with a two-element array and use the condition to select which entry is updated.
This gives the expected speedup.

However, when each iteration performs a random memory access, the improvement almost disappears. To show this behavior, I use template parameters to enable or disable the if-statement and the random access:
useif: true when the if-statement is used, false when the array access is used instead.
rand_access: true when each iteration reads B at a random position, false when it reads B sequentially.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>
#define N 1000000000
using namespace std;
using namespace std::chrono;

// useif       = true:  pick the accumulator with an if/else branch
//               false: pick it by indexing a two-element array with B[index]
// rand_access = true:  read B at a random position (V[i] is a shuffled index)
//               false: read B sequentially
template <bool useif, bool rand_access>
void exec(vector<int>& V, vector<bool>& B) {
    auto start = steady_clock::now();
    // long long: the total of 0..N-1 (about 5e17) does not fit in an int
    long long sum[2] = {0, 0}, sum1 = 0, sum2 = 0;
    for ( int i = 0; i < N; i++ ) {
        const int index = (rand_access) ? V[i] : i;
        if ( useif ){
            if ( B[index] ) sum2 += V[i];
            else sum1 += V[i];
        } else
            sum[B[index]] += V[i];  // bool converts to index 0 or 1: no branch needed
    }
    auto t = duration_cast<milliseconds>(steady_clock::now() - start);
    cout << "Time useif=" << useif << ", rand_access=" << rand_access << " : " << t.count() << " ms" << endl;
    // print the result so the compiler cannot discard the loop
    cout << (sum1 + sum2 + sum[0] + sum[1]) << endl;
}

int main() {
    vector<int> V(N);
    vector<bool> B(N, false);
    mt19937 gen(42);                                 // fixed seed for repeatable runs
    iota( V.begin(), V.end(), 0 );                   // V = 0, 1, ..., N-1
    shuffle( V.begin(), V.end(), gen );              // random permutation of indices
    fill( B.begin(), B.begin() + B.size()/2, true);  // half true, half false
    shuffle( B.begin(), B.end(), gen );              // unpredictable condition
    exec<false, false>(V, B);
    exec<false, true>(V, B);
    exec<true, false>(V, B);
    exec<true, true>(V, B);
    return 0;
}

On my machine, compiling with g++ --std=c++11 -O3 -march=native -mtune=native, I get the following results:
Time useif=0, rand_access=0 : 1518 ms
Time useif=0, rand_access=1 : 10791 ms
Time useif=1, rand_access=0 : 4384 ms
Time useif=1, rand_access=1 : 12214 ms

So replacing the if-statement with an array access gives a speedup of about 2.9x (4384 ms vs. 1518 ms) when there is no random access; with random access, the performance is very close (only about 1.1x).
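The speedup factors are ratios of the measured times, writing T_if for useif=1, T_array for useif=0, "seq" for rand_access=0 and "rand" for rand_access=1:

```latex
S_{\text{seq}} = \frac{T_{\text{if,seq}}}{T_{\text{array,seq}}} = \frac{4384\ \text{ms}}{1518\ \text{ms}} \approx 2.89,
\qquad
S_{\text{rand}} = \frac{T_{\text{if,rand}}}{T_{\text{array,rand}}} = \frac{12214\ \text{ms}}{10791\ \text{ms}} \approx 1.13
```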

I don't understand why this happens or how to deal with it. In other words, how can I optimize the if-statement, given that the if and else branches execute the same code?

Your optimization of the if/else with an array is the right one. It always gives you an improvement, but how large that improvement is depends on other factors as well.

Your experiment shows the relative impact of branch elimination and of cache-friendly memory access.

When the code accesses memory in order, it benefits from the CPU cache thanks to locality of reference, "paying" for only a small fraction of its memory accesses. With 64-byte cache lines, this works like an incredible "buy one, get fifteen free" deal for 4-byte integers stored in consecutive locations. It lets your CPU keep adding with very little waiting for data from memory. In your rand_access=1 runs, V is still read in order, but B[V[i]] lands at a random position in a bit array of N bits (about 125 MB with the usual one-bit-per-element packing of vector<bool>), which is larger than typical CPU caches, so most of those lookups have to wait for main memory.

When the code has no branches, it makes full use of the CPU's instruction pipeline. Hitting an if whose condition is hard to predict (here B is half true and half false, randomly shuffled) causes frequent branch mispredictions, each of which stalls the pipeline, so fewer instructions are "in flight" at the same time.

Going from random access with branching to sequential access with branching saves you about 7.8 seconds; eliminating the branch saves you another 2.9 seconds on top of that.

In contrast, eliminating the branch without sequential access gives you only about a 1.4-second improvement, because eliminating pipeline stalls matters much less when the CPU is waiting for memory anyway.
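The savings are differences between the measured times in milliseconds, with T_if for useif=1, T_array for useif=0, "seq" for rand_access=0 and "rand" for rand_access=1:

```latex
\begin{aligned}
T_{\text{if,rand}} - T_{\text{if,seq}} &= 12214 - 4384 = 7830 &&\text{(sequential instead of random access, branch kept)}\\
T_{\text{if,seq}} - T_{\text{array,seq}} &= 4384 - 1518 = 2866 &&\text{(branch removed, sequential access)}\\
T_{\text{if,rand}} - T_{\text{array,rand}} &= 12214 - 10791 = 1423 &&\text{(branch removed, random access)}
\end{aligned}
```

Removing the branch therefore saves about twice as much time with sequential access as with random access.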

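Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.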

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM