
-O3 loop increment optimization

I have this piece of code:

#include <iostream>
#include <thread>

long int global_variable;

struct process{
    long int loop_times_ = 0;
    bool op_;
    process(long int loop_times, bool op): loop_times_(loop_times), op_(op){}

    void run(){
        for(long int i=0; i<loop_times_; i++)
            if (op_) global_variable+=1;
            else global_variable-=1;
    }

};

int main(){
    struct process p1(10000000, true);
    struct process p2(10000000, false);

    std::thread t1(&process::run, p1);
    std::thread t2(&process::run, p2);
    t1.join();
    t2.join();

    std::cout <<global_variable<< std::endl;
    return 0;
}

The main function starts two threads: one increments a global variable and the other decrements it. If I compile with this:

 g++ -std=c++11 -o main main.cpp -lpthread

I get a different output on each execution. But if I add -O3 and compile with this:

g++ -O3 -std=c++11 -o main main.cpp -lpthread

the output is zero every time.

What kind of optimization is happening here that eliminates my critical section, and how can I trick the compiler into not optimizing it?

EDIT: OS: Ubuntu 16.04.4, g++: 5.4.0

It's very likely that your run method is being optimized to the equivalent of:

    void run(){
        if (op_) global_variable += loop_times_;
        else global_variable -= loop_times_;
    }
This is something the compiler can do quite easily with the information available.
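To make the equivalence concrete, here is a minimal single-threaded sketch. The function names `looped` and `collapsed` are hypothetical and do not appear in the question. It only illustrates why the two forms are interchangeable when no other thread is assumed to touch the variable; it does not claim to show what g++ actually emits.

```cpp
// Loop form: mirrors the body of process::run, but works on a local value.
long int looped(long int start, long int loop_times, bool op) {
    long int value = start;
    for (long int i = 0; i < loop_times; i++) {
        if (op) value += 1;
        else value -= 1;
    }
    return value;
}

// Collapsed form: a single addition or subtraction.
long int collapsed(long int start, long int loop_times, bool op) {
    // Match the loop: a non-positive count means zero iterations.
    if (loop_times <= 0) return start;
    return op ? start + loop_times : start - loop_times;
}
```

Both functions return the same value for any inputs that do not overflow, so under the as-if rule an optimizer is free to replace the first with the second.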

To trick the compiler, you have to make sure it is not obvious that the loop just adds or subtracts 1 on every iteration with no other side effects.

Try adding a function call inside the loop that just increments a simple counter on the object, called totalIterationsDone or some such. This might force the compiler into actually executing the loop. Passing your loop variable in as an argument might also force it to keep track of the intermediate values of i.

struct process{
    long int loop_times_ = 0;
    bool op_;
    long int _iterationsDone = 0;   // extra per-object counter used by Trick
    process(long int loop_times, bool op): loop_times_(loop_times), op_(op){}

    void run(){
        for(long int i=0; i<loop_times_; i++){
            if (op_) global_variable+=1;
            else global_variable-=1;
            Trick(i);
        }
    }

    // Takes long int so the loop variable is not narrowed to int.
    void Trick(long int i){
       _iterationsDone += 1;
    }
};
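The helper call above is not guaranteed to survive optimization, because the compiler may inline `Trick` and collapse the loop anyway. A more direct way to keep every iteration is to make the counter `volatile`, since each volatile access must actually be performed. The sketch below assumes that approach; the names `volatile_counter` and `run_volatile` are hypothetical. Note that `volatile` does not make the accesses atomic, so using it from two threads is still a data race.

```cpp
// Hypothetical stand-in for global_variable from the question.
volatile long int volatile_counter = 0;

// Performs loop_times separate loads and stores on the volatile counter.
long int run_volatile(long int loop_times, bool op) {
    for (long int i = 0; i < loop_times; i++) {
        // Written as an explicit load and store rather than += or -=,
        // which C++20 deprecates on volatile operands.
        if (op) volatile_counter = volatile_counter + 1;
        else volatile_counter = volatile_counter - 1;
    }
    return volatile_counter;
}
```

This is only useful for observing the race experimentally. It does not make the program correct.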

Your program has undefined behaviour, in the form of a data race. Two threads accessing the same variable without synchronisation, with at least one of them writing, is a data race and thus undefined behaviour.

The easiest way to remove the data race would be to make global_variable atomic:

std::atomic<long int> global_variable;

Apart from adding #include <atomic>, no further changes should be necessary in the rest of the code.
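As a rough sketch of the atomic approach, the helper below runs one incrementing and one decrementing thread against a `std::atomic<long int>` and returns the final value. The function name `run_both` and the use of a local counter with a lambda are assumptions made for illustration; the question itself uses a global variable and a `process` struct.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: starts two threads that each perform loop_times
// atomic updates, one adding 1 and one subtracting 1.
long int run_both(long int loop_times) {
    std::atomic<long int> counter(0);

    auto worker = [&counter, loop_times](bool op) {
        for (long int i = 0; i < loop_times; i++) {
            if (op) counter += 1;   // atomic read-modify-write
            else counter -= 1;      // atomic read-modify-write
        }
    };

    std::thread t1(worker, true);
    std::thread t2(worker, false);
    t1.join();
    t2.join();

    return counter.load();
}
```

Because every update is an indivisible read-modify-write, no increment or decrement can be lost, so `run_both(10000000)` returns 0 regardless of the optimization level. As in the question, the program needs thread support when compiled (for example `-lpthread`).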


 