OpenMP vs gcc compiler optimization

Question

I'm learning OpenMP using the example of computing the value of pi via quadrature. In serial, I run the following C code:

double serial() {
    double step;
    double x,pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    for (int i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step; // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}
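For reference, the loop is the midpoint rule applied to an integral whose exact value is pi. Here N stands for num_steps and Delta for step; x_i is the value assigned to x in iteration i:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1)
    \approx \Delta \sum_{i=0}^{N-1} \frac{4}{1+x_i^2},
\qquad x_i = \left(i + \tfrac{1}{2}\right)\Delta,
\qquad \Delta = \frac{1}{N}
```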

I'm comparing this to an OpenMP implementation that uses a parallel for with a reduction:

double SPMD_for_reduction() {
    double step;
    double pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    #pragma omp parallel for reduction (+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}
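As a rough illustration of what reduction(+:sum) arranges: each thread accumulates into its own private copy of sum, initialized to 0, and the private copies are added into the shared sum when the loop ends. The hand-written sketch below mimics that behavior. The name pi_manual_reduction is hypothetical, and num_steps is passed as a parameter rather than read from a global. The OpenMP runtime is free to combine the partial sums in a different order, so this shows the semantics, not the library's actual implementation.

```c
/* Hypothetical hand-written equivalent of reduction(+:sum). */
double pi_manual_reduction(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;                 /* shared result */

    #pragma omp parallel
    {
        double partial = 0.0;         /* private to each thread */

        /* Iterations are split among the threads of the team. */
        #pragma omp for
        for (long i = 0; i < num_steps; i++) {
            double x = (i + 0.5) * step;
            partial += 4.0 / (1.0 + x * x);
        }

        /* One synchronized update of the shared sum per thread. */
        #pragma omp atomic
        sum += partial;
    }
    return step * sum;
}
```

Compiled without -fopenmp, the pragmas are ignored and the function simply runs serially with the same result.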

With num_steps = 1,000,000,000 and 6 threads for the OpenMP version, I compile and time the two functions like this:

    double start_time = omp_get_wtime();
    serial();
    double end_time = omp_get_wtime();

    start_time = omp_get_wtime();
    SPMD_for_reduction();
    end_time = omp_get_wtime();

Without compiler optimizations, the runtimes are around 4 s (serial) and 0.66 s (OpenMP). With the -O3 flag, the serial runtime drops to "0.000001 s", while the OpenMP runtime is mostly unchanged. What's going on here? Is the compiler using vector instructions, or is my code or timing method flawed? If it's vectorization, why doesn't the OpenMP function benefit?

In case it matters, the machine I am using has a modern 6-core Xeon processor.

Thanks!

Answer 1

The compiler outsmarts you. For the serial version, it can detect that the result of your computation is never used, so it removes the computation entirely (dead-code elimination).

double start_time = omp_get_wtime();
serial(); //<-- Computations not used.
double end_time = omp_get_wtime();

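In the OpenMP case, the compiler cannot tell whether everything inside the function body is really free of side effects, so it keeps the function call to stay on the safe side. (GCC, for instance, moves the body of a parallel region into a separate outlined function and hands it to the libgomp runtime, which the optimizer cannot see through.)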

You can, of course, write something like double serial_pi = serial(); and then, outside the timed region, do something with the variable serial_pi (for example, print it). This way the compiler keeps the computation and applies the optimizations you are actually trying to measure.
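A minimal sketch of that fix, assuming num_steps is passed as a parameter instead of being the question's global, and using hypothetical helper names (serial_pi, reduction_pi, wall_seconds, time_pi). Each run's result is stored through a pointer so the caller can use it after the clock has stopped. It uses C11 timespec_get for wall-clock time so it builds without the OpenMP runtime; omp_get_wtime works just as well when compiling with -fopenmp. clock() is a poor fit here because on Linux it reports CPU time summed over all threads, which hides any parallel speedup.

```c
#include <time.h>

/* Serial midpoint-rule estimate of pi (num_steps passed in, not global). */
double serial_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Same computation with an OpenMP reduction; without -fopenmp the
   pragma is ignored and the loop runs serially. */
double reduction_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times one call of f and stores its return value in *result, so the
   caller can use the value after the timed region has ended. */
double time_pi(double (*f)(long), long num_steps, double *result) {
    double start = wall_seconds();
    *result = f(num_steps);
    double end = wall_seconds();
    return end - start;
}
```

A caller would then write, for example, double pi; double t = time_pi(serial_pi, 1000000000L, &pi); and print both t and pi afterwards; printing pi is what keeps the work from being discarded. This is usually enough, but the C standard does not forbid the compiler from moving side-effect-free arithmetic across the timer calls, so treat implausibly small timings with suspicion and check the generated assembly if in doubt.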

(Answer 1, posted 2017-02-06 15:39:47)
