OpenMP - create threads only once

I am trying to write a simple application using OpenMP. Unfortunately, I have a problem with speedup. The application has one while loop. The body of this loop consists of some instructions that must be executed sequentially, followed by one for loop. I use #pragma omp parallel for to parallelize this for loop. The for loop does not do much work, but it is executed very often.

I prepared two versions of the for loop and ran the application on 1, 2, and 4 cores:
version 1 (4 iterations in the for loop): 22 s, 23 s, 26 s.
version 2 (100000 iterations in the for loop): 20 s, 10 s, 6 s.

As you can see, when the for loop doesn't have much work, the run time on 2 and 4 cores is higher than on 1 core. I guess the reason is that #pragma omp parallel for creates new threads in each iteration of the while loop. So I would like to ask: is it possible to create the threads once (before the while loop) and still ensure that some of the work in the while loop is done sequentially?

#include <omp.h>
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char* argv[])
{
    double sum = 0;
    while (true)
    {
        // ...
        // some work which should be done sequentially
        // ...

        #pragma omp parallel for num_threads(atoi(argv[1])) reduction(+:sum)
        for(int j=0; j<4; ++j)  // version 2: for(int j=0; j<100000; ++j)
        {
            double x = pow(j, 3.0);
            x = sqrt(x);
            x = sin(x);
            x = cos(x);
            x = tan(x);
            sum += x;

            double y = pow(j, 3.0);
            y = sqrt(y);
            y = sin(y);
            y = cos(y);
            y = tan(y);
            sum += y;

            double z = pow(j, 3.0);
            z = sqrt(z);
            z = sin(z);
            z = cos(z);
            z = tan(z);
            sum += z;
        }

        if (sum > 100000000)
        {
            break;
        }
    }
    return 0;
}

Most OpenMP implementations create a number of threads on program startup and keep them for the duration of the program. That is, most implementations don't dynamically create and destroy threads during execution; doing so would hurt performance through severe thread management costs. This approach to thread management is consistent with, and appropriate for, the usual use cases for OpenMP.

It is far more likely that the slowdown you see when you increase the number of OpenMP threads comes from imposing parallel overhead on a loop with a tiny number of iterations. Hristo's answer covers this.

You could move the parallel region outside of the while (true) loop and use the single directive to make the serial part of the code execute in one thread only. This removes the overhead of the fork/join model. Also, OpenMP is not really useful on tight loops with a very small number of iterations (like your version 1). You are basically measuring the OpenMP overhead, since the work inside the loop is done really fast: even 100000 iterations with transcendental functions take less than a second on a current-generation CPU (at 2 GHz and roughly 100 cycles per FP instruction other than addition, it takes ~100 ms).
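A minimal sketch of that restructuring, assuming the serial part can live inside a single block. The function run_persistent_region, the helper work, and the rounds counter are hypothetical names introduced for illustration, and the loop body is condensed compared with the question's code. The parallel region is entered once; the implicit barriers at the end of single and of the for construct keep the threads in step, so every thread sees the same sum when it tests the exit condition.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for one chain of calls from the question's loop body.
static double work(int j) {
    double x = std::sqrt(std::pow(j, 3.0));
    return std::tan(std::cos(std::sin(x)));
}

// Runs the whole while loop inside one parallel region.
// Returns the number of outer iterations; the final sum is stored in *result.
long run_persistent_region(int threads, int inner_iters, double limit, double* result) {
    assert(threads > 0 && result != nullptr);
    double sum = 0.0;  // shared by all threads in the region
    long rounds = 0;   // shared, written only inside the single block

    #pragma omp parallel num_threads(threads)
    {
        while (true) {
            // Sequential part: executed by exactly one thread.
            // The implicit barrier at the end of single makes the others wait.
            #pragma omp single
            {
                ++rounds;
            }

            // Work-sharing loop: reuses the threads of the enclosing region.
            #pragma omp for schedule(static) reduction(+:sum)
            for (int j = 0; j < inner_iters; ++j) {
                sum += 3.0 * work(j);  // the question sums three identical chains
            }
            // Implicit barrier: sum is fully reduced here, and it cannot change
            // again before every thread has passed the next single block.

            if (sum > limit) {
                break;  // all threads read the same sum, so all leave together
            }
        }
    }

    *result = sum;
    return rounds;
}
```

Calling run_persistent_region(4, 4, 100000000.0, &total) corresponds to version 1 of the question on 4 threads. The per-iteration barriers remain, so a loop with only 4 iterations may still run no faster than the serial code. When compiled without OpenMP support, the pragmas are ignored and the function runs serially.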

Because of this overhead, OpenMP provides the if(condition) clause, which can be used to selectively turn off the parallelisation for small loops:

#pragma omp parallel for ... if(loopcnt > 10000)
for (i = 0; i < loopcnt; i++)
   ...

It is also advisable to use schedule(static) for regular loops (that is, for loops in which every iteration takes about the same time to compute).
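A short sketch combining the if clause with schedule(static). The function sum_of_roots and its threshold parameter are hypothetical, and a cutoff such as 10000 is only a starting point that should be tuned by measurement on the target machine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums the square roots of v. The loop runs in parallel only when the input
// is larger than threshold; otherwise a single thread executes it.
double sum_of_roots(const std::vector<double>& v, std::size_t threshold) {
    double sum = 0.0;
    const long n = static_cast<long>(v.size());

    // schedule(static) suits this loop because every iteration costs about the same.
    #pragma omp parallel for schedule(static) reduction(+:sum) if(v.size() > threshold)
    for (long i = 0; i < n; ++i) {
        sum += std::sqrt(v[i]);
    }
    return sum;
}
```

With a four-element input and a threshold of 10000, the if clause evaluates to false and the loop runs on one thread, which avoids paying the parallel overhead for a tiny amount of work.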
