
OpenMP nested parallel for loops vs inner parallel for

If I use nested parallel for loops like this:

#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here
}

Is this equivalent to:

for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here
}

Is the outer parallel for doing anything other than creating a new task?

If your compiler supports OpenMP 3.0, you can use the collapse clause, which merges the two loops into a single iteration space of x_max*y_max iterations:

#pragma omp parallel for schedule(dynamic,1) collapse(2)
for (int x = 0; x < x_max; ++x) {
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here (collapse requires perfectly nested loops)
}

If your compiler doesn't support OpenMP 3.0 (e.g. only OpenMP 2.5 is supported), there is a simple workaround:

#pragma omp parallel for schedule(dynamic,1)
for (int xy = 0; xy < x_max*y_max; ++xy) {
    int x = xy / y_max;  // recover the outer index
    int y = xy % y_max;  // recover the inner index
    //parallelize this code here
}

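You can enable nested parallelism with omp_set_nested(1), and your nested omp parallel for code will work, but that might not be the best idea: every outer thread forks its own inner team, so you can easily end up with far more threads than cores. (Note that omp_set_nested is deprecated as of OpenMP 5.0; omp_set_max_active_levels is its replacement.)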

By the way, why the dynamic scheduling? Does each loop iteration take a different amount of time?

NO.

The first #pragma omp parallel will create a team of parallel threads, and the second will then try to create another team for each of the original threads, i.e. a team of teams. However, nested parallelism is normally disabled by default, so on almost all existing implementations each of these inner teams has just one thread: the second parallel region is essentially not used. Thus, your code is effectively equivalent to

#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    // each thread takes one x at a time (chunk size 1)
    for (int y = 0; y < y_max; ++y) {
        // code here: the thread that took this x runs all y serially
    }
}

If you don't want each thread to loop over all y, but want to parallelise only the inner loop, you can do this:

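#pragma omp parallel
for (int x = 0; x < x_max; ++x) {
    // each thread loops over all x
#pragma omp for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // code here: y values are handed out one at a time across the team
    }
    // implicit barrier at the end of "omp for": all threads finish this x before moving on
}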

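Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.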

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM