
Best way to insert OpenMP pragmas in nested for loops

I would like to explain my problem with a simple example (I guess this is a common problem in image processing). Let's say I have nested for loop code as below:

for(int bs=0;bs<2;bs++){
    for(int c=0;c<3;c++){
        for(int h=0;h<227;h++){
            for(int w=0;w<227;w++){
                //Element index calculation
                int eleIdx=bs*3*300*300+c*300*300+h*300+w;
                // Here arr is raw buffer
                arr[eleIdx]=exp(arr[eleIdx])/(1+exp(arr[eleIdx]));
            }
        }
    }
}

What are the best options to parallelize the above code? Below are the options that I'm thinking of:

  1. Adding #pragma omp parallel for collapse(4) on the outer bs for loop.
  2. Adding #pragma omp parallel for collapse(2) on the inner h for loop.

Which one is better? Please let me know the reason behind that.


As @Gilles already pointed out, that depends on a lot of factors. For instance, the collapse clause adds extra computation compared with a non-collapsed loop, because distributing the iterations among threads becomes more complicated: the original loop indices typically have to be recovered from the collapsed iteration space. Moreover, the more loops are collapsed, the higher the overhead will be. But, as always, profiling is the answer.
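To make that extra work concrete, here is a rough sketch of what collapsing the h and w loops amounts to conceptually: a single flattened loop whose iteration number is split back into h and w with a division and a modulo. This is only an illustration; real OpenMP runtimes are free to implement collapse differently. The names recover_hw and increment_plane_flat are made up for this sketch, and the buffer is an int plane that is simply incremented, so the index arithmetic stays in focus.

```c
#define H 227          /* rows actually processed */
#define W 227          /* columns actually processed */
#define STRIDE 300     /* row pitch of the raw buffer, as in the question */

/* Hypothetical helper: split a flattened iteration number into (h, w). */
void recover_hw(int flat, int *h, int *w)
{
    *h = flat / W;     /* extra integer division per iteration */
    *w = flat % W;     /* extra modulo per iteration */
}

/* Hand-written counterpart of collapse(2) over h and w for one plane:
   a single loop of H*W iterations is shared among the threads. */
void increment_plane_flat(int *plane)
{
    #pragma omp parallel for
    for (int flat = 0; flat < H * W; flat++) {
        int h, w;
        recover_hw(flat, &h, &w);
        plane[h * STRIDE + w] += 1;
    }
}
```

With collapse(4) the same kind of recovery would be needed for four indices instead of two, which is one way to see why the overhead tends to grow with the collapse level.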

Ideally, you should follow the advice in the answer posted by @Gilles. However, if that is not possible, you can get rid of the first two loops by unrolling them and then use #pragma omp parallel for or #pragma omp parallel for collapse(2), whichever yields the best results. Or simply swap the loops so that the ones with fewer iterations are the innermost ones.

An example of the latter approach:

#pragma omp parallel for collapse(2)
for(int h=0;h<227;h++){
    for(int w=0;w<227;w++){
        for(int bs=0;bs<2;bs++){
            for(int c=0;c<3;c++){
                int eleIdx=bs*3*300*300+c*300*300+h*300+w;
                arr[eleIdx]=exp(arr[eleIdx])/(1+exp(arr[eleIdx]));
            }
        }
    }
}
