parallel programming in OpenMP

I have the following piece of code.

for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

where offset is an array of size n containing values in the range [0, m) and cnt is an array of size m initialized to 0. I use OpenMP to parallelize it as follows.

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

According to the discussion in this post, if offset[i1] == offset[i2] for some i1 != i2, the above piece of code may produce an incorrect cnt. What can I do to avoid this?

This code:

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

contains a race condition on the updates of the array cnt: two threads may increment the same element at the same time, and one of the increments can be lost. To solve it, you need to guarantee mutual exclusion of those updates. That can be achieved with (for instance) #pragma omp atomic update, but as already pointed out in the comments:

However, this resolves just correctness and may be terribly inefficient due to heavy cache contention and synchronization needs (including false sharing). The only solution then is to have each thread its private copy of cnt and reduce these copies at the end.
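For reference, the atomic variant mentioned above could look roughly like the sketch below. The function name count_atomic and its signature are only assumptions made for illustration; the loop body is the one from the question, with the increment protected by the atomic directive. The caller is assumed to have zeroed cnt and to guarantee that every offset[i] is a valid index into it.

```c
/* Hypothetical helper (name and signature are illustrative only):
 * counts how often each value occurs in offset[0..n).
 * Assumes cnt is already zeroed and every offset[i] is a valid index. */
void count_atomic(const int *offset, int n, int *cnt)
{
    int i;

    #pragma omp parallel for shared(cnt, offset) private(i)
    for (i = 0; i < n; ++i) {
        /* The atomic directive makes this read-modify-write indivisible,
         * so concurrent increments of the same element are not lost. */
        #pragma omp atomic update
        ++cnt[offset[i]];
    }
}
```

This fixes correctness only; every increment still goes to the shared array, which is where the contention described in the comment comes from.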

The alternative solution is to have a private array per thread and, at the end of the parallel region, manually reduce all those arrays into one. An example of such an approach can be found here.
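A minimal sketch of that manual reduction, under a few assumptions: the function name count_private, the per-thread buffer local, and the choice to abort on allocation failure are all illustrative and not taken from the linked example. Each thread counts into its own buffer and then adds it to the shared cnt inside a critical section.

```c
#include <stdlib.h>

/* Hypothetical helper (name and signature are illustrative only).
 * Assumes m > 0, cnt has m elements, and every offset[i] is in [0, m). */
void count_private(const int *offset, int n, int *cnt, int m)
{
    #pragma omp parallel
    {
        int i, j;
        /* One zero-initialized private histogram per thread. */
        int *local = calloc((size_t)m, sizeof *local);
        if (local == NULL)
            abort(); /* sketch only: real code would handle this more gracefully */

        /* Threads split the iterations and update only their own buffer,
         * so no synchronization is needed inside the loop. */
        #pragma omp for nowait
        for (i = 0; i < n; ++i)
            ++local[offset[i]];

        /* Merge step: one thread at a time adds its buffer to cnt. */
        #pragma omp critical
        {
            for (j = 0; j < m; ++j)
                cnt[j] += local[j];
        }

        free(local);
    }
}
```

The merge costs O(m) per thread, so this tends to pay off when m is small compared to n.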

Fortunately, since OpenMP 4.5 you can reduce arrays in C/C++ directly with the reduction clause, namely:

#pragma omp parallel for reduction(+:cnt)

You can have a look at this example to see how to apply that feature.
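Applied to the loop from the question, the clause could be used as in the sketch below. The function name count_reduction is an assumption for illustration. Note one detail: reduction(+:cnt) as written above works when cnt is declared as an array; if cnt is a pointer, as assumed here, an array section such as cnt[0:m] is needed so that OpenMP knows how many elements to reduce.

```c
/* Hypothetical helper (name and signature are illustrative only).
 * Assumes cnt has m elements and every offset[i] is in [0, m). */
void count_reduction(const int *offset, int n, int *cnt, int m)
{
    int i;

    /* Each thread gets a private, zero-initialized copy of cnt[0..m);
     * the copies are summed into the original cnt when the loop ends. */
    #pragma omp parallel for reduction(+:cnt[0:m])
    for (i = 0; i < n; ++i)
        ++cnt[offset[i]];
}
```

Implementations may place the private copies on each thread's stack, so for a very large m this variant might need a bigger stack size or the manual approach instead.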

It is worth mentioning, regarding array reduction versus the atomic approach, a point kindly made by @Jérôme Richard:

Note that this is fast only if the array is not huge (the atomic based solution could be faster in this specific case regarding the platform and if the values are not conflicting). So that is m << n.

As always, profiling is key. Hence, you should test your code with the aforementioned approaches to find out which one is the most efficient.
