
Parallel programming in OpenMP

I have the following piece of code.

for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

where offset is an array of size n containing values in the range [0, m) and cnt is an array of size m initialized to 0. I use OpenMP to parallelize it as follows.

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

According to the discussion in this post, if offset[i1] == offset[i2] for i1 != i2, the above piece of code may result in an incorrect cnt. What can I do to avoid this?

This code:

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

contains a race condition on the updates of the array cnt: two threads may increment the same element at the same time, and one of the increments can be lost. To solve it, you need to guarantee mutual exclusion of those updates. That can be achieved with (for instance) #pragma omp atomic update, but as already pointed out in the comments:

However, this resolves just correctness and may be terribly inefficient due to heavy cache contention and synchronization needs (including false sharing). The only solution then is to have each thread keep its private copy of cnt and reduce these copies at the end.
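For reference, the atomic variant mentioned above might look like the following minimal sketch. The function name count_atomic and its signature are hypothetical, introduced only for illustration; it assumes cnt already holds m zeros and every offset[i] lies in [0, m).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): counts occurrences
   of each value in offset[0..n). Assumes cnt is zero-initialized and
   every offset[i] is a valid index into cnt. */
void count_atomic(const int *offset, int n, int *cnt)
{
    assert(offset != NULL && cnt != NULL);

    #pragma omp parallel for shared(cnt, offset)
    for (int i = 0; i < n; ++i) {
        /* The atomic construct makes this read-modify-write indivisible,
           so concurrent increments of the same element are not lost. */
        #pragma omp atomic update
        ++cnt[offset[i]];
    }
}
```

This gives a correct result, but every increment is a synchronized access to shared memory, which is where the contention described in the comment comes from.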

The alternative solution is to have a private array per thread and, at the end of the parallel region, manually reduce all those arrays into one. An example of such an approach can be found here.
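A manual reduction along those lines could be sketched as follows. This is one possible arrangement rather than the linked example: the name count_private, the explicit m parameter, and the choice to abort on allocation failure are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): each thread counts
   into its own buffer, then adds that buffer into the shared cnt.
   Assumes cnt holds m zeros and every offset[i] is in [0, m). */
void count_private(const int *offset, int n, int *cnt, int m)
{
    assert(offset != NULL && cnt != NULL);
    if (m <= 0)
        return;

    #pragma omp parallel shared(cnt, offset)
    {
        /* Thread-private histogram, zero-initialized by calloc. */
        int *local = calloc((size_t)m, sizeof *local);
        if (local == NULL)
            abort(); /* assumption: a sketch-level way to handle failure */

        /* Iterations are split among threads; no synchronization is
           needed because each thread only touches its own buffer. */
        #pragma omp for
        for (int i = 0; i < n; ++i)
            ++local[offset[i]];

        /* Merge step: one thread at a time adds its partial counts. */
        #pragma omp critical
        {
            for (int j = 0; j < m; ++j)
                cnt[j] += local[j];
        }

        free(local);
    }
}
```

The merge costs m additions per thread, so this arrangement pays off mainly when m is small compared to n.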

Fortunately, with OpenMP 4.5 you can reduce arrays directly with the reduction clause, namely:

#pragma omp parallel for reduction(+:cnt)

You can have a look at this example to see how to apply that feature.
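Applied to the loop from the question, the array reduction might look like the sketch below. The function name count_reduction and the m parameter are hypothetical. Note that when cnt is a pointer rather than a declared array, the clause needs an array section such as cnt[:m] so that OpenMP knows how many elements to reduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post). Requires a compiler
   supporting OpenMP 4.5 array-section reductions. Assumes cnt holds m
   zeros and every offset[i] is in [0, m). */
void count_reduction(const int *offset, int n, int *cnt, int m)
{
    assert(offset != NULL && cnt != NULL);
    if (m <= 0)
        return;

    /* Each thread gets a private, zero-initialized copy of cnt[0..m);
       the copies are summed into the original array at the end. */
    #pragma omp parallel for reduction(+:cnt[:m])
    for (int i = 0; i < n; ++i) {
        ++cnt[offset[i]];
    }
}
```

This is the same idea as the manual per-thread arrays, with the allocation and the merge left to the OpenMP runtime.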

It is worth mentioning the following point about the reduction of arrays versus the atomic approach, as kindly pointed out by @Jérôme Richard:

Note that this is fast only if the array is not huge (the atomic-based solution could be faster in that specific case, depending on the platform and provided the values are not conflicting). In other words, the reduction is fast when m << n.

As always, profiling is the key; hence, you should test your code with the aforementioned approaches to find out which one is the most efficient.
