Is this the correct use of OpenMP firstprivate?

I need to parallelize the following:

for(i=0; i<n/2; i++)
   a[i] = a[i+1] + a[2*i];

In parallel, the output will differ from the sequential output, because values that still have to be read will already have been overwritten. To get the sequential output from a parallelized loop, I want to use firstprivate(a), because firstprivate gives each thread its own copy of a.

Let's imagine 4 threads and a loop of 100 iterations.

  • Thread 1 --> i = 0 to 24
  • Thread 2 --> i = 25 to 49
  • Thread 3 --> i = 50 to 74
  • Thread 4 --> i = 75 to 99

That means that each thread will rewrite 25% of the array.

When the parallel region is over, all the threads "merge". Does that mean that you get the same a as if you had run it sequentially?

#pragma omp parallel for firstprivate(a)
for(i=0; i<n/2; i++)
   a[i] = a[i+1] + a[2*i];

Question:

  • Is my way of thinking correct?
  • Is the code parallelized in the right way to get the sequential output?

As you noted, using firstprivate to copy the data for each thread does not really help you get the data back.
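A minimal sketch of why the data does not come back, assuming a is a true array (not a pointer) so that firstprivate really copies its elements. The size N, the fill values, and the function name changed_after_firstprivate are illustrative and not from the question. When compiled with OpenMP enabled, every thread writes into its own copy, and those copies are discarded at the end of the region, so the original array should come out unchanged:

```c
#include <assert.h>
#include <string.h>

#define N 16  /* illustrative size, not from the question */

/* Runs the question's loop with firstprivate(a) and returns how many
   elements of the original array differ afterwards. */
int changed_after_firstprivate(void)
{
    int a[N], before[N];

    assert(N >= 2);  /* reads a[i+1] and a[2*i] stay inside a[0..N-1] */

    for (int k = 0; k < N; k++)
        a[k] = k + 1;
    memcpy(before, a, sizeof a);

    /* With OpenMP enabled, each thread gets a private copy of a that is
       initialized from the original. All writes below go to that copy. */
    #pragma omp parallel for firstprivate(a)
    for (int i = 0; i < N / 2; i++)
        a[i] = a[i + 1] + a[2 * i];

    /* Compare the original array with the snapshot taken before the loop. */
    int changed = 0;
    for (int k = 0; k < N; k++)
        changed += (a[k] != before[k]);
    return changed;
}
```

If a were declared as a pointer (for example a function parameter int *a), firstprivate would copy only the pointer, and all threads would still write to the same shared memory.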

The easiest solution is in fact to separate input and output and have both be shared (the default).

To avoid a copy, it would be good to just use the new variable instead of a from there on in the code. Alternatively, you could just have pointers and swap them.

int out[100];
#pragma omp parallel for
for(i=0; i<n/2; i++)
   out[i] = a[i+1] + a[2*i];

// use out from here when you would have used a.
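The separate output array gives the sequential result because the original loop never reads an element it has already written: in iteration i it reads a[i+1] and a[2*i], and both indices are at least i, while only indices below i have been overwritten so far. The following is a sketch of the pointer-swap variant under that observation. The function names, the heap-allocated buffers, and the fill values are assumptions made for illustration. Note that swapping pointers only works if out is a complete replacement for a, so the sketch also carries over the untouched upper half:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reference: the original in-place sequential loop. */
void update_in_place(int *a, int n)
{
    for (int i = 0; i < n / 2; i++)
        a[i] = a[i + 1] + a[2 * i];
}

/* One parallel step: reads only from in, writes only to out. */
void update_step(const int *in, int *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n / 2; i++)
        out[i] = in[i + 1] + in[2 * i];

    /* The loop above leaves the upper half untouched, so carry it over
       to make out a full replacement for in. */
    #pragma omp parallel for
    for (int i = n / 2; i < n; i++)
        out[i] = in[i];
}

/* Exchange two buffer pointers instead of copying the data back. */
void swap_buffers(int **p, int **q)
{
    int *tmp = *p;
    *p = *q;
    *q = tmp;
}

/* Returns 1 if the parallel step plus swap matches the in-place loop
   on a test array of length n, and 0 otherwise. */
int matches_sequential(int n)
{
    assert(n >= 2);

    int *ref  = malloc(n * sizeof *ref);
    int *cur  = malloc(n * sizeof *cur);
    int *next = malloc(n * sizeof *next);
    if (!ref || !cur || !next) {
        free(ref); free(cur); free(next);
        return 0;
    }

    for (int k = 0; k < n; k++)
        ref[k] = cur[k] = 3 * k + 1;  /* arbitrary test data */

    update_in_place(ref, n);
    update_step(cur, next, n);
    swap_buffers(&cur, &next);        /* cur now points at the updated data */

    int same = memcmp(ref, cur, n * sizeof *ref) == 0;
    free(ref); free(cur); free(next);
    return same;
}
```

If the rest of the program only ever needs the first n/2 elements, the second loop in update_step can be dropped and out used directly, as in the snippet above.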

There is no easy and general way to have private copies of a for each thread and then merge them afterwards. lastprivate just copies one incomplete output array from the thread executing the last iteration, and reduction doesn't know which elements to take from which array. Even if it were possible, it would be wasteful to copy the entire array for each thread. Having shared inputs and outputs here is much more efficient.
