
Increasing array index in OpenMP

I am new to using OpenMP. I am trying to parallelize a nested loop, and so far I have something of this form:

#pragma omp parallel for
for (j = 0; j < m; j++) {
    /* some work */
    for (i = 0; i < n; i++) {
        p = b[i];
        if (p < 0 && k < m) {
            a[k] = c[i];
            k++;
        } else {
            x = c[i];
        }
    }
    /* some work */
}

The outer loop runs in parallel, and the inner loop updates k. The other threads need the current value of k to update a[k] correctly. The problem is that all of the threads are updating a[k], but the proper order of k is not kept.

Some threads will update k and a[k], and some will not. How do I communicate the latest k between threads to update a[k] properly, since c[i] will have different values for each thread?

For example, when it runs serially, the program might set the first seven values of a to {1,3,5,7,3,9,13} and terminate with k equal to 7, but when run in parallel it produces different results, or the same results in a different (and therefore wrong) order.

How do I keep the same order and ensure parallelism at the same time?

Note: this answer was completely rewritten in light of OP clarifications. The original answer text is at the end.

> How do I keep the same order and ensure parallelism at the same time?

Order dependency is antithetical to parallelism, as running operations in parallel inherently entails relaxing the relative order in which they are performed. Not all computations can be effectively parallelized.

Your case is not an exception. The second and each subsequent iteration of your outer loop needs to use the final value of k (among other things) computed by the previous iteration. How can it get that? Only by performing the previous iteration first. What room does that leave for concurrent operation? None. Concurrency is not the same thing as parallelism, but it is one of the main motivations for parallelism, because that's how parallelism yields improvements in elapsed time.

With no scope for concurrency, parallelism is actively counterproductive for you. Suppose you made the whole body of the outer loop a critical section, so that there was no concurrency in fact (as your present code requires) and no data races involving k. Then you would still pay the overhead for parallelism, get no speedup in return, and probably still get the wrong results because of evaluations of the outer-loop body being performed in the wrong order.

It may be that the whole thing can be rewritten to reduce or remove the data dependencies that prevent effective parallelization of the computation, or it may not. We don't have enough information to tell, as it depends in part on the details of "some work" and on the significance of the data. Probably you would need an altogether different algorithm for producing the desired results.

> Instead of giving a[n]={0,1,2,3,.......n}, it gives me garbage values for a when I use the reduction clause. I need the total sum of K, hence the reduction clause.

There is a closed-form equation for the sum of consecutive integers, and it has an especially simple form when the first integer in the list is 0 or 1. In particular, the sum of the integers from 0 to n, inclusive, is n * (n + 1) / 2. You do not need a reduction for this.
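As a small illustration of the closed form, the sketch below compares it with a plain summing loop. The function names sum_closed_form and sum_loop are made up for this example and do not come from the question.

```c
/* Sum of the integers 0..n inclusive, using the closed form n * (n + 1) / 2.
   One of n and n + 1 is always even, so the division is exact. */
long sum_closed_form(long n)
{
    return n * (n + 1) / 2;
}

/* The same sum computed by a serial loop, for comparison only. */
long sum_loop(long n)
{
    long k = 0;
    for (long i = 0; i <= n; i++) {
        k += i;
    }
    return k;
}
```

For the ten-element case in the question (indices 0 through 9), both functions give 45.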

If you wanted to use a reduction anyway, then you need to understand that it doesn't work the way you seem to think it does. What you get is a separate, private copy of the reduction variable for each thread executing the parallel construct, with the per-thread (not per-iteration) final values of those independent variables combined according to the reduction operator. Thus, if you really want to do the computation via an OpenMP reduction, then you would need to restructure the loop something like this:

#pragma omp parallel for reduction (+:k)
for (i = 0; i < 10; i++) {
    a[i] = i;
    k += i;
}

That assumes that the value of k is 0 immediately prior to the loop, as you indeed seem to be doing. If that were not a safe assumption then you would need something like

type_of_k k0 = k;
k = 0;
#pragma omp parallel for reduction (+:k)
for (i = 0; i < 10; i++) {
    a[k0 + i] = i;
    k += k0 + i;
}

Note that in either case, not only does that set up the reduction correctly, but it also breaks the data dependency between loop iterations that was previously carried by the expression k++.
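For reference, here is a minimal self-contained sketch of the first variant wrapped in a function. The function name fill_and_sum and its parameters are assumptions made for the example; the loop body is the one shown above.

```c
/* Hypothetical wrapper around the reduction loop shown above.
   Sets a[i] = i for i in 0..n-1 and returns the sum of those indices. */
long fill_and_sum(int *a, int n)
{
    long k = 0;  /* reduction variable: each thread gets a private copy starting at 0 */
    int i;

    #pragma omp parallel for reduction(+:k)
    for (i = 0; i < n; i++) {
        a[i] = i;  /* the index depends only on i, so iterations are independent */
        k += i;    /* accumulates into the thread's private copy */
    }

    /* After the loop, the private copies have been combined with + into k. */
    return k;
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel. Without that flag the pragma is ignored and the loop runs serially with the same result, which is a convenient way to check the logic.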

It sounds like you're essentially filling in a with a filter of entries from c, and want to preserve their order. If this is the only use k has, some other methods spring to mind:

  1. Always write a[i], but use a mark indicating unused values where the predicate wasn't satisfied. This preserves order, but requires a larger a, which you can compact in a second pass.

  2. Write an a_i array storing which index each entry belonged to. This still requires an atomic capture of k (#pragma omp atomic capture on k_local = k++), and a second sort to restore order. And you'd need both a and a_i to be the full size again, or you might miss entries, so all in all a terrible workaround.
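The first of these approaches might look roughly like the sketch below, applied to a single pass of the inner loop. Everything here is an assumption made for illustration: the function name filter_mark_compact, the separate keep array used as the mark (instead of a sentinel value stored in a), and the requirement that a has room for n entries.

```c
#include <stddef.h>

/* Hypothetical helper: keeps c[i] wherever b[i] < 0, preserving order.
   a and keep must each have room for n entries; at most limit entries
   are kept. Returns the number of entries left at the front of a. */
size_t filter_mark_compact(const int *b, const int *c, size_t n,
                           int *a, unsigned char *keep, size_t limit)
{
    long i;

    /* Pass 1 (parallel): iteration i touches only slot i, so there is
       no shared counter and no ordering problem. */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        a[i] = c[i];
        keep[i] = (unsigned char)(b[i] < 0);  /* mark: 1 = wanted, 0 = unused */
    }

    /* Pass 2 (serial): compact the marked entries to the front of a.
       k never exceeds j, so a[j] has not been overwritten when it is read. */
    size_t k = 0;
    for (size_t j = 0; j < n && k < limit; j++) {
        if (keep[j]) {
            a[k++] = a[j];
        }
    }
    return k;
}
```

The second pass is still sequential, so this only pays off if the per-element work in the first pass is expensive compared with a simple copy.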

Even with some sequential dependencies you can do optimizations. For example, a scan to calculate what k would be for each i could be done in O(log n) parallel steps rather than O(n) sequential ones (see parallel prefix sum, and the OpenMP discussions of it on Stack Overflow). This sort of thing is what OpenMP's ordered depend is for, I believe. Anyhow, this leads to the third solution:

  3. Generate a k array, holding the values k will have for each iteration, so that the threads that do write will write to the correct places. This requires scanning the predicate.
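One way to sketch the scan-based approach is a blocked two-pass scheme: count the matches per block in parallel, run a short serial exclusive scan over the block counts, then let each block write at its own starting offset. This is a simplified stand-in for a full parallel prefix sum, not the O(log n) algorithm itself. The names filter_scan and NBLOCKS and the fixed block count are assumptions for the example.

```c
#include <stddef.h>

#define NBLOCKS 8  /* hypothetical block count; roughly the number of threads */

/* Hypothetical helper: keeps c[i] wherever b[i] < 0, preserving order.
   a must have room for cap entries. Returns the number of entries written. */
size_t filter_scan(const int *b, const int *c, size_t n, int *a, size_t cap)
{
    size_t count[NBLOCKS];      /* matches found in each block */
    size_t start[NBLOCKS + 1];  /* value k would have at the start of each block */
    size_t chunk = (n + NBLOCKS - 1) / NBLOCKS;
    int blk;

    /* Pass 1 (parallel): count how many entries each block will keep. */
    #pragma omp parallel for
    for (blk = 0; blk < NBLOCKS; blk++) {
        size_t lo = (size_t)blk * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        size_t cnt = 0;
        for (size_t i = lo; i < hi; i++) {
            cnt += (b[i] < 0);
        }
        count[blk] = cnt;
    }

    /* Serial exclusive scan over the (few) block counts. */
    start[0] = 0;
    for (blk = 0; blk < NBLOCKS; blk++) {
        start[blk + 1] = start[blk] + count[blk];
    }

    /* Pass 2 (parallel): each block writes to its own disjoint range of a. */
    #pragma omp parallel for
    for (blk = 0; blk < NBLOCKS; blk++) {
        size_t lo = (size_t)blk * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        size_t k = start[blk];
        for (size_t i = lo; i < hi; i++) {
            if (b[i] < 0) {
                if (k < cap) {
                    a[k] = c[i];
                }
                k++;
            }
        }
    }

    return (start[NBLOCKS] < cap) ? start[NBLOCKS] : cap;
}
```

The predicate is evaluated twice, once per pass, which is the price of never sharing k between threads. Whether that is a net win depends on how expensive the predicate and the surrounding work are.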

It is useful to have higher-level constructs like map, scan and reduce when planning out algorithms.

