Parallelize nested for loop with respect to symmetry of all-against-all comparison with C++/OpenMP

I have the simple problem of comparing all elements to each other. The comparison itself is symmetric, so it doesn't have to be done twice.

The following code example shows what I am looking for by printing the indices of the accessed elements:

int n = 5;
for (int i = 0; i < n; i++)
{
    for (int j = i + 1; j < n; j++)
    {
        printf("%d %d\n", i,j);
    }
}

The output is:

0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
2 4
3 4

So each pair of elements is compared exactly once. When I want to parallelize this code, I run into two problems: first, I have to stick to dynamic scheduling because the calculation time of each iteration varies to a huge extent, and second, I cannot use collapse because the bounds of the inner loop depend on the index of the outer loop.

Using #pragma omp parallel for schedule(dynamic, 3) on the outer loop may leave a single core running at the end, whereas using it on the inner loop may cause the same single-core tail within each iteration of the outer loop.

Is there a more sophisticated way of doing/parallelizing that?

I haven't thought it through thoroughly, but you can try an approach like this too:

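// Row index i of the k-th pair (0-based) when the pairs i < j are
// enumerated row by row, as in the nested loops of the question.
int first(int k, int n) {
  int i = 0;
  for (; k >= n - 1; ++i) {
    k -= n - 1;  // skip the whole current row
    n -= 1;      // the next row is one pair shorter
  }
  return i;
}

// Column index j of the k-th pair, given its row index i.
int second(int k, int n, int i) {
  int t = i * (2*n - i - 1) / 2;  // number of pairs in rows 0 .. i-1
  return k - t + i + 1;
}

int total = n * (n-1) / 2; // total number of combinations
#pragma omp parallel for
for (int k = 0; k < total; ++k) {
  int i = first(k, n);
  int j = second(k, n, i);
  printf("%d %d\n", i,j);
}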
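
The row lookup in first costs O(n) per pair. As a rough sketch, the same index-to-pair mapping can be done in constant time by counting from the end of the enumeration, where the rows hold 1, 2, 3, ... pairs and the row boundaries are therefore triangular numbers. The names pair_from_index and for_each_pair, the visit callback, the use of long long, and the chunk size of 64 are my own assumptions for illustration, not part of the answer above; the two correction loops guard against rounding in sqrt.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: maps a linear index k in [0, n*(n-1)/2) to the k-th
// pair (i, j), i < j, in the same order as the nested loops of the question.
std::pair<int, int> pair_from_index(long long k, int n) {
    const long long total = static_cast<long long>(n) * (n - 1) / 2;
    assert(k >= 0 && k < total);

    // Count from the end: the last row holds 1 pair, the one before it 2, ...
    const long long m = total - 1 - k;

    // Largest r with r*(r+1)/2 <= m: the number of complete rows skipped
    // when walking backwards from the last pair.
    long long r = static_cast<long long>(
        (std::sqrt(8.0 * static_cast<double>(m) + 1.0) - 1.0) / 2.0);
    while (r * (r + 1) / 2 > m) --r;         // fix a possible overshoot
    while ((r + 1) * (r + 2) / 2 <= m) ++r;  // fix a possible undershoot

    const int i = n - 2 - static_cast<int>(r);
    const int j = n - 1 - static_cast<int>(m - r * (r + 1) / 2);
    return {i, j};
}

// Hypothetical driver: calls visit(i, j) once per pair from a single flat
// loop. visit must be safe to call concurrently from several threads.
template <class Visit>
void for_each_pair(int n, Visit visit) {
    const long long total = static_cast<long long>(n) * (n - 1) / 2;
    #pragma omp parallel for schedule(dynamic, 64)
    for (long long k = 0; k < total; ++k) {
        const std::pair<int, int> p = pair_from_index(k, n);
        visit(p.first, p.second);
    }
}
```

Because the loop is flat, a dynamic schedule can hand out chunks of pairs evenly until the very end, which is what the question was after. Compiled without OpenMP, the pragma is ignored and the loop simply runs serially.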

Indeed, the OpenMP standard says this about collapse:

The iteration count for each associated loop is computed before entry to the outermost loop. If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

So you cannot collapse your loops, which would have been the easiest way. However, since you're not particularly interested in the order in which the pairs of indexes are computed, you can change your loops a bit as follows:

for ( int i = 0; i < n; i++ ) { 
    for ( int j = 0; j < n / 2; j++ ) {
        int ii, jj;
        if ( j < i ) {
            ii = n - 1 - i;
            jj = n - 1 - j;
        }
        else {
            ii = i;
            jj = j + 1;
        }
        printf( "%d %d\n", ii, jj );
    }
}

This should give you all the pairs you want, in a somewhat mangled order, but with fixed iteration limits, which allow for balanced parallelisation and even loop collapsing if you want. One caveat: if n is even, the pairs whose second index is n/2 are produced twice, so either you live with it or you slightly modify the algorithm to avoid that.
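
One possible modification for even n, as a sketch only: in that case every pair the first branch produces for j == n/2 - 1 lies in column n/2, and the second branch already covers that column, so those iterations can simply be skipped. The function name for_each_pair_folded, the visit callback, and the chunk size of 16 are hypothetical choices for illustration, not something the answer above specifies.

```cpp
#include <cassert>

// Hypothetical wrapper: calls visit(ii, jj) exactly once for every pair
// ii < jj in [0, n), using rectangular loop bounds so collapse(2) is allowed.
// visit must be safe to call concurrently from several threads.
template <class Visit>
void for_each_pair_folded(int n, Visit visit) {
    assert(n >= 0);
    const int half = n / 2;
    const bool even = (n % 2 == 0);

    #pragma omp parallel for collapse(2) schedule(dynamic, 16)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < half; ++j) {
            if (j < i) {
                // For even n this iteration would repeat a pair from
                // column n/2 that the else branch already produces.
                if (even && j == half - 1) continue;
                visit(n - 1 - i, n - 1 - j);
            } else {
                visit(i, j + 1);
            }
        }
    }
}
```

For even n this leaves n/2 of the n * (n/2) iterations empty, which is a small price for keeping the bounds fixed. For odd n no iteration is skipped.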

I have previously had good results with the following:

#pragma omp parallel for collapse(2)
for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
                if (j <= i)
                        continue;
                printf("%d %d\n", i, j);
        }
}

Do remember that printf is not a representative parallel workload, so it would be best to profile this on your specific work. You could try adding schedule(dynamic, 10), or a chunk size greater than 10, depending on how many iterations you're performing.
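
To make that suggestion concrete, here is a minimal sketch of the same rectangular loop with a dynamic schedule and a stand-in workload in place of printf. The function count_close_pairs, its threshold parameter, and the chunk size of 64 are assumptions made up for this example; a suitable chunk size depends on n and on the cost of one comparison, so it has to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical workload: counts the pairs (i, j), i < j, whose values differ
// by less than threshold. It stands in for the real symmetric comparison.
long long count_close_pairs(const std::vector<double>& v, double threshold) {
    assert(threshold >= 0.0);
    const int n = static_cast<int>(v.size());
    long long count = 0;

    // Rectangular bounds make collapse(2) legal; roughly half of the n*n
    // iterations hit the continue and cost almost nothing.
    #pragma omp parallel for collapse(2) schedule(dynamic, 64) reduction(+ : count)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (j <= i)
                continue;
            if (std::fabs(v[i] - v[j]) < threshold)
                ++count;
        }
    }
    return count;
}
```

The reduction clause gives each thread a private counter and sums them at the end, so no atomic or critical section is needed inside the loop.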
