Parallelize nested for loop with respect to symmetry of all-against-all comparison with C++/OpenMP

I have the simple problem of comparing all elements to each other. The comparison itself is symmetric, therefore, it doesn't have to be done twice.

The following code example shows what I am looking for by showing the indices of the accessed elements:

int n = 5;
for (int i = 0; i < n; i++)
{
    for (int j = i + 1; j < n; j++)
    {
        printf("%d %d\n", i,j);
    }
}

The output is:

0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
2 4
3 4

So each element is compared with every other element exactly once. When I try to parallelize this code I run into two problems: first, I have to stick to dynamic scheduling because the computation time of each iteration varies to a huge extent, and second, I cannot use collapse because the bounds of the inner loop depend on the index of the outer loop.

Using #pragma omp parallel for schedule(dynamic, 3) on the outer loop may leave a single core doing all the work at the end, whereas using it on the inner loop may cause the same thing within each iteration of the outer loop.

Is there a more sophisticated way of doing/parallelizing that?

I haven't thought it through thoroughly, but you could also try an approach like this:

// Row index i of the k-th pair (row-major order over the upper triangle).
int first(int k, int n) {
  int i = 0;
  for (; k >= n - 1; ++i) { // row i holds n - 1 - i pairs
    k -= n - 1;
    n -= 1;
  }
  return i;
}

// Column index j of the k-th pair, given its row index i.
int second(int k, int n, int i) {
  int t = i * (2*n - i - 1) / 2; // number of pairs in rows 0 .. i-1
  return k - t + i + 1;
}

int total = n * (n-1) / 2; // total number of combinations
#pragma omp parallel for
for (int k = 0; k < total; ++k) {
  int i = first(k, n);
  int j = second(k, n, i);
  printf("%d %d\n", i,j);
}
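
The linear-index idea above can also be done without the loop inside first(). The following is only a sketch, not part of the original answer: unrank_pair and print_all_pairs are hypothetical names, and the chunk size 16 is an arbitrary placeholder. It counts the pairs from the end, where the rows hold 1, 2, 3, ... pairs, and uses a square root plus an integer correction to find the row.

```cpp
#include <cmath>
#include <cstdio>
#include <utility>

// Hypothetical helper: maps a linear index k in [0, n*(n-1)/2) to the k-th
// pair (i, j) with i < j, in the same row-major order as the nested loops.
std::pair<long long, long long> unrank_pair(long long k, long long n) {
    const long long total = n * (n - 1) / 2;
    const long long m = total - 1 - k;  // index counted from the last pair
    // Counted from the end, rows hold 1, 2, 3, ... pairs, so m lies in the
    // reversed row r with r*(r+1)/2 <= m < (r+1)*(r+2)/2.
    long long r = static_cast<long long>((std::sqrt(8.0 * m + 1.0) - 1.0) / 2.0);
    // Correct any floating-point rounding of the square root.
    while (r * (r + 1) / 2 > m) --r;
    while ((r + 1) * (r + 2) / 2 <= m) ++r;
    const long long i = n - 2 - r;
    const long long j = n - 1 - (m - r * (r + 1) / 2);
    return {i, j};
}

// Visits every unordered pair once. The chunk size is an assumption to be
// tuned; the pragma is simply ignored when compiling without OpenMP.
void print_all_pairs(long long n) {
    const long long total = n * (n - 1) / 2;
    #pragma omp parallel for schedule(dynamic, 16)
    for (long long k = 0; k < total; ++k) {
        const std::pair<long long, long long> p = unrank_pair(k, n);
        std::printf("%lld %lld\n", p.first, p.second);
    }
}
```

Because the single loop has a fixed trip count, dynamic scheduling can hand out chunks of pairs rather than whole rows, which should help when the cost per pair varies a lot.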

Indeed, the OpenMP standard says about the collapse clause that:

The iteration count for each associated loop is computed before entry to the outermost loop. If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

So you cannot collapse your loops, which would have been the easiest way. However, since you're not particularly interested in the order in which the pairs of indices are computed, you can change your loops a bit, as follows:

for ( int i = 0; i < n; i++ ) { 
    for ( int j = 0; j < n / 2; j++ ) {
        int ii, jj;
        if ( j < i ) {
            ii = n - 1 - i;
            jj = n - 1 - j;
        }
        else {
            ii = i;
            jj = j + 1;
        }
        printf( "%d %d\n", ii, jj );
    }
}

This should give you all the pairs you want, in a somewhat mangled order, but with fixed iteration limits, which allows for balanced parallelisation and even loop collapsing if you want. One caveat: if n is even, the column corresponding to n/2 will be displayed twice, so either you live with it or you slightly modify the algorithm to avoid that.

I have previously had good results with the following:

#pragma omp parallel for collapse(2)
for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
                if (j <= i)
                        continue;
                printf("%d %d\n", i, j);
        }
}

Do remember that printf is not a representative parallel workload, so it would be best to profile this on your specific work. You could also try adding schedule(dynamic, 10), or a chunk size larger than 10, depending on how many iterations you're performing.
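
To profile this with something closer to real work than printf, a sketch along the following lines may help. The names pair_cost and sum_over_pairs are hypothetical, the squared difference is only a stand-in for the actual symmetric comparison, and the chunk size is passed in so it can be tuned.

```cpp
#include <vector>

// Stand-in for the real symmetric comparison (an assumption for illustration).
static double pair_cost(double a, double b) {
    return (a - b) * (a - b);
}

// Sums pair_cost() over all pairs i < j using the collapsed rectangular loop.
// chunk is the size used in schedule(dynamic, chunk) and must be positive.
double sum_over_pairs(const std::vector<double> &x, int chunk) {
    const int n = static_cast<int>(x.size());
    double sum = 0.0;
    #pragma omp parallel for collapse(2) schedule(dynamic, chunk) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (j > i) {  // the lower triangle and diagonal are skipped
                sum += pair_cost(x[i], x[j]);
            }
        }
    }
    return sum;
}
```

The reduction clause gives each thread a private partial sum, so no locking is needed around the accumulation. Roughly half of the collapsed iterations do nothing, which is why a chunk size well above 1 is probably worth trying.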
