Parallel loop with reduction and array operations

Question

I'm new to OpenMP and I'm trying to optimize some for loops. The result is not as expected: the loops do not work correctly (because of a dependency). I don't understand how to get a correct parallel loop for the examples below:

    #pragma omp parallel for default(shared) reduction(+...)
    for(i = rest - 1; i >= 0; i--) {
        scounts[i] += N;
    }

    #pragma omp parallel for private(i)
    for(i = 1; i < p; i++) {
        disp[i] = disp[i-1] + scounts[i-1];
    }

I tried these two pragma directives without any success. What is the best way to proceed in these cases?

Answer 1

You have picked a hard problem to parallelize. In general, when writing an array in parallel, you don't want elements of the array to depend on previous elements, which is exactly what you have in your second loop.
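By contrast, the first loop in the question has no such dependency: each iteration reads and writes only its own `scounts[i]`, so the iterations are independent and no `reduction` clause is needed (a reduction combines values from many iterations into a single variable, which this loop never does). A minimal sketch, assuming `scounts` is an `int` array with at least `rest` elements; the function name `add_n_to_first` is hypothetical:

```c
/* Hypothetical helper: adds N to scounts[0..rest-1].
 * Every iteration touches a different element, so the iterations are
 * independent and a plain "parallel for" is enough: no reduction clause. */
void add_n_to_first(int *scounts, int rest, int N)
{
    /* The loop variable declared in the for statement is private
     * to each thread automatically. */
    #pragma omp parallel for
    for (int i = rest - 1; i >= 0; i--) {
        scounts[i] += N;
    }
}
```

If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially with the same result.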

Most people give up when they see a dependency. But these are the interesting cases, and they only require a bit of thinking. In your case, your second loop is equivalent to

    int sum = 0; // replace int with the element type (float, double, ...)
    for(i = 1; i < p; i++) {
        sum += scounts[i-1];
        disp[i] = disp[0] + sum;
    }
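To check the rewrite, here is a small serial sketch (function names hypothetical) that computes `disp` both ways: once with the original recurrence and once with the running sum. Both give `disp[i] = disp[0] + scounts[0] + ... + scounts[i-1]`; in the second form the loop-carried dependency lives only in the scalar `sum`.

```c
/* Original form: each element depends on the previous element. */
void disp_recurrence(const int *scounts, int *disp, int p)
{
    for (int i = 1; i < p; i++) {
        disp[i] = disp[i-1] + scounts[i-1];
    }
}

/* Rewritten form: disp[i] depends only on disp[0] and the running
 * (prefix) sum of scounts, carried in the scalar `sum`. */
void disp_running_sum(const int *scounts, int *disp, int p)
{
    int sum = 0;
    for (int i = 1; i < p; i++) {
        sum += scounts[i-1];
        disp[i] = disp[0] + sum;
    }
}
```

For example, with `scounts = {3, 1, 4, 1, 5}` and `disp[0] = 2`, both functions produce `disp = {2, 5, 6, 10, 11}`.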

This is a cumulative sum (also known as a prefix sum). OpenMP does not provide an easy construct for the prefix sum, so you have to do it in two passes. In the first pass, each thread computes a running sum over its own chunk of the array and records its chunk total. After a barrier, each thread adds the totals of all preceding chunks to its own elements in the second pass. Here is how you do it (I assumed disp and scounts are int arrays, but you can replace int with float or any other type):

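    #include <stdlib.h>
    #include <omp.h>

    /* Hypothetical wrapper: fills disp[1..p-1] from scounts, given disp[0]. */
    void compute_disp(const int *scounts, int *disp, int p)
    {
        int *suma;  /* shared: suma[0] = 0, suma[t+1] = total of thread t's chunk */
        #pragma omp parallel
        {
            int ithread = omp_get_thread_num();
            int nthreads = omp_get_num_threads();
            #pragma omp single
            {
                /* one slot per thread plus the leading zero */
                suma = malloc((nthreads + 1) * sizeof *suma);
                suma[0] = 0;
            }   /* implicit barrier: suma is ready for every thread */

            /* Pass 1: running sum over this thread's chunk only. */
            int sum = 0;
            #pragma omp for schedule(static)
            for (int i = 1; i < p; i++) {
                sum += scounts[i-1];
                disp[i] = disp[0] + sum;
            }
            suma[ithread + 1] = sum;
            #pragma omp barrier

            /* Offset = totals of the chunks owned by lower-numbered threads. */
            int offset = 0;
            for (int i = 0; i < (ithread + 1); i++) {
                offset += suma[i];
            }

            /* Pass 2: the same static schedule over the same iteration count
             * gives each thread the same chunk as in pass 1. */
            #pragma omp for schedule(static)
            for (int i = 1; i < p; i++) {
                disp[i] += offset;
            }
        }
        free(suma);
    }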
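To see why the two passes give the right answer, here is a serial sketch of the same scheme with the threads simulated as `nchunks` contiguous blocks. The function name and the `nchunks` parameter are hypothetical, and the block split only approximates how `schedule(static)` divides the iterations. Pass 1 computes a local running sum per block and stores the block totals in `suma`; pass 2 adds to each block the totals of all earlier blocks.

```c
#include <stdlib.h>

/* Serial simulation of the two-pass prefix sum (assumes nchunks >= 1).
 * Block t covers iterations [lo, hi) of the range i = 1 .. p-1. */
void disp_two_pass_sim(const int *scounts, int *disp, int p, int nchunks)
{
    int n = p - 1;                                   /* number of iterations */
    int *suma = calloc(nchunks + 1, sizeof *suma);   /* suma[0] stays 0 */
    if (suma == NULL || n <= 0) {
        free(suma);
        return;
    }

    /* Pass 1: each block computes a local running sum and its total. */
    for (int t = 0; t < nchunks; t++) {
        int lo = 1 + (int)((long)n * t / nchunks);
        int hi = 1 + (int)((long)n * (t + 1) / nchunks);
        int sum = 0;
        for (int i = lo; i < hi; i++) {
            sum += scounts[i-1];
            disp[i] = disp[0] + sum;
        }
        suma[t + 1] = sum;
    }

    /* Pass 2: shift each block by the totals of all earlier blocks. */
    for (int t = 0; t < nchunks; t++) {
        int lo = 1 + (int)((long)n * t / nchunks);
        int hi = 1 + (int)((long)n * (t + 1) / nchunks);
        int offset = 0;
        for (int k = 0; k <= t; k++) {
            offset += suma[k];
        }
        for (int i = lo; i < hi; i++) {
            disp[i] += offset;
        }
    }
    free(suma);
}
```

Whatever the number of blocks, the result matches the serial recurrence; that independence from the chunk count is what lets the OpenMP version work with any number of threads.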

But if you're just learning OpenMP, I suggest you start with an easier case first.
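For reference: OpenMP 5.0 (published in 2018, after this answer) added a `scan` directive, so with a compiler that supports it the prefix sum can be written as a single loop with an `inscan` reduction. A sketch, assuming OpenMP 5.0 support (for example GCC 10 or later with `-fopenmp`); the function name `disp_scan` is hypothetical:

```c
/* Inclusive scan with OpenMP 5.0 (assumes compiler support).
 * Statements before the scan directive are the input phase; statements
 * after it see `sum` as the inclusive prefix sum up to iteration i. */
void disp_scan(const int *scounts, int *disp, int p)
{
    int sum = 0;
    #pragma omp parallel for reduction(inscan, +: sum)
    for (int i = 1; i < p; i++) {
        sum += scounts[i-1];
        #pragma omp scan inclusive(sum)
        disp[i] = disp[0] + sum;
    }
}
```

Without OpenMP enabled, the pragmas are ignored and the loop is simply the serial running sum.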

Answer 2

Please use #pragma directly:

    #pragma omp parallel ...

instead of putting #pragma in a comment:

    // #pragma omp parallel ...
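A commented-out directive is just a comment, and even an uncommented `#pragma omp` line is ignored unless OpenMP is enabled at compile time (for example with `-fopenmp` on GCC or Clang). One way to check which case you are in is the standard `_OPENMP` macro, which the compiler defines (as a `yyyymm` version date) only when OpenMP is enabled. A small sketch; the function name is hypothetical:

```c
/* Returns the _OPENMP version date (yyyymm) if OpenMP is enabled at
 * compile time, or 0 if the #pragma omp lines are being ignored. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

If this returns 0, no directive in the file has any effect, commented or not.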

2 answers

Answer 1
Score 1, accepted, 2015-06-12 08:50:54

Answer 2
Score 0, 2015-06-11 15:40:40
