OpenMP for loop gets slower as threads are added

Question

I have a simple for loop that iterates over an array. When I use more processors, it gets slower. Here is the code:

#include <omp.h>
#include <sys/time.h>
#include <iostream>
#include <vector>
#include <fstream>
#include <string>     // std::string
#include <cstdlib>    // exit
#include <algorithm>  // std::min

using namespace std;

int main(int argc, char* argv[])
{
    string nth;
    if(argc<2)
    {
         cout << "Not enough parameters have been passed. \n";
         cin.get();
         exit(0);
    }
    else
    {
       nth=argv[1];
    }

    const int N=1000;
    vector<vector< int> > I;
    int *array= new int[N];
    // Initialize I and array

    struct timeval time_start;
    gettimeofday(&time_start, NULL);
    for (int y=0; y<I.size(); y++) {
        int i= I[y][0];
        int j= I[y][1];
        if (array[i]!=array[j]) {
            int a=array[i];
            int b=array[j];
            // std::min must be qualified: the local variable named min hides it here.
            int min=std::min(a,b);

        #pragma omp parallel for shared (a,b,min)
            for (int n=0; n<N; n++)
            {
                if (array[n]==a || array[n]==b) {
                    array[n]=min;
                }
            }
        }
    }

    struct timeval time_end;
    gettimeofday(&time_end, NULL);
    double sectiontime = (time_end.tv_sec * 1000000 + time_end.tv_usec) - (time_start.tv_sec * 1000000 + time_start.tv_usec);
    cout<<"Section Time: "<<sectiontime<<endl;
    delete[] array;
    I.clear();
    return 0;
}

I compile it with:

g++ test.cpp -fopenmp -o outTestPar -std=c++0x

and run it with:

./outTestPar 2

I run it on a machine with 64 cores. These are the results:

With 2 processors:

[...]$ ./outTestPar 2
Section Time: 28003
[...]$ ./outTestPar 2
Section Time: 20897
[...]$ ./outTestPar 2
Section Time: 19506
[...]$ ./outTestPar 2
Section Time: 22990

With 4 processors:

[...]$ ./outTestPar 4
Section Time: 20362
[...]$ ./outTestPar 4
Section Time: 19963
[...]$ ./outTestPar 4
Section Time: 28147
[...]$ ./outTestPar 4
Section Time: 20857

With 8 processors:

[...]$ ./outTestPar 8
Section Time: 24881
[...]$ ./outTestPar 8
Section Time: 28056

With 16 processors:

[...]$ ./outTestPar 16
Section Time: 24332
[...]$ ./outTestPar 16
Section Time: 26921

With 32 processors:

[...]$ ./outTestPar 32
Section Time: 21858
[...]$ ./outTestPar 32
Section Time: 23367
[...]$ ./outTestPar 32
Section Time: 25200
[...]$ ./outTestPar 32
Section Time: 24813

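As you can see, there is no improvement, and sometimes it even gets worse. Any idea what is going on? How can I improve it? I also tried different scheduling policies (static, dynamic, guided). None of them helped; they made things worse.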

Answer 1

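The loop you are parallelizing is very small and memory-bound. It does not take many cores to saturate memory bandwidth, and beyond that point performance degrades.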

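You may also run into cache problems if different threads write to the same cache line. The details vary by hardware, but typically when one thread writes a value, the whole cache line is invalidated for the other threads, forcing them to reload the line even though they are still using it.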
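To make the cache-line effect concrete, here is a hedged sketch of one way to keep threads off each other's cache lines: hand out iterations in chunks that each cover a whole line. The 64-byte line size, the names `kCacheLineBytes`, `ints_per_cache_line` and `fill_chunked`, and the assumption that the buffer starts on a cache-line boundary are all illustrative and not taken from the question.

```cpp
#include <cstddef>
#include <vector>

// Assumed cache-line size; 64 bytes is common but hardware-dependent.
constexpr std::size_t kCacheLineBytes = 64;

// Number of ints that fit in one (assumed) cache line.
constexpr int ints_per_cache_line()
{
    return static_cast<int>(kCacheLineBytes / sizeof(int));
}

// Hypothetical example: writes `value` to every element. Iterations are
// handed out in cache-line-sized chunks so that, if the buffer is
// line-aligned, no two threads store to the same line.
void fill_chunked(std::vector<int>& array, int value)
{
    const long n = static_cast<long>(array.size());
    const int chunk = ints_per_cache_line();

    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; ++i) {
        array[i] = value;
    }
}
```

A plain `schedule(static)` with no chunk size already gives each thread one contiguous block, so sharing is limited to the block boundaries. The effect is more likely to matter with `schedule(dynamic)`, whose default chunk size is 1, because neighboring elements can then be written by different threads.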

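One way to reduce the impact of memory writes is to write only when the value actually changes. So instead of checking for "a" or "b" and then writing the minimum of the two, have the if statement check only for the larger of the two values and replace it with the smaller one. Elements that already hold the smaller value are left alone, which reduces the number of writes to memory.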
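The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the asker's code: the function name `merge_labels`, the `std::vector<int>` parameter, and the returned write count are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: replaces every occurrence of the larger of (a, b)
// with the smaller one. Elements already equal to the smaller value are
// left untouched, so no redundant stores are issued.
// Returns the number of elements actually written.
std::size_t merge_labels(std::vector<int>& array, int a, int b)
{
    const int lo = std::min(a, b);
    const int hi = std::max(a, b);
    if (lo == hi) return 0;  // nothing to merge

    const long n = static_cast<long>(array.size());
    long writes = 0;

    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:writes)
    for (long i = 0; i < n; ++i) {
        if (array[i] == hi) {  // only the larger value ever changes
            array[i] = lo;
            ++writes;
        }
    }
    return static_cast<std::size_t>(writes);
}
```

In the question's loop this would correspond to testing `array[n]` against the larger of `a` and `b` only, rather than against both values.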

Solution 1
1 2020-03-10 03:41:36