
OpenMP parallel code slower

I have two loops which I am parallelizing:

#pragma omp parallel for
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      C[i][j] = 0;
      for (k = 0; k < nk; ++k)
        C[i][j] += A[i][k] * B[k][j];
    }
#pragma omp parallel for
  for (i = 0; i < ni; i++)
    for (j = 0; j < nl; j++) {
      E[i][j] = 0;
      for (k = 0; k < nj; ++k)
        E[i][j] += C[i][k] * D[k][j];
    }

Strangely, the sequential execution is much faster than the parallel version above, even when using a large number of threads. Am I doing something wrong? Note that all arrays are global. Does this make a difference?

The iterations of your parallel outer loops share the index variables (j and k) of their inner loops. This for sure makes your code somewhat slower than you probably expected it to be, i.e., your loops are not "embarrassingly" (or "delightfully") parallel, and the parallel loop iterations need to somehow access these variables from shared memory.

What is worse is that, because of this, your code contains race conditions. As a result, it will behave nondeterministically. In other words: your implementation of parallel matrix multiplication is now incorrect! (Go ahead and check the results of your computations. ;))
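If you want a quick sanity check, one option is to recompute the first product sequentially into a scratch buffer and compare it against the parallel result. A minimal sketch, assuming the global arrays and sizes from the question, an element type of double, and a hypothetical scratch array C_ref of the same shape as C:

#include <math.h>   /* fabs */
#include <stdio.h>

/* Recompute C sequentially into C_ref (hypothetical scratch array), then
   compare element-wise against the parallel result stored in C. */
int check_C(void)
{
  int i, j, k;
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      C_ref[i][j] = 0;
      for (k = 0; k < nk; ++k)
        C_ref[i][j] += A[i][k] * B[k][j];
    }
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++)
      if (fabs(C[i][j] - C_ref[i][j]) > 1e-9) {
        printf("mismatch at C[%d][%d]\n", i, j);
        return 0;
      }
  return 1;  /* results match */
}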

What you want to do is make sure that all iterations of your outer loops have their own private copies of the index variables j and k. You can achieve this either by declaring these variables within the scope of the parallel loops:

int i;

#pragma omp parallel for
  for (i = 0; i < ni; i++) {
    int j1, k1;  /* explicit local copies */
    for (j1 = 0; j1 < nj; j1++) {
      C[i][j1] = 0;
      for (k1 = 0; k1 < nk; ++k1)
        C[i][j1] += A[i][k1] * B[k1][j1];
    }
  }        
#pragma omp parallel for
  for (i = 0; i < ni; i++) {
    int j2, k2;  /* explicit local copies */
    for (j2 = 0; j2 < nl; j2++) {
      E[i][j2] = 0;
      for (k2 = 0; k2 < nj; ++k2)
        E[i][j2] += C[i][k2] * D[k2][j2];
    }
  }

or otherwise declaring them as private in your loop pragmas:

int i, j, k;

#pragma omp parallel for private(j, k)
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      C[i][j] = 0;
      for (k = 0; k < nk; ++k)
        C[i][j] += A[i][k] * B[k][j];
    }
#pragma omp parallel for private(j, k)
  for (i = 0; i < ni; i++)
    for (j = 0; j < nl; j++) {
      E[i][j] = 0;
      for (k = 0; k < nj; ++k)
        E[i][j] += C[i][k] * D[k][j];
    }

Will these changes make your parallel implementation faster than your sequential implementation? Hard to say. It depends on your problem size. Parallelisation (in particular parallelisation through OpenMP) comes with some overhead. Only if you spawn enough parallel work will the gain of distributing work over parallel threads outweigh the incurred overhead costs.

To find out how much work is enough for your code and your software/hardware platform, I advise experimenting by running your code with different matrix sizes (a timing sketch follows the pragmas below). Then, if you also expect matrix sizes that are "too" small as inputs to your computation, you may want to make parallel processing conditional (for example, by decorating your loop pragmas with an if clause):

#pragma omp parallel for private (j, k) if(ni * nj * nk > THRESHOLD)
  for (i = 0; i < ni; i++) {
     ...
  }
#pragma omp parallel for private (j, k) if(ni * nl * nj > THRESHOLD)
  for (i = 0; i < ni; i++) {
    ...
  }
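For the measurements themselves, omp_get_wtime() from <omp.h> is convenient. A rough sketch around the first loop nest (reusing the variables from the snippets above; printf additionally needs <stdio.h>) could look like this; comparing the sequential and parallel timings over a range of sizes should suggest a reasonable THRESHOLD:

  double start = omp_get_wtime();  /* wall-clock time before the loop nest */
#pragma omp parallel for private(j, k)
  for (i = 0; i < ni; i++)
    for (j = 0; j < nj; j++) {
      C[i][j] = 0;
      for (k = 0; k < nk; ++k)
        C[i][j] += A[i][k] * B[k][j];
    }
  /* elapsed wall-clock time for this loop nest */
  printf("ni=%d nj=%d nk=%d: %.6f s\n", ni, nj, nk, omp_get_wtime() - start);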
