
Slower parallel program with OpenMP and PThreads than sequential

I have a problem with the parallelization of the following matrix multiplication program. The optimized versions are slower than, or only marginally faster than, the sequential one. I have already searched for the mistake but could not find it; I also tested it on another machine and got the same results.

Thanks in advance for your help.

Main:

int main(int argc, char** argv){

    if((matrixA).size != (matrixB).size){
     fprintf(ResultFile,"\tError for %s and %s - Matrix A and B are not of the same size ...\n", argv[1], argv[2]);
    }
    else{
     allocateResultMatrix(&resultMatrix, matrixA.size, 0);

     if(*argv[5] == '1'){ /* Sequential execution */
      begin = clock();
      matrixMultSeq(&matrixA, &matrixB, &resultMatrix);
      end = clock();
     }

     if(*argv[5] == '2'){ /* Execution with OpenMP */
      printf("Max number of threads: %i \n",omp_get_max_threads());
      begin = clock();
      matrixMultOmp(&matrixA, &matrixB, &resultMatrix);
      end = clock();
     }

     if(*argv[5] == '3'){ /* Execution with PThreads */
      pthread_t  threads[NUMTHREADS];
      pthread_attr_t attr;
      int i;
      struct parameter arg[NUMTHREADS];

      pthread_attr_init(&attr); /* Initialize the thread attribute object */

      begin = clock();

      for(i=0; i<NUMTHREADS; i++){ /* Initialize the individual threads */
       arg[i].id = i;
       arg[i].num_threads = NUMTHREADS;
       arg[i].dimension = matrixA.size;
       arg[i].matrixA = &matrixA;
       arg[i].matrixB = &matrixB;
       arg[i].resultMatrix = &resultMatrix;
       pthread_create(&threads[i], &attr, worker, (void *)(&arg[i]));
      }

      pthread_attr_destroy(&attr);

      for(i=0; i<NUMTHREADS; i++){ /* Wait for the threads to finish */
       pthread_join(threads[i], NULL);
      }

      end = clock();
    }

    t=end - begin;
    t/=CLOCKS_PER_SEC;
    if(*argv[5] == '1')
      fprintf(ResultFile, "\tTime for sequential multiplication: %0.10f seconds\n\n", t);
    if(*argv[5] == '2')
      fprintf(ResultFile, "\tTime for OpenMP multiplication: %0.10f seconds\n\n", t);
    if(*argv[5] == '3')
      fprintf(ResultFile, "\tTime for PThread multiplication: %0.10f seconds\n\n", t);
    }
  }
}

void matrixMultOmp(struct matrix * matrixA, struct matrix * matrixB, struct matrix * resultMatrix){
  int i, j, k, l;
  double sum = 0;

  l = (*matrixA).size;
#pragma omp parallel for private(j,k) firstprivate (sum)
  for(i=0; i<=l; i++){
   for(j=0; j<=l; j++){
      sum = 0;
      for(k=0; k<=l; k++){
         sum = sum + (*matrixA).matrixPointer[i][k]*(*matrixB).matrixPointer[k][j];
      }
      (*resultMatrix).matrixPointer[i][j] = sum;
    }
  }
}

void mm(int thread_id, int numthreads, int dimension, struct matrix* a, struct matrix* b, struct matrix* c){
  int i, j, k;
  double sum;
  i = thread_id;
  while (i <= dimension) {
    for (j = 0; j <= dimension; j++) {
      sum = 0;
      for (k = 0; k <= dimension; k++) {
        sum = sum + (*a).matrixPointer[i][k] * (*b).matrixPointer[k][j];
      }
      (*c).matrixPointer[i][j] = sum;
    }
    i += numthreads;
  }
}

void * worker(void * arg){
  struct parameter * p = (struct parameter *) arg;
  mm((*p).id, (*p).num_threads, (*p).dimension, (*p).matrixA, (*p).matrixB, (*p).resultMatrix);
  pthread_exit((void *) 0);
}

Here is the output with the times:

Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
    Size of matrixA: 6 elements
    Size of matrixB: 6 elements
    Time for sequential multiplication: 0.0000030000 seconds

Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
    Size of matrixA: 6 elements
    Size of matrixB: 6 elements
    Time for OpenMP multiplication: 0.0002440000 seconds

Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
    Size of matrixA: 6 elements
    Size of matrixB: 6 elements
    Time for PThread multiplication: 0.0006680000 seconds

Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
    Size of matrixA: 100 elements
    Size of matrixB: 100 elements
    Time for sequential multiplication: 0.0075190002 seconds

Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
    Size of matrixA: 100 elements
    Size of matrixB: 100 elements
    Time for OpenMP multiplication: 0.0076710000 seconds

Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
    Size of matrixA: 100 elements
    Size of matrixB: 100 elements
    Time for PThread multiplication: 0.0068080002 seconds

Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
    Size of matrixA: 1000 elements
    Size of matrixB: 1000 elements
    Time for sequential multiplication: 9.6421155930 seconds

Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
    Size of matrixA: 1000 elements
    Size of matrixB: 1000 elements
    Time for OpenMP multiplication: 10.5361270905 seconds

Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
    Size of matrixA: 1000 elements
    Size of matrixB: 1000 elements
    Time for PThread multiplication: 9.8952226639 seconds

Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
    Size of matrixA: 5000 elements
    Size of matrixB: 5000 elements
    Time for sequential multiplication: 1981.1383056641 seconds

Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
    Size of matrixA: 5000 elements
    Size of matrixB: 5000 elements
    Time for OpenMP multiplication: 2137.8527832031 seconds

Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
    Size of matrixA: 5000 elements
    Size of matrixB: 5000 elements
    Time for PThread multiplication: 1977.5153808594 seconds

As already mentioned in the comments, your first and main problem is the use of clock(). It returns the processor time consumed by your program, but what you are looking for is the wall-clock time of the execution. In sequential code the two are the same, but with multiple cores that is not at all true. Luckily, OpenMP already has you covered: use the function omp_get_wtime() instead.
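A minimal sketch of that change in the OpenMP branch of the main shown above (t_start and t_end are placeholder names; the other identifiers are taken from your code):

#include <omp.h>

    /* ... */
    double t_start = omp_get_wtime();   /* wall-clock time before the multiplication */
    matrixMultOmp(&matrixA, &matrixB, &resultMatrix);
    double t_end = omp_get_wtime();     /* wall-clock time after the multiplication */
    fprintf(ResultFile, "\tTime for OpenMP multiplication: %0.10f seconds\n\n", t_end - t_start);

omp_get_wtime() already returns seconds as a double, so no division by CLOCKS_PER_SEC is needed.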

Lastly, you need larger matrices to see any benefit from multithreading. If the overhead of creating and managing the threads costs more than the actual work the threads perform, you will never see any benefit from parallelism. Because of this, it is pointless to time a 6x6 matrix multiplication. I would start with 1000x1000 and check at least 2000x2000 and 8000x8000 as well.
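The same wall-clock caveat applies to the PThreads branch. If you would rather not call an OpenMP routine there, a POSIX alternative is clock_gettime() with CLOCK_MONOTONIC, which also measures elapsed wall time; a rough sketch of how it could replace the clock() calls around the thread creation and join:

#include <time.h>

    struct timespec ts_start, ts_end;            /* wall-clock timestamps */
    clock_gettime(CLOCK_MONOTONIC, &ts_start);   /* before creating the threads */
    /* ... pthread_create / pthread_join loops as in the code above ... */
    clock_gettime(CLOCK_MONOTONIC, &ts_end);     /* after all threads have been joined */
    double elapsed = (ts_end.tv_sec - ts_start.tv_sec)
                   + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9;
    fprintf(ResultFile, "\tTime for PThread multiplication: %0.10f seconds\n\n", elapsed);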
