OpenMP and C parallel for loop: why does my code slow down when using OpenMP?

Question

I'm new here and a beginner-level C programmer. I'm having a problem using OpenMP to speed up a for loop. Below is a simple example:

#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>  /* declares gsl_ran_gamma_mt */
#include <omp.h>

gsl_rng *rng;

int main(void)
{
    int i, M = 100000000;
    double tmp;

    /* initialize RNG */
    gsl_rng_env_setup();
    rng = gsl_rng_alloc(gsl_rng_taus);
    gsl_rng_set(rng, (unsigned long int)791526599);

    /* option 1: parallel */
    #pragma omp parallel for default(shared) private(i, tmp) schedule(dynamic)
    for (i = 0; i <= M - 1; i++) {
        tmp = gsl_ran_gamma_mt(rng, 4, 1./3);
    }

    /* option 2: sequential */
    for (i = 0; i <= M - 1; i++) {
        tmp = gsl_ran_gamma_mt(rng, 4, 1./3);
    }

    gsl_rng_free(rng);
    return 0;
}

The code draws from a gamma distribution M times. It turns out the parallel approach with OpenMP (option 1) takes about 1 minute, while the sequential approach (option 2) takes only 20 seconds. While running with OpenMP, I can see the CPU usage is 800% (the server I'm using has 8 CPUs). The system is Linux with GCC 4.1.3. The compile command I'm using is gcc -fopenmp -lgsl -lgslcblas -lm (I'm using GSL).

Am I doing something wrong? Please help me! Thanks!

PS: As pointed out by some users, it might be caused by rng. But even if I replace

tmp=gsl_ran_gamma_mt(rng, 4, 1./3 );

by, say,

tmp=1000*10000;

the problem is still there...

Answer 1

gsl_ran_gamma_mt probably locks on rng to prevent concurrency issues (if it doesn't, your parallel code probably contains a race condition and thus yields wrong results). The solution then would be to have a separate rng instance for each thread, thus avoiding locking.

Answer 2

Your rng variable is shared, so the threads are spending all their time waiting to be able to use the random number generator. Give each thread a separate instance of the RNG. This will probably mean making the RNG initialization code run in parallel as well.
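As a rough sketch of the "one generator per thread" idea: the code below uses a small hypothetical stand-in generator (toy_rng, a xorshift64 state) instead of GSL so that it compiles without the library. The names toy_rng, toy_rng_set, toy_rng_uniform and sum_draws are made up for illustration. With GSL, the corresponding step would presumably be calling gsl_rng_alloc and gsl_rng_set inside the parallel region and gsl_rng_free before leaving it. Seeding with base_seed plus the thread number is the simplest possible scheme and is an assumption here, not a guarantee of statistically independent streams.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a gsl_rng: one xorshift64 state per thread. */
typedef struct { uint64_t s; } toy_rng;

static void toy_rng_set(toy_rng *r, uint64_t seed)
{
    /* xorshift state must never be zero */
    r->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

/* Uniform double in [0, 1). */
static double toy_rng_uniform(toy_rng *r)
{
    r->s ^= r->s << 13;
    r->s ^= r->s >> 7;
    r->s ^= r->s << 17;
    return (double)(r->s >> 11) / 9007199254740992.0;  /* divide by 2^53 */
}

/* Sum n uniform draws; every thread owns its generator, so nothing is shared
   except the reduction variable. */
double sum_draws(long n, uint64_t base_seed)
{
    double total = 0.0;
    assert(n >= 0);

    #pragma omp parallel reduction(+:total)
    {
        int tid = 0;
        long i;
        toy_rng rng;  /* declared inside the region, hence private */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* initialization runs in parallel too, one seed per thread */
        toy_rng_set(&rng, base_seed + (uint64_t)tid);

        #pragma omp for schedule(static)
        for (i = 0; i < n; i++)
            total += toy_rng_uniform(&rng);
    }
    return total;
}
```

Calling sum_draws(100000000, 791526599) from main and compiling with gcc -fopenmp exercises the parallel path. Without -fopenmp the pragmas are ignored and the same function runs sequentially.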

Answer 3

Thanks again, everyone, for helping. I just found out that if I get rid of

schedule(dynamic)

in the code, the problem disappears. But why is that?
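One plausible explanation, offered as an assumption since nobody in the thread confirms it: when schedule(dynamic) is given without a chunk size, the chunk size defaults to 1, so each of the 100000000 iterations is handed out individually by the OpenMP runtime. For a loop body as cheap as tmp=1000*10000, that per-iteration bookkeeping can easily cost more than the work itself. The sketch below contrasts a static schedule with a dynamic schedule that uses an explicit chunk size. The function names count_static and count_dynamic are hypothetical.

```c
#include <assert.h>

/* Static schedule: iterations are split into contiguous blocks up front,
   so threads do not go back to the runtime for each iteration. */
long count_static(long n)
{
    long count = 0;
    long i;
    assert(n >= 0);

    #pragma omp parallel for schedule(static) reduction(+:count)
    for (i = 0; i < n; i++)
        count += 1;  /* trivially cheap body, like tmp = 1000*10000 */
    return count;
}

/* Dynamic schedule with an explicit chunk: threads still grab work on demand,
   but in blocks of `chunk` iterations instead of one at a time. */
long count_dynamic(long n, int chunk)
{
    long count = 0;
    long i;
    assert(n >= 0 && chunk > 0);

    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:count)
    for (i = 0; i < n; i++)
        count += 1;
    return count;
}
```

Timing count_dynamic(n, 1) against count_dynamic(n, 10000) and count_static(n) on the same machine would be one way to check whether the scheduling overhead is what caused the slowdown.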

Answer 1
12, accepted, 2012-08-23 15:58:54

Answer 2
5, 2012-08-23 15:58:57

Answer 3
1, 2012-08-23 18:23:25