OpenMP and C parallel for loop: why does my code slow down when using OpenMP?

I'm new here and a beginner-level C programmer. I'm having some problems using OpenMP to speed up a for loop. Below is a simple example:

#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>   /* declares gsl_ran_gamma_mt */
#include <omp.h>

gsl_rng *rng;   /* a single generator, shared by every thread in option 1 */

int main(void)
{
    int i, M = 100000000;
    double tmp;

    /* initialize RNG */
    gsl_rng_env_setup();
    rng = gsl_rng_alloc(gsl_rng_taus);
    gsl_rng_set(rng, (unsigned long int)791526599);

    // option 1: parallel
    #pragma omp parallel for default(shared) private(i, tmp) schedule(dynamic)
    for (i = 0; i <= M - 1; i++) {
        tmp = gsl_ran_gamma_mt(rng, 4, 1./3);   /* shape 4, scale 1/3 */
    }

    // option 2: sequential
    for (i = 0; i <= M - 1; i++) {
        tmp = gsl_ran_gamma_mt(rng, 4, 1./3);
    }

    gsl_rng_free(rng);
    return 0;
}

The code draws from a gamma distribution M times. It turns out that the parallel approach with OpenMP (option 1) takes about 1 minute, while the sequential approach (option 2) takes only 20 seconds. While running with OpenMP, I can see that CPU usage is 800% (the server I'm using has 8 CPUs). The system is Linux with GCC 4.1.3. The compile command I'm using is gcc -fopenmp -lgsl -lgslcblas -lm (I'm using GSL).
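A note on measuring timings like these: compare wall-clock time, not CPU time. clock() reports processor time for the whole process, which on Linux/glibc is summed over all threads, so a loop that keeps 8 cores busy for 10 seconds reports roughly 80 seconds by that measure. Below is a minimal sketch of two timing helpers; the names wall_seconds and cpu_seconds are made up for illustration. omp_get_wtime() is the standard OpenMP wall-clock timer, and _OPENMP is only defined when compiling with -fopenmp, so the sketch falls back to POSIX clock_gettime otherwise.

```c
/* Needed for clock_gettime when compiling with a strict -std=c99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed wall-clock seconds (hypothetical helper name). */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();            /* OpenMP wall-clock timer */
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* CPU seconds used by the whole process; on glibc this adds up all threads,
   so it is NOT a measure of how long a parallel loop took. */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Read wall_seconds() before and after each option and subtract. If cpu_seconds() grows several times faster than wall_seconds() while the wall time does not drop, the threads are busy without doing useful work, which matches the 800% CPU usage described above.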

Am I doing something wrong? Please help me! Thanks!

PS: As some users have pointed out, it might be caused by rng. But even if I replace

tmp=gsl_ran_gamma_mt(rng, 4, 1./3 );

with, say,

tmp=1000*10000;

the problem is still there...

gsl_ran_gamma_mt probably locks on rng to prevent concurrency issues (if it doesn't, your parallel code contains a race condition and will give wrong results). The solution, then, is to give each thread its own rng instance, which avoids the locking.
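The one-generator-per-thread idea can be sketched as follows. This is a sketch under stated assumptions, not GSL code: toy_rng is a hypothetical stand-in for gsl_rng (a small xorshift64* generator) so that the example compiles without GSL. With GSL you would instead allocate one generator per thread with gsl_rng_alloc(gsl_rng_taus), seed each one with a different value via gsl_rng_set, and release them with gsl_rng_free. Each state is padded to 64 bytes (assuming 64-byte cache lines) so that neighboring threads do not keep invalidating each other's cache line, which would bring back much of the slowdown.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for gsl_rng: xorshift64* state padded to 64 bytes
   (assumed cache-line size) to avoid false sharing between threads. */
typedef struct {
    uint64_t s;
    char pad[64 - sizeof(uint64_t)];
} toy_rng;

static void toy_rng_seed(toy_rng *r, uint64_t seed)
{
    r->s = seed ? seed : 0x9E3779B97F4A7C15ULL; /* xorshift state must be non-zero */
}

/* Returns a double in [0, 1) built from the top 53 bits of the output. */
static double toy_rng_uniform(toy_rng *r)
{
    r->s ^= r->s >> 12;
    r->s ^= r->s << 25;
    r->s ^= r->s >> 27;
    uint64_t x = r->s * 0x2545F4914F6CDD1DULL;
    return (double)(x >> 11) * (1.0 / 9007199254740992.0);
}

/* Sums M uniform draws with one generator per thread. The generators are
   allocated and seeded before the parallel region and indexed by thread id.
   Returns -1.0 if the allocation fails. */
static double sum_draws_per_thread(long M, uint64_t base_seed)
{
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    toy_rng *gens = malloc((size_t)nthreads * sizeof *gens);
    if (gens == NULL)
        return -1.0;
    for (int t = 0; t < nthreads; t++)
        toy_rng_seed(&gens[t], base_seed + (uint64_t)t); /* distinct seed per thread */

    double sum = 0.0;
    #pragma omp parallel reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        toy_rng *r = &gens[tid];   /* this thread's own generator, never shared */
        #pragma omp for schedule(static)
        for (long i = 0; i < M; i++)
            sum += toy_rng_uniform(r);
    }
    free(gens);
    return sum;
}
```

No thread ever touches another thread's state, so there is nothing to lock and no race on the generator.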

Your rng variable is shared, so the threads spend all their time waiting to use the random number generator. Give each thread a separate instance of the RNG. This will probably mean running the RNG initialization code in parallel as well.
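Running the initialization in parallel could look like the minimal sketch below: each thread creates and seeds its own generator at the top of the parallel region, so the state is simply a private local variable. xs_uniform is a hypothetical xorshift64* function standing in for gsl_rng and gsl_ran_gamma_mt; the GSL equivalent would call gsl_rng_alloc and gsl_rng_set at the start of the parallel block and gsl_rng_free at its end. Deriving each thread's seed from its thread number is an assumption made for illustration: it gives different sequences, but it does not guarantee statistically independent streams.

```c
#include <stdint.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for gsl_rng: one xorshift64* step on a caller-owned
   state; returns a double in [0, 1). */
static double xs_uniform(uint64_t *s)
{
    *s ^= *s >> 12;
    *s ^= *s << 25;
    *s ^= *s >> 27;
    uint64_t x = *s * 0x2545F4914F6CDD1DULL;
    return (double)(x >> 11) * (1.0 / 9007199254740992.0);
}

/* Sums M uniform draws. Each thread initializes its own generator inside
   the parallel region, so no generator state is shared. */
static double sum_draws_local_rng(long M, uint64_t base_seed)
{
    double sum = 0.0;
    #pragma omp parallel reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* per-thread seed derived from the thread number (illustrative only) */
        uint64_t state = base_seed + 0x9E3779B97F4A7C15ULL * (uint64_t)(tid + 1);
        if (state == 0)
            state = 1; /* xorshift state must be non-zero */

        #pragma omp for schedule(static)
        for (long i = 0; i < M; i++)
            sum += xs_uniform(&state);
    }
    return sum;
}
```

With schedule(static) and a fixed thread count, each thread receives the same iterations and the same draws on every run; changing the number of threads changes which numbers are drawn.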

Again, thanks everyone for helping. I just found out that if I get rid of

schedule(dynamic)

in the code, the problem disappears. But why is that?
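A likely explanation: when schedule(dynamic) is given without a chunk size, OpenMP uses a chunk size of 1, so each of the 100,000,000 iterations is handed out separately, and every hand-out needs synchronization on the runtime's shared loop counter. With a loop body as cheap as tmp=1000*10000; that overhead is essentially all the threads do. Without a schedule clause, GCC uses a static schedule (the default is implementation-defined), which splits the iterations into one contiguous block per thread up front with no per-iteration coordination. A minimal sketch of the two alternatives follows; the trivial counting body stands in for the original loop, and the chunk size of 65536 is an arbitrary assumption.

```c
/* Trivial body, as in the tmp = 1000*10000 experiment: almost no work per
   iteration, so scheduling overhead dominates. */

/* schedule(static): one contiguous block of iterations per thread, decided
   once when the loop starts. */
static long count_static(long M)
{
    long count = 0;
    #pragma omp parallel for schedule(static) reduction(+:count)
    for (long i = 0; i < M; i++)
        count += (i % 3 == 0);
    return count;
}

/* schedule(dynamic, CHUNK): threads still take work on demand, but once per
   CHUNK iterations instead of once per iteration (dynamic's default chunk
   size is 1). */
enum { CHUNK = 1 << 16 };   /* hypothetical value; tune for the workload */

static long count_dynamic_chunked(long M)
{
    long count = 0;
    #pragma omp parallel for schedule(dynamic, CHUNK) reduction(+:count)
    for (long i = 0; i < M; i++)
        count += (i % 3 == 0);
    return count;
}
```

Dropping the clause only removes the scheduling overhead. Option 1 still shares one rng between threads, so the per-thread generators suggested in the answers are still needed for correct results.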
