Why am I getting worse performance with a private dynamic array?

I want to use OpenMP to parallelize a for-loop computation that does something like this:

B = (int*)malloc(sizeof(int) * N); //N is known
for(i=0;i<500000;i++)
{  
    for(j=0;j<M;j++) B[j]=i+j;  //M is different from N, but M <= N;
    some operations on B which produce a variable L;
    printf("%d\n",L);    
}

I don't need to re-allocate B, because its values are redefined in each iteration. The operations only use B[0] to B[M-1]. This saves a lot of time in allocating and initializing B.

In order to use OpenMP, I changed the code to this:

#pragma omp parallel num_threads(32) private(i,j,B,M,L)
{
  B = (int*)malloc(sizeof(int) * N); //N is known
  #pragma omp parallel for 
  for(i=0;i<500000;i++)
  {  
      for(j=0;j<M;j++) B[j]=i+j;  //M is different from N, but M <= N;
      some operations on B which produce a variable L;
      printf("%d\n",L);    
  }
}

It runs really slowly compared to the first one, as it creates a new B array for each thread (so 500000 times). Is there a way to avoid this using OpenMP?

The main issue is that the iterations of the loop are not being distributed among the threads as you intended. Because you wrote parallel again in front of for (i.e., #pragma omp parallel for instead of #pragma omp for), and assuming that nested parallelism is disabled, which it is by default, each of the threads created by the outer parallel region will execute "sequentially" the code within that region, namely:

  #pragma omp parallel for 
  for(i=0;i<500000;i++){  
      ...
  }
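A quick way to observe this behavior is to count how many loop bodies actually run. The following is a minimal sketch, not code from the question: the function names and the atomic counters are assumptions introduced for illustration. With the nested parallel for, the count should come out as the number of outer threads times the number of iterations; with a plain omp for, it should equal the number of iterations.

```c
/* Counts loop bodies executed when the loop is marked with
   "#pragma omp parallel for" inside an existing parallel region.
   Also reports how many threads the outer region used. */
long count_nested_parallel_for(int iters, int *team_size)
{
    long bodies = 0;
    int threads = 0;
    #pragma omp parallel shared(bodies, threads)
    {
        #pragma omp atomic
        threads++;

        /* Every outer thread opens its own inner region here, so every
           outer thread ends up covering all iterations. */
        #pragma omp parallel for
        for (int i = 0; i < iters; i++) {
            #pragma omp atomic
            bodies++;
        }
    }
    *team_size = threads;
    return bodies;
}

/* Same loop, but shared among the threads of the enclosing region. */
long count_worksharing_for(int iters)
{
    long bodies = 0;
    #pragma omp parallel shared(bodies)
    {
        #pragma omp for
        for (int i = 0; i < iters; i++) {
            #pragma omp atomic
            bodies++;
        }
    }
    return bodies;
}
```

Calling count_nested_parallel_for(500000, &team) and count_worksharing_for(500000) from a program built with OpenMP enabled (e.g., -fopenmp with GCC) lets you compare the two counts directly.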

Therefore, each thread will execute all 500000 iterations of the loop that you intended to parallelize. This removes the parallelism and adds overhead (e.g., thread creation) on top of the sequential code. Nonetheless, you can solve this issue simply by removing the second parallel keyword, namely:

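// firstprivate(M): a plain private M would be uninitialized inside the region
#pragma omp parallel num_threads(32) private(i,j,B,L) firstprivate(M)
{
    B = (int*)malloc(sizeof(int) * N); //N is known; one allocation per thread
    #pragma omp for 
    for(i=0;i<500000;i++){  
      ...   
    }
    free(B); //release this thread's buffer
}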
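For reference, here is a self-contained sketch of the same pattern. Several things in it are assumptions and not part of the original code: the function names, the stand-in reduce_buffer (which just sums B[0..M-1] in place of "some operations on B"), and the out array that replaces printf so that results are not interleaved across threads. M and N are passed as arguments and stay shared, which sidesteps the fact that a variable listed in private(...) starts out uninitialized inside the region.

```c
#include <stdlib.h>

/* Hypothetical stand-in for "some operations on B which produce L":
   here L is simply the sum of B[0..M-1]. */
static long reduce_buffer(const int *B, int M)
{
    long sum = 0;
    for (int j = 0; j < M; j++) sum += B[j];
    return sum;
}

/* Stores L for iteration i in out[i].
   Returns 0 on success, -1 if any thread failed to allocate its buffer. */
int run_private_buffers(int iters, int M, int N, long *out)
{
    int failed = 0;
    #pragma omp parallel shared(failed)
    {
        /* Declared inside the region, so B is private to each thread:
           one allocation per thread, not one per iteration. */
        int *B = malloc(sizeof *B * (size_t)N);
        if (B == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* All threads must reach the worksharing loop, even on failure. */
        #pragma omp for
        for (int i = 0; i < iters; i++) {
            if (B == NULL) continue;
            for (int j = 0; j < M; j++) B[j] = i + j;
            out[i] = reduce_buffer(B, M);
        }
        free(B);
    }
    return failed ? -1 : 0;
}
```

Writing each result to out[i] and printing after the parallel region is one possible design choice; it keeps the output in iteration order, which concurrent printf calls would not guarantee.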

Depending on the setup where the code will be executed (e.g., whether it is a NUMA architecture, whether the malloc implementation used is a thread-aware memory allocator, among others), it might be advisable to profile your parallel region to check whether it pays off to move the allocation outside that region, into a 2D array with one row per thread. An example of what the alternative version might look like:

int total_threads = 32;
int** B = malloc(sizeof(int*) * total_threads); // one row pointer per thread
for(int i = 0; i < total_threads; i++){
    B[i] = malloc(N * sizeof(int));
}

// B stays shared; firstprivate(M) because a plain private M would be uninitialized
#pragma omp parallel num_threads(total_threads) private(i,j,L) firstprivate(M)
{
  int threadID = omp_get_thread_num();
  #pragma omp for 
  for(i=0;i<500000;i++)
  {  
      for(j=0;j<M;j++) 
          B[threadID][j]=i+j;  //M is different from N, but M <= N;
      // ... some operations on B[threadID] which produce a variable L ...
      printf("%d\n",L);    
  }
}
// you might need to reduce all the values from all threads
// to main thread array.

for(int i = 0; i < total_threads; i++) free(B[i]);
free(B);
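A fuller sketch of this preallocated variant is given below, under stated assumptions: the function name, the out array that collects one L per iteration, and the sum used as a stand-in for "some operations on B" are all hypothetical. The buffer count is taken from omp_get_max_threads() instead of a hard-coded 32, and the #else branch only provides serial fallbacks so the sketch also builds without OpenMP support.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Stores L for iteration i in out[i]. Returns 0 on success, -1 on
   allocation failure. */
int run_preallocated_buffers(int iters, int M, int N, long *out)
{
    int nthreads = omp_get_max_threads();
    int **B = malloc(sizeof *B * (size_t)nthreads);
    if (B == NULL) return -1;

    int ok = 1;
    for (int t = 0; t < nthreads; t++) {
        B[t] = malloc(sizeof **B * (size_t)N);  /* one row per thread */
        if (B[t] == NULL) ok = 0;
    }

    if (ok) {
        /* num_threads caps the team size, so thread ids stay below nthreads. */
        #pragma omp parallel num_threads(nthreads)
        {
            int *mine = B[omp_get_thread_num()];
            #pragma omp for
            for (int i = 0; i < iters; i++) {
                for (int j = 0; j < M; j++) mine[j] = i + j;
                long L = 0;  /* hypothetical stand-in: sum of the buffer */
                for (int j = 0; j < M; j++) L += mine[j];
                out[i] = L;
            }
        }
    }

    for (int t = 0; t < nthreads; t++) free(B[t]);  /* free(NULL) is a no-op */
    free(B);
    return ok ? 0 : -1;
}
```

Because each iteration writes its own slot out[i], no cross-thread reduction is needed in this sketch; if the real operations accumulate into per-thread state instead, a final merge step on the main thread would still be required.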
