CUDA atomics and conditional branching

Question

I am attempting to write a CUDA version of a serial code as part of implementing a periodic boundary condition in a molecular dynamics algorithm. The idea is that a tiny fraction of particles whose positions fall outside the box need to be put back in one of two ways, with a limit on the number of times the first way is used.

Essentially, it boils down to the following MWE. I have an array x[N], where N is large, and the following serial code.

#include <cstdlib>

int main()
{
  int N =30000;
  double x[30000];
  int Nmax = 10, count = 0;

  for(int i = 0; i < N; i++)
    x[i] = 1.0*(rand()%3);

  for(int i = 0; i < N; i++)
   {
      if(x[i] > 2.9)
        {
          if(count < Nmax)
            {
              x[i] += 0.1; //first way
              count++;
            }
          else
            x[i] -= 0.2; //second way
        }
    }
}

Please assume that x[i] > 2.9 holds only for a small fraction (about 12-15) of the 30000 elements of x.

Note that the order of i is not important, i.e. it is not necessary for the 10 lowest i to be the ones that use x[i] += 0.1, which makes the algorithm potentially parallelizable. I thought of the following CUDA version of the MWE, which compiles with nvcc -arch sm_35 main.cu, where main.cu reads as

#include <cstdlib>

__global__ void PeriodicCondition(double *x, int *N, int *Nmax, int *count)
{
  int i = threadIdx.x+blockIdx.x*blockDim.x;
  if(i < N[0])
    {
      if(x[i] > 2.9)
        {
           if(count[0] < Nmax[0]) //===============(line a)
             {
               x[i] += 0.1; //first way
               atomicAdd(&count[0],1); //========(line b)
             }
           else
             x[i] -= 0.2; //second way
        }
    }
}

int main()
{
  int N = 30000;
  double x[30000];
  int Nmax = 10, count = 0;

  srand(128512);
  for(int i = 0; i < N; i++)
    x[i] = 1.0*(rand()%3);

  double *xD;
  cudaMalloc( (void**) &xD, N*sizeof(double) );
  cudaMemcpy( xD, &x, N*sizeof(double),cudaMemcpyHostToDevice );

  int *countD;
  cudaMalloc( (void**) &countD, sizeof(int) );
  cudaMemcpy( countD, &count, sizeof(int),cudaMemcpyHostToDevice );

  int *ND;
  cudaMalloc( (void**) &ND, sizeof(int) );
  cudaMemcpy( ND, &N, sizeof(int),cudaMemcpyHostToDevice );

  int *NmaxD;
  cudaMalloc( (void**) &NmaxD, sizeof(int) );
  cudaMemcpy( NmaxD, &Nmax, sizeof(int),cudaMemcpyHostToDevice );

  PeriodicCondition<<<938,32>>>(xD, ND, NmaxD, countD);

  cudaFree(NmaxD);
  cudaFree(ND);
  cudaFree(countD);
  cudaFree(xD);

}

Of course, this is not correct, because the if condition on (line a) reads a variable that is updated on (line b), so the value it sees might not be current. This is somewhat similar to "Cuda atomics change flag"; however, I am not sure whether and how using critical sections would help.

Is there a way to make sure count[0] is up to date when every thread checks the if condition on (line a), without making the code too serial?

Answer 1

Just increment the atomic counter every time, and use its return value in your test:

...
  if(x[i] > 2.9)
    {
       int oldCount = atomicAdd(&count[0],1);
       if(oldCount < Nmax[0])
         x[i] += 0.1; //first way
       else
         x[i] -= 0.2; //second way
    }
...

If, as you say, around 15 items exceed 2.9 and Nmax is around 10, there will be a small number of "extra" atomic operations, the overhead of which is probably minimal (and I can't see how to do it more efficiently, which isn't to say it isn't possible...).

Answer 1: score 4, accepted, 2017-08-30 00:52:48
