
Numba CUDA shared memory size at runtime?

In CUDA C it's straightforward to define shared memory whose size is specified at runtime. How can I do this with Numba/NumbaPro CUDA?

Everything I've tried so far has only resulted in errors with the message "Argument 'shape' must be a constant".
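For context, here is a minimal sketch of the kind of attempt that triggers that error (the kernel name and signature are hypothetical); passing a runtime value as the shape is rejected at compile time:

import numba
from numba import cuda

@cuda.jit
def bad_kernel(a, n):
    # n is only known at runtime, so Numba refuses to compile this,
    # failing with an error like "Argument 'shape' must be a constant"
    sm = cuda.shared.array(shape=n, dtype=numba.int32)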

EDIT: Just to clarify, what I want is an equivalent of the following CUDA C code (example taken and adapted from here):

__global__ void dynamicReverse(int *d, int n)
{
  // the size of s[] is set at launch time by the third <<<>>> parameter
  extern __shared__ int s[];

  // some work in the kernel with the shared memory
}

int main(void)
{
  const int n = 64;
  int a[n];
  int *d_a;

  cudaMalloc(&d_a, n * sizeof(int));
  cudaMemcpy(d_a, a, n * sizeof(int), cudaMemcpyHostToDevice);

  // run dynamic shared memory version; the third launch parameter
  // is the dynamic shared memory size in bytes
  dynamicReverse<<<1, n, n * sizeof(int)>>>(d_a, n);

  cudaFree(d_a);
  return 0;
}

I found the solution (through the very helpful Continuum Analytics user support). We define the shared array as we normally would, but set its shape to 0. The actual size of the array is then passed, in bytes, as the fourth launch parameter (after the stream identifier) when calling the kernel. E.g.:

import numpy as np
import numba
from numba import cuda

@cuda.autojit  # on recent Numba versions, use @cuda.jit instead
def myKernel(a):
    # shape=0 marks the shared array as dynamically sized; its actual
    # size in bytes is supplied as the fourth launch parameter
    sm = cuda.shared.array(shape=0, dtype=numba.int32)

    # do stuff

arga = np.arange(512, dtype=np.int32)  # match the shared array's int32 dtype
grid = 1
block = 512
stream = 0
sm_size = arga.size * arga.dtype.itemsize  # dynamic shared memory size in bytes
myKernel[grid, block, stream, sm_size](arga)
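To mirror the C dynamicReverse example above, here is a minimal sketch of the same reversal written with Numba; the kernel name dynamic_reverse is my own, and it assumes a single block whose size equals the array length:

import numpy as np
import numba
from numba import cuda

@cuda.jit
def dynamic_reverse(d, n):
    # dynamically sized shared array; byte size set at launch time
    s = cuda.shared.array(shape=0, dtype=numba.int32)
    t = cuda.threadIdx.x
    if t < n:
        s[t] = d[t]
    cuda.syncthreads()
    if t < n:
        d[t] = s[n - t - 1]

n = 64
a = np.arange(n, dtype=np.int32)
d_a = cuda.to_device(a)
sm_size = n * a.dtype.itemsize  # bytes of dynamic shared memory
dynamic_reverse[1, n, 0, sm_size](d_a, n)
print(d_a.copy_to_host())  # expect 63, 62, ..., 0

A shared array declared with shape=0 spans exactly the number of bytes passed as the fourth launch parameter, so its element count is that byte size divided by the itemsize of the declared dtype.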
