Numba CUDA shared memory size at runtime?

In CUDA C it's straightforward to define shared memory whose size is specified at runtime. How can I do this with Numba/NumbaPro CUDA?

What I've done so far has only resulted in errors with the message "Argument 'shape' must be a constant".
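For example, an attempt along these lines fails as soon as the kernel is compiled (a minimal sketch; the kernel name and arguments are made up for illustration):

import numba
from numba import cuda

@cuda.jit
def bad_kernel(a, n):
    # the shape must be a compile-time constant, so passing the
    # runtime value n aborts with "Argument 'shape' must be a constant"
    sm = cuda.shared.array(shape=n, dtype=numba.int32)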

EDIT: Just to clarify, what I want is the equivalent of the following in CUDA C (example taken and adapted from here):

__global__ void dynamicReverse(int *d, int n)
{
  // dynamically sized shared memory: declared extern with no size;
  // the size in bytes is supplied as the third launch parameter
  extern __shared__ int s[];

  // some work in the kernel with the shared memory
}

int main(void)
{
  const int n = 64;
  int a[n];
  int *d_a;

  // the kernel needs a device pointer, not the host array itself
  cudaMalloc(&d_a, n * sizeof(int));
  cudaMemcpy(d_a, a, n * sizeof(int), cudaMemcpyHostToDevice);

  // run dynamic shared memory version
  dynamicReverse<<<1, n, n * sizeof(int)>>>(d_a, n);

  cudaFree(d_a);
  return 0;
}

I found the solution (through the very helpful Continuum Analytics user support). We define the shared memory as we normally would, but set the shape to 0. Then, to set the size of the shared array, we pass it as the fourth launch parameter (after the stream identifier) to the kernel. E.g.:

import numpy as np
import numba
from numba import cuda

@cuda.autojit  # NumbaPro-era decorator; current Numba uses @cuda.jit
def myKernel(a):
    # shape=0 marks the array as dynamic shared memory;
    # its size comes from the launch configuration below
    sm = cuda.shared.array(shape=0, dtype=numba.int32)

    # do stuff

arga = np.arange(512, dtype=np.int32)
grid = 1
block = 512
stream = 0
sm_size = arga.size * arga.dtype.itemsize  # shared memory size in bytes
myKernel[grid, block, stream, sm_size](arga)
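To show the shared array actually being used, here is a minimal sketch that fills in the kernel body to mirror the C dynamicReverse example above (written against current Numba, so it uses @cuda.jit; the name dynamic_reverse is my own):

import numpy as np
import numba
from numba import cuda

@cuda.jit
def dynamic_reverse(d, n):
    # shape=0: the size in bytes comes from the fourth launch parameter
    s = cuda.shared.array(shape=0, dtype=numba.int32)
    t = cuda.threadIdx.x
    s[t] = d[t]
    cuda.syncthreads()  # wait until every thread has written its element
    d[t] = s[n - t - 1]

n = 64
a = np.arange(n, dtype=np.int32)
d_a = cuda.to_device(a)
dynamic_reverse[1, n, 0, n * a.dtype.itemsize](d_a, n)
print(d_a.copy_to_host())  # [63 62 ... 1 0]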
