
CUDA: how to create arrays at runtime in shared memory in a kernel?

I have a task with a large number of threads running, each doing a small matrix multiplication. All the small matrices have been loaded into global memory. I wish to improve performance by letting each thread load its small matrices into shared memory and then compute the product. But the problem is that I do not know the sizes of the matrices at compile time, so I cannot create variables as in __shared__ double mat1[XSIZE][YSIZE]. On a PC I would make a dynamic allocation, but I do not know whether I can do that in shared memory. If calling malloc in a kernel allocates only in global memory (assuming such a call is possible), that does not help either.

Is there a way to declare arrays at runtime in a kernel? Is there any other way to resolve this problem?

You can declare a dynamically sized shared memory allocation in CUDA, like this:

__global__ void kernel()
{
    // Note the array syntax: extern __shared__ double *mat1; would declare
    // a pointer residing in shared memory, not a dynamically sized array.
    extern __shared__ double mat1[];
}

And then launch your kernel like this:

kernel<<<grid,block,XSIZE*YSIZE*sizeof(double)>>>();
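
Note that mat1 is a flat, one-dimensional array here, so two-dimensional indexing has to be computed by hand, e.g. mat1[row * YSIZE + col] instead of mat1[row][col].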

This is discussed in more detail in the CUDA programming guide.
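
If each thread needs more than one matrix in shared memory, keep in mind that only one dynamically sized extern __shared__ array is allowed per kernel, so a single allocation has to be partitioned manually. A minimal sketch of that pattern follows; the names smem, mat1, mat2 and the xsize/ysize parameters are illustrative assumptions, not code from the question:

__global__ void kernel(int xsize, int ysize)
{
    // One dynamic shared memory allocation per kernel; its size is set at launch.
    extern __shared__ double smem[];

    // Carve the buffer into the two matrices by offset.
    double *mat1 = smem;                 // first xsize*ysize doubles
    double *mat2 = smem + xsize * ysize; // next ysize*xsize doubles

    // ... load the matrices from global memory, __syncthreads(),
    // then compute the product from mat1 and mat2 ...
}

// Launched with enough shared memory for both matrices:
// kernel<<<grid, block, 2 * XSIZE * YSIZE * sizeof(double)>>>(XSIZE, YSIZE);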
