
CUDA: how to create arrays at runtime in shared memory in a kernel?

I have a task with a large number of threads running, each doing a small matrix multiplication. All the small matrices have been loaded into global memory. I wish to improve performance by letting each thread load its small matrices into shared memory and then compute the product. But the problem is that I do not know the sizes of the matrices at compile time, so I cannot create variables as in __shared__ double mat1[XSIZE][YSIZE]. On a PC I would make a dynamic allocation, but I do not know whether I can do that in shared memory. If calling malloc in a kernel allocates only in global memory (assuming such a call is possible), that does not help either.

Is there a way to declare arrays at runtime in a kernel? Is there any other way to resolve this problem?

You can declare a dynamically sized shared memory allocation in CUDA, like this:

__global__ void kernel()
{
    // Note the array syntax: extern __shared__ double *mat1; would declare
    // a pointer residing in shared memory, not a dynamically sized array.
    extern __shared__ double mat1[];
}

And then launch your kernel like this:

kernel<<<grid,block,XSIZE*YSIZE*sizeof(double)>>>();
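
Note that mat1 is a flat, one-dimensional array here, so two-dimensional indexing has to be computed by hand, e.g. mat1[row * YSIZE + col] instead of mat1[row][col].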

This is discussed in more detail in the CUDA programming guide.
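
If each thread needs more than one matrix in shared memory, keep in mind that only one dynamically sized extern __shared__ array is allowed per kernel, so a single allocation has to be partitioned manually. A minimal sketch of that pattern follows; the names smem, mat1, mat2 and the xsize/ysize parameters are illustrative assumptions, not code from the question:

__global__ void kernel(int xsize, int ysize)
{
    // One dynamic shared memory allocation per kernel; its size is set at launch.
    extern __shared__ double smem[];

    // Carve the buffer into the two matrices by offset.
    double *mat1 = smem;                 // first xsize*ysize doubles
    double *mat2 = smem + xsize * ysize; // next ysize*xsize doubles

    // ... load the matrices from global memory, __syncthreads(),
    // then compute the product from mat1 and mat2 ...
}

// Launched with enough shared memory for both matrices:
// kernel<<<grid, block, 2 * XSIZE * YSIZE * sizeof(double)>>>(XSIZE, YSIZE);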
