
How can I launch a kernel with "as much dynamic shared mem as is possible"?

We know CUDA devices have very limited shared memory capacities, in the tens of kilobytes only. We also know kernels won't launch (typically? ever?) if you ask for too much shared memory. And we also know that the available shared memory is consumed both by the static allocations in your code and by dynamically-allocated shared memory.

Now, cudaGetDeviceProperties() gives us the overall space we have. But, given a function symbol, is it possible to determine how much statically-allocated shared memory it would use, so that I can "fill up" the shared mem to full capacity on launch? If not, is there a possibility of having CUDA take care of this for me somehow?
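For reference, a minimal sketch of the first part (querying the per-block shared memory capacity via cudaGetDeviceProperties(); device 0 is assumed here):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // Query device 0 (assumed); sharedMemPerBlock is the total
        // shared memory available to a single thread block, in bytes.
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        std::printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        return 0;
    }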

The runtime API has a function cudaFuncGetAttributes which will allow you to retrieve the attributes of any kernel in the current context, including the amount of static shared memory per block which the kernel will consume. You can do the math yourself with that information.
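A minimal sketch of "doing the math yourself": query the kernel's static shared memory with cudaFuncGetAttributes, subtract it from the device's per-block capacity, and pass the remainder as the dynamic shared memory size at launch. The kernel my_kernel below is hypothetical, purely for illustration:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel mixing static and dynamic shared memory.
    __global__ void my_kernel(float *out) {
        __shared__ float static_buf[256];        // static allocation (counted by cudaFuncGetAttributes)
        extern __shared__ float dynamic_buf[];   // dynamic allocation, sized at launch
        static_buf[threadIdx.x % 256] = 1.0f;
        dynamic_buf[threadIdx.x] = static_buf[threadIdx.x % 256];
        out[threadIdx.x] = dynamic_buf[threadIdx.x];
    }

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        // Static shared memory the kernel already consumes per block.
        cudaFuncAttributes attr;
        cudaFuncGetAttributes(&attr, my_kernel);

        // Whatever is left can be requested as dynamic shared memory.
        size_t dynamic_bytes = prop.sharedMemPerBlock - attr.sharedSizeBytes;
        std::printf("static: %zu bytes, dynamic available: %zu bytes\n",
                    attr.sharedSizeBytes, dynamic_bytes);

        float *out;
        cudaMalloc(&out, 256 * sizeof(float));
        my_kernel<<<1, 256, dynamic_bytes>>>(out);
        cudaDeviceSynchronize();
        cudaFree(out);
        return 0;
    }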

You can also get the static shared memory allocation from the nvcc compilation output.
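For example, compiling with the --ptxas-options=-v flag makes ptxas print per-kernel resource usage, including static shared memory ("smem"). The file name and the exact numbers below are illustrative only:

    nvcc --ptxas-options=-v my_kernel.cu
    ptxas info    : Used 10 registers, 1024 bytes smem, 352 bytes cmem[0]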
