CUDA: Is It Possible to Use All of 48KB of On-Die Memory As Shared Memory?
I am developing a CUDA application for a GTX 580 with CUDA Toolkit 4.0 and Visual Studio 2010 Professional on Windows 7 64-bit SP1. My program is more memory-intensive than typical CUDA programs, and I am trying to allocate as much shared memory as possible to each CUDA block. However, the program crashes every time I try to use more than 32KB of shared memory per block.
From reading the official CUDA documentation, I learned that there is 48KB of on-die memory for each SM on a CUDA device with Compute Capability 2.0 or greater, and that this on-die memory is split between L1 cache and shared memory:
"The same on-chip memory is used for both L1 and shared memory, and how much of it is dedicated to L1 versus shared memory is configurable for each kernel call" (Section F.4.1) http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Fermi_Tuning_Guide.pdf
This led me to suspect that only 32KB of the on-die memory was allocated as shared memory when my program was running. Hence my question: Is it possible to use all 48KB of on-die memory as shared memory?
I tried everything I could think of. I specified the option --ptxas-options="-v -dlcm=cg" for nvcc, and I called cudaDeviceSetCacheConfig() and cudaFuncSetCacheConfig() in my program, but none of them resolved the issue. I even made sure that there was no register spilling and that I did not accidentally use local memory:
1> 24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas info : Used 63 registers, 40000+0 bytes smem, 52 bytes cmem[0], 2540 bytes cmem[2], 8 bytes cmem[14], 72 bytes cmem[16]
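For reference, here is a minimal sketch of the cudaDeviceSetCacheConfig()/cudaFuncSetCacheConfig() calls mentioned above. The kernel name myKernel and its 40,000-byte static shared buffer are placeholders chosen to match the ptxas output, not the actual program:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real one; the 40,000-byte
// static shared array matches the "40000+0 bytes smem" reported by ptxas.
__global__ void myKernel(float *out)
{
    __shared__ char buf[40000];
    buf[threadIdx.x] = (char)threadIdx.x;   // placeholder work
    out[threadIdx.x] = buf[threadIdx.x];
}

int main()
{
    // Device-wide preference: 48KB shared memory / 16KB L1.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);

    // Per-kernel preference; overrides the device-wide setting for this kernel.
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);

    float *out;
    cudaMalloc(&out, 640 * sizeof(float));
    myKernel<<<1, 640>>>(out);
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(out);
    cudaDeviceReset();
    return 0;
}

Note that both calls only express a preference; the split actually applied is still either 16KB/48KB or 48KB/16KB on this hardware.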
Although I can live with 32KB of shared memory, which already gave me a huge performance boost, I would rather take full advantage of all of the fast on-die memory. Any help is much appreciated.
Update: I was launching 640 threads when the program crashed. 512 threads gave me better performance than 256 did, so I tried to increase the number of threads further.
Your problem is not related to the shared memory configuration but to the number of threads you are launching.
Using 63 registers per thread and launching 640 threads gives you a total of 40,320 registers. The total number of registers available on your device is 32K (32,768 per SM), so you are running out of resources.
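As a quick sanity check, the register budget can be compared against the device limit at run time. This is only an illustrative sketch, with the 63 registers per thread taken from the ptxas output above:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int regsPerThread   = 63;    // from "ptxas info : Used 63 registers"
    const int threadsPerBlock = 640;   // the failing launch configuration

    const int regsNeeded = regsPerThread * threadsPerBlock;   // 63 * 640 = 40320

    printf("registers needed per block:    %d\n", regsNeeded);
    printf("registers available per block: %d\n", prop.regsPerBlock);  // 32768 on a GTX 580

    if (regsNeeded > prop.regsPerBlock)
        printf("this launch would fail with 'too many resources requested for launch'\n");
    return 0;
}

Dropping back to 512 threads per block (63 * 512 = 32,256) just fits within the limit, which matches the behaviour described in the update. Capping register usage with __launch_bounds__ or the --maxrregcount nvcc flag is the usual alternative, at the risk of spilling.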
The on-chip memory situation is well explained in Tom's answer, and as he commented, checking the API calls for errors will help you with future errors.
Devices of compute capability 2.0 and higher have 64KB of on-chip memory per SM. This is configurable as 16KB L1 and 48KB shared memory, or 48KB L1 and 16KB shared memory (a 32KB/32KB split is also available on compute capability 3.x).
Your program is crashing for another reason. Are you checking all API calls for errors? Have you tried cuda-memcheck?
If you use too much shared memory then you will get an error when you launch the kernel saying that there were insufficient resources. Also, passing parameters from the host to the GPU uses shared memory (up to 256 bytes), so you will never get the full 48KB in practice.
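A minimal sketch of that kind of error checking around a launch (the kernel here is a hypothetical stand-in): a block that asks for more shared memory or registers than the SM can supply is reported through cudaGetLastError() as cudaErrorLaunchOutOfResources.

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; stands in for the real one.
__global__ void myKernel(float *out) { out[threadIdx.x] = 0.0f; }

int main()
{
    float *out;
    cudaMalloc(&out, 640 * sizeof(float));

    myKernel<<<1, 640>>>(out);

    // A kernel launch is asynchronous: configuration problems (such as
    // cudaErrorLaunchOutOfResources, printed as "too many resources
    // requested for launch") show up in cudaGetLastError() right away...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    // ...while errors during execution show up when you synchronize.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("execution failed: %s\n", cudaGetErrorString(err));

    cudaFree(out);
    return 0;
}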