CUDA maximum registers per thread: sm_12 vs sm_20
My kernel uses registers extensively. When compiling for 1.2 devices, --ptxas-options=-v reports 83 registers. When I compile for 2.0, only 63 registers are in use and the rest of the local data is put into local memory. Experiments with --maxrregcount give a limit of 124 registers per thread for 1.2 devices but only 63 registers for 2.0.
Is it possible to put all the data into registers on the 2.0 architecture?
Unfortunately, the register limit for compute capability 2.x cards is 63 registers per thread. There isn't any way to stop spilling to local memory if you have a very complex kernel that needs more registers than that.
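To see how a given kernel was actually compiled, the register count and local memory footprint can also be read at run time instead of from the --ptxas-options=-v output. The following is a minimal sketch, not the asker's code: heavyKernel and queryKernelUsage are hypothetical names introduced only for illustration, and the kernel body is a stand-in for any register-heavy kernel. It relies on cudaFuncGetAttributes from the CUDA runtime API.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical register-heavy kernel, used only as something to inspect.
__global__ void heavyKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Several live values per thread; the compiler decides whether they
    // stay in registers or are placed in local memory.
    float acc[16];
    for (int k = 0; k < 16; ++k) acc[k] = in[i] * (k + 1);

    float sum = 0.0f;
    for (int k = 0; k < 16; ++k) sum += acc[k] * acc[(k + 1) % 16];
    out[i] = sum;
}

// Hypothetical helper: reports registers per thread and local memory
// bytes per thread for heavyKernel as compiled for the current device.
cudaError_t queryKernelUsage(int *numRegs, size_t *localBytes)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, heavyKernel);
    if (err != cudaSuccess) return err;

    *numRegs = attr.numRegs;           // registers used by each thread
    *localBytes = attr.localSizeBytes; // local memory used by each thread
    return cudaSuccess;
}

// Hypothetical convenience wrapper that prints the two values.
void printKernelUsage()
{
    int regs = 0;
    size_t localBytes = 0;
    cudaError_t err = queryKernelUsage(&regs, &localBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return;
    }
    std::printf("registers per thread: %d\n", regs);
    std::printf("local memory per thread: %zu bytes\n", localBytes);
}
```

Calling printKernelUsage() from main shows the same two numbers that ptxas reports. A nonzero local memory size does not by itself prove register spilling, since arrays indexed at run time can also be placed in local memory, so treat it as a hint to check the ptxas output.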