
CUDA maximum registers per thread: sm_12 vs sm_20

My kernel uses registers extensively.

When compiling for 1.2 devices, --ptxas-options=-v reports 83 registers. When I compile for 2.0, only 63 registers are in use, and the rest of the local data is placed in local memory. Experiments with '--maxrregcount' give a limit of 124 registers per thread for 1.2 devices but only 63 registers for 2.0.

Is it possible to keep all of the data in registers on the 2.0 architecture?

Unfortunately, devices of compute capability 2.x are limited to 63 registers per thread. If a very complex kernel needs more registers than that, there is no way to prevent the excess from spilling to local memory.
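To see what the compiler actually allocated for a given target, the same figures that --ptxas-options=-v prints at build time can also be read at run time with cudaFuncGetAttributes. The following is only a minimal sketch: heavyKernel is a hypothetical stand-in for the real register-heavy kernel, and a nonzero local memory size may come from local arrays as well as from register spills, so it should be read as a hint rather than proof of spilling.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute the real register-heavy kernel.
__global__ void heavyKernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[i] = x * x + 1.0f;
    }
}

int main()
{
    // Query the attributes of the compiled kernel for the current device.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, heavyKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // numRegs: registers used by each thread of this kernel.
    printf("registers per thread:    %d\n", attr.numRegs);
    // localSizeBytes: local memory per thread (spills and/or local arrays).
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

Building this once per target architecture and comparing the two outputs should show the same pattern described in the question: a register count capped at the architecture's limit, with the remainder appearing as local memory.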
