
How to limit the number of registers used by each thread in Numba (CUDA)

Original post: 2017-09-30 09:15:16 8 1 python/ cuda/ numba

As the title says, I would like to know whether there is a way to limit the number of registers used by each thread when I launch a kernel. I perform a lot of computation in each thread, so the number of registers used is high and, as a result, the occupancy is low. I would like to reduce the number of registers used in order to improve parallel thread execution, possibly at the cost of more memory accesses.
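To make the link between register usage and occupancy concrete, here is a rough sketch of the arithmetic. The per-multiprocessor limits (65536 registers, 2048 resident threads, warps of 32) are assumed values, not ones taken from this question, and the function name is made up for illustration. It also ignores register allocation granularity and shared memory, so treat the result as an upper-bound estimate only.

```python
# Assumed per-multiprocessor limits (illustrative; check your GPU's specifications).
REGISTERS_PER_SM = 65536
MAX_THREADS_PER_SM = 2048
WARP_SIZE = 32


def register_limited_occupancy(regs_per_thread, threads_per_block):
    """Estimate occupancy when registers and the thread limit are the only constraints."""
    # Registers are handed out per warp, so round the block size up to whole warps.
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    threads_allocated = warps_per_block * WARP_SIZE
    regs_per_block = regs_per_thread * threads_allocated

    # How many blocks fit on one multiprocessor under each constraint.
    blocks_by_registers = REGISTERS_PER_SM // regs_per_block
    blocks_by_threads = MAX_THREADS_PER_SM // threads_allocated
    resident_blocks = min(blocks_by_registers, blocks_by_threads)

    # Occupancy = resident threads / maximum resident threads.
    return resident_blocks * threads_allocated / MAX_THREADS_PER_SM


if __name__ == "__main__":
    for regs in (32, 64, 128):
        print(regs, register_limited_occupancy(regs, 256))
```

Under these assumptions, a kernel launched with 256 threads per block reaches full occupancy at 32 registers per thread, while at 128 registers per thread only two blocks fit per multiprocessor.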

I searched for an answer but didn't find a solution. I think it is possible to set a maximum number of registers per thread with the CUDA toolchain, but is it also possible when using Numba?

EDIT: Another option might be to force a minimum number of blocks to be executed on each multiprocessor, so that the compiler has to reduce the number of registers used.
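The idea in the edit can be turned into a number: requiring a minimum number of resident blocks per multiprocessor implies a per-thread register budget. In CUDA C++ this kind of request is expressed with the `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` qualifier. The sketch below only does the arithmetic in plain Python. The 65536-register file and the 255-register per-thread cap are assumed values, and `register_budget` is a hypothetical helper, not part of Numba or CUDA.

```python
# Assumed hardware limits (illustrative; they vary by compute capability).
REGISTERS_PER_SM = 65536
MAX_REGISTERS_PER_THREAD = 255


def register_budget(threads_per_block, min_blocks_per_sm):
    """Largest per-thread register count that still lets
    `min_blocks_per_sm` blocks be resident on one multiprocessor."""
    resident_threads = threads_per_block * min_blocks_per_sm
    budget = REGISTERS_PER_SM // resident_threads
    # A thread can never use more than the architectural per-thread cap.
    return min(budget, MAX_REGISTERS_PER_THREAD)


if __name__ == "__main__":
    print(register_budget(256, 4))
    print(register_budget(256, 8))
```

With these assumed limits, asking for 4 resident blocks of 256 threads leaves at most 64 registers per thread, and asking for 8 blocks leaves 32.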

1 answer

To the best of my knowledge, the cuda.jit facility offered by numba does not allow arguments to be passed to the CUDA assembler to control register allocation, as is possible with the native CUDA toolchain.

So I don't think there is a way to do what you have asked about.