
NumbaPro CUDA Python: defining an array in a thread's registers on the GPU

I know how to create a global device array on the host using np.array, np.zeros, or np.empty(shape, dtype) and then copying it to the GPU with cuda.to_device.

One can also declare a shared array with cuda.shared.array(shape, dtype).

But how does one create an array of constant size in the registers of a particular thread inside a GPU kernel?

I tried cuda.device_array and np.array, but neither worked.

I simply want to do this inside a thread:

x = array(CONSTANT, int32) # should make x for each thread

NumbaPro supports numba.cuda.local.array(shape, type) for defining thread-local arrays.

As with CUDA C, whether that array is placed in local memory or in registers is a compiler decision based on the array's usage patterns. If the local array's indexing pattern is statically defined and there is sufficient register space, the compiler will use registers to store the array; otherwise it will be stored in local memory. See this question and answer pair for more information.

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM