How to parameterize the size of cuda.local.array in Numba?

I want to allocate a small local array in a Numba CUDA kernel. However, cuda.local.array does not accept a size passed in as a kernel argument; only a constant size is allowed. How can I solve this?

import numba
from numba import cuda

# This works, but the array size has to be hard-coded
@cuda.jit
def kernel1():
    arr = numba.cuda.local.array(3, dtype=numba.float32)

kernel1[2,2]()


# I want this, but it does not work
@cuda.jit
def kernel2(dim):
    arr = numba.cuda.local.array(dim, dtype=numba.float32)

kernel2[2,2](3)

Below is the error message:

TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<function local.array at 0x7f074e54dee0>) found for signature:
 
 >>> array(int64, dtype=class(float32))
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload of function 'array': File: numba/cuda/cudadecl.py: Line 44.
    With argument(s): '(int64, dtype=class(float32))':
   No match.

During: resolving callee type: Function(<function local.array at 0x7f074e54dee0>)
During: typing of call at /tmp/ipykernel_18276/1701838372.py (3)


File "../../../../../tmp/ipykernel_18276/1701838372.py", line 3:
<source missing, REPL/exec in use?>

I find that it does not allow parameterized array size. Only a constant size is allowed. How can I solve this?

You can't. As you say, only a constant size is allowed. This isn't a Numba limitation; it is a limitation of the CUDA programming model. Thread-local memory is always statically allocated by the compiler, so its size must be known when the kernel is compiled.

There may be some meta-programming tricks you can try, analogous to C++ templates, but they will only give you multiple versions of the kernel, each with a different statically compiled local array size, not true runtime dynamic allocation.
