Why is this numba.cuda lookup table implementation failing?

I'm trying to implement a transform that, at one stage, uses a lookup table of fewer than 1K entries. It seems to me that this shouldn't pose a problem for a modern graphics card.

But the code below is failing with an unknown error:

from numba import cuda, vectorize
import numpy as np

tmp = np.random.uniform( 0, 100, 1000000 ).astype(np.int16)
tmp_device = cuda.to_device( tmp )

lut = np.arange(100).astype(np.float32) * 2.5
lut_device = cuda.to_device(lut)

@cuda.jit(device=True)
def lookup(x):
    return lut[x]

@vectorize("float32(int16)", target="cuda")
def test_lookup(x):
    return lookup(x)

test_lookup(tmp_device).copy_to_host() # <-- fails with cuMemAlloc returning UNKNOWN_CUDA_ERROR

What am I doing that goes against the spirit of numba.cuda?

Even replacing lookup with the following simplified code results in the same error:

@cuda.jit(device=True)
def lookup(x):
    return x + lut[1]

Once this error occurs, I am essentially no longer able to use the CUDA context at all. For instance, allocating a new array via cuda.to_device results in:

numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemAlloc results in UNKNOWN_CUDA_ERROR

Running on: 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)

Driver Version: 390.25

numba: 0.33.0

The code above is fixed by rewriting lookup as follows (the change is the line that calls cuda.const.array_like and the use of its result):

@cuda.jit(device=True)
def lookup(x):
    lut_device = cuda.const.array_like(lut)
    return lut_device[x]

I ran multiple variations of the code, including simply touching the lookup table from within the kernel without using its output. Combining this with @talonmies' assertion that UNKNOWN_CUDA_ERROR usually occurs with invalid instructions, I thought that perhaps a shared memory constraint was causing the issue.

The above code makes the whole thing work. However, I still don't understand why at any deep level.
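Whichever variant of lookup is used, its output can be checked against a host-side reference. The sketch below is not part of the original post: it rebuilds the same inputs with NumPy only, confirms that every index falls inside the table, and computes the expected result with plain fancy indexing. The seeded generator and the helper name matches_reference are assumptions made for illustration.

```python
import numpy as np

# Same shapes and dtypes as in the question; the seed is an assumption,
# added only so that repeated runs produce the same data.
rng = np.random.default_rng(0)
tmp = rng.uniform(0, 100, 1_000_000).astype(np.int16)
lut = np.arange(100).astype(np.float32) * 2.5

# Every index must lie inside the table. An out-of-range index would be an
# invalid memory access on the GPU, so it is worth ruling out on the host.
assert tmp.min() >= 0 and tmp.max() < lut.size

# Host-side reference: NumPy fancy indexing performs the same table lookup.
expected = lut[tmp]


def matches_reference(gpu_result):
    """Hypothetical helper: compare a copied-back GPU result to the reference."""
    return np.array_equal(np.asarray(gpu_result), expected)
```

On a machine with a working CUDA setup, matches_reference(test_lookup(tmp_device).copy_to_host()) would be the intended check, assuming tmp_device was created from this same tmp array.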

If anyone knows and understands why, please feel free to contribute to this answer.
