Why does this numba.cuda lookup table implementation fail?

Question

I'm trying to implement a transform which at some stage uses a lookup table of less than 1K in size. This seems to me like it shouldn't pose a problem for a modern graphics card.

But the code below is failing with an unknown error:

from numba import cuda, vectorize
import numpy as np

tmp = np.random.uniform( 0, 100, 1000000 ).astype(np.int16)
tmp_device = cuda.to_device( tmp )

lut = np.arange(100).astype(np.float32) * 2.5
lut_device = cuda.to_device(lut)

@cuda.jit(device=True)
def lookup(x):
    return lut[x]

@vectorize("float32(int16)", target="cuda")
def test_lookup(x):
    return lookup(x)

test_lookup(tmp_device).copy_to_host() # <-- fails with cuMemAlloc returning UNKNOWN_CUDA_ERROR

What am I doing that goes against the spirit of numba.cuda?

Even replacing lookup with the following simplified code results in the same error:

@cuda.jit(device=True)
def lookup(x):
    return x + lut[1]

Once this error occurs, I am essentially no longer able to use the CUDA context at all. For instance, allocating a new array via cuda.to_device results in:

numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemAlloc results in UNKNOWN_CUDA_ERROR

Running on: 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)

Driver Version: 390.25

numba: 0.33.0

Answer 1

The above code is fixed by changing the lookup function as follows:

@cuda.jit(device=True)
def lookup(x):
    lut_device = cuda.const.array_like(lut)
    return lut_device[x]

I ran multiple variations of the code, including one that simply touched the lookup table from within this kernel without using its output. Combining this with @talonmies' assertion that UNKNOWN_CUDA_ERROR usually occurs with invalid instructions, I suspected that a shared memory constraint might be causing the issue.

The above code makes the whole thing work. However, I still don't fully understand why.

If anyone knows and understands why, please feel free to contribute to this answer.
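One way to confirm that the fixed version returns the right values is to compare it against a host-side reference. The following is a minimal sketch, assuming the same tmp and lut arrays as in the question. The name expected is hypothetical, and the GPU comparison is left as a comment because it needs a CUDA device and the fixed test_lookup from above.

```python
import numpy as np

# Same inputs as in the question: indices in [0, 100) and a 100-entry table.
tmp = np.random.uniform(0, 100, 1000000).astype(np.int16)
lut = np.arange(100).astype(np.float32) * 2.5

# Host-side reference: NumPy fancy indexing performs the same table lookup.
expected = lut[tmp]

# On a machine with a CUDA device, the GPU result could then be checked with:
# np.testing.assert_allclose(test_lookup(tmp_device).copy_to_host(), expected)
```

If the comparison passes, the device function is reading the same table values as the host array.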
