Could I use Numba shared memory to accelerate with Cupy?
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function getitem>) with argument(s) of type(s):
(array(float64, 2d, C), Tuple(int64, int32, int64))
* parameterized
File "<ipython-input-34-637851842bfe>", line 34:
def macroscopic(fin,u,v):
<source elided>
for k in range(TPB):
tmp1 = v[i,0] * sfin[i,tx,k]
Here is part of my code; v is a [9,2] CuPy array.
import cupy as cp
from numba import cuda
v = cp.array([ [ 1, 1], [ 1, 0], [ 1, -1], [ 0, 1], [ 0, 0],
[ 0, -1], [-1, 1], [-1, 0], [-1, -1] ])
The above is the definition of the constant array v.
@cuda.jit
def macroscopic(fin, u, v):
    # Shared memory:
    # the computation will be done on blocks of TPBxTPB elements.
    TPB = 16
    sfin = cuda.shared.array(shape=(TPB, TPB), dtype=numba.float64)
    x, y = cuda.grid(2)
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    bpg = cuda.gridDim.x  # blocks per grid
    if x >= fin.shape[1] and y >= fin.shape[2]:
        return
    tmp1 = 0.
    tmp2 = 0.
    for i in range(9):
        for j in range(bpg):
            # preload data to shared memory
            sfin[tx, ty] = fin[i, x, ty + j * TPB]
            cuda.syncthreads()
            # compute in the shared memory
            for k in range(TPB):
                tmp1 = v[i, 0] * sfin[i, tx, k]
                tmp2 = v[i, 1] * sfin[i, tx, k]
            cuda.syncthreads()
    u[0, x, y] += tmp1
    u[1, x, y] += tmp2
where fin is a [9,420,180] CuPy array. I want to compute u, a [2,420,180] array. The normal (non-GPU) computation is:
for i in range(9):
    u[0, :, :] += v[i, 0] * fin[i, :, :]
    u[1, :, :] += v[i, 1] * fin[i, :, :]
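For reference, that CPU reduction can be run end-to-end with NumPy; this is a minimal self-contained sketch with the grid shrunk from [9,420,180] to [9,4,3] for brevity, using random data in place of the real distribution array:

```python
import numpy as np

# Lattice velocity set, same values as the cupy array v in the question
v = np.array([[ 1, 1], [ 1, 0], [ 1, -1], [ 0, 1], [ 0, 0],
              [ 0, -1], [-1, 1], [-1, 0], [-1, -1]], dtype=np.float64)

# Small stand-in for the [9, 420, 180] distribution array fin
rng = np.random.default_rng(0)
fin = rng.random((9, 4, 3))

# Plain loop, exactly as written in the question
u = np.zeros((2, 4, 3))
for i in range(9):
    u[0, :, :] += v[i, 0] * fin[i, :, :]
    u[1, :, :] += v[i, 1] * fin[i, :, :]

# The same contraction u[j,x,y] = sum_i v[i,j] * fin[i,x,y] via einsum,
# handy as a correctness check for any GPU version
u_ref = np.einsum('ij,ixy->jxy', v, fin)
assert np.allclose(u, u_ref)
```

The `einsum` line gives an independent reference result, which is useful for validating a kernel implementation of the same reduction.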
However, it throws the error shown above, and I don't know why. Do the data types of v and sfin not match? Should I switch to NumPy arrays to use shared memory in Numba?
I ran into the same problem, and I used NumPy arrays with the cuda.to_device() function to transfer them to the GPU. I think at the moment CuPy is not compatible with shared memory arrays.