Could I use Numba shared memory to accelerate with Cupy?
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function getitem>) with argument(s) of type(s):
(array(float64, 2d, C), Tuple(int64, int32, int64))
* parameterized
File "<ipython-input-34-637851842bfe>", line 34:
def macroscopic(fin,u,v):
<source elided>
for k in range(TPB):
tmp1 = v[i,0] * sfin[i,tx,k]
Here is part of my code; v is a [9,2] CuPy array.
import cupy as cp
from numba import cuda
v = cp.array([ [ 1, 1], [ 1, 0], [ 1, -1], [ 0, 1], [ 0, 0],
[ 0, -1], [-1, 1], [-1, 0], [-1, -1] ])
The above is the definition of the constant array v.
@cuda.jit
def macroscopic(fin, u, v):
    # Shared memory:
    # the computation will be done on blocks of TPBxTPB elements.
    TPB = 16
    sfin = cuda.shared.array(shape=(TPB, TPB), dtype=numba.float64)
    x, y = cuda.grid(2)
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    bpg = cuda.gridDim.x  # blocks per grid
    if x >= fin.shape[1] and y >= fin.shape[2]:
        return
    tmp1 = 0.
    tmp2 = 0.
    for i in range(9):
        for j in range(bpg):
            # preload data to shared memory
            sfin[tx, ty] = fin[i, x, ty + j * TPB]
            cuda.syncthreads()
            # compute in the shared memory
            for k in range(TPB):
                tmp1 = v[i, 0] * sfin[i, tx, k]
                tmp2 = v[i, 1] * sfin[i, tx, k]
            cuda.syncthreads()
    u[0, x, y] += tmp1
    u[1, x, y] += tmp2
where fin is a [9,420,180] CuPy array. I want to compute u, a [2,420,180] array. The normal (non-GPU) computation is:
for i in range(9):
    u[0, :, :] += v[i, 0] * fin[i, :, :]
    u[1, :, :] += v[i, 1] * fin[i, :, :]
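For reference, that CPU reduction can be run end-to-end with NumPy; this is a minimal self-contained sketch with the grid shrunk from [9,420,180] to [9,4,3] for brevity, using random data in place of the real distribution array:

```python
import numpy as np

# Lattice velocity set, same values as the cupy array v in the question
v = np.array([[ 1, 1], [ 1, 0], [ 1, -1], [ 0, 1], [ 0, 0],
              [ 0, -1], [-1, 1], [-1, 0], [-1, -1]], dtype=np.float64)

# Small stand-in for the [9, 420, 180] distribution array fin
rng = np.random.default_rng(0)
fin = rng.random((9, 4, 3))

# Plain loop, exactly as written in the question
u = np.zeros((2, 4, 3))
for i in range(9):
    u[0, :, :] += v[i, 0] * fin[i, :, :]
    u[1, :, :] += v[i, 1] * fin[i, :, :]

# The same contraction u[j,x,y] = sum_i v[i,j] * fin[i,x,y] via einsum,
# handy as a correctness check for any GPU version
u_ref = np.einsum('ij,ixy->jxy', v, fin)
assert np.allclose(u, u_ref)
```

The `einsum` line gives an independent reference result, which is useful for validating a kernel implementation of the same reduction.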
However, it throws the error shown above, and I don't know why. Do the data types of v and sfin not match? Should I switch to NumPy arrays to use shared memory in Numba?
I ran into the same problem, and I used NumPy arrays with the cuda.to_device() function to transfer them to the GPU. I think at the moment CuPy is not compatible with shared memory arrays.