
How to speed up Python object functions using numba/CUDA?

I'm new to CUDA (I installed numba about an hour ago). I'd like to speed up this function, which is a method of a class.

def predict(self, X):
   num_test = X.shape[0]
   Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

   for i in range(num_test):
      distances = np.sum(np.abs(self.Xtr-X[i, :]), axis=1)
      min_index = np.argmin(distances)
      Ypred[i] = self.ytr[min_index]
      print(i)

   return Ypred

X is a 2D array of type float32 and Ypred is an array of type int32. I tried to speed it up by inserting the following line just above the function.

  @vectorize(['int32(float32)'], target='cuda')

This gave me a huge list of errors, but the significant part of it appears to be:

TypeError: Failed at nopython (analyzing bytecode)
Signature mismatch: 1 argument types given, but function takes 2 arguments

While I know exactly what the error says, I've no idea how to fix it. So... how do I make it work? Thanks in advance.

UPDATE:

I should've done a proper Google search before asking (I did search, but I used the word 'object' instead of 'class', which gave me no useful results). The documentation helped me a lot, but now I've got these errors in my face and I've no clue what to do.

numba.errors.LoweringError: Failed at nopython (nopython mode backend)
Can only insert float* at [4] in {i8*, i8*, i64, i64, float*, [2 x i64], 
[2 x i64]}: got double*
File "main.py", line 40
[1] During: lowering "(self).Xtr = X" at D:/myStuff/DL/Week 3/1/main.py (40)
[2] During: resolving callee type: 
BoundFunction((<class 'numba.types.misc.ClassInstanceType'>, 'train') for instance.jitclass.NearestNeighbours#24f01184f58<Xtr:array(float32, 2d, A),ytr:array(int32, 1d, A)>)
[3] During: typing of call at <string> (3)
--%<-----------------------------------------------------------------

File "<string>", line 3

Here's the entire class in its current state:

spec = [("Xtr", float32[:, :]), ("ytr", int32[:])]

@jitclass(spec)
class NearestNeighbours(object):
  def __init__(self):
    pass

  def train(self, X, y):
    self.Xtr = X            #line 40
    self.ytr = y

  def predict(self, X):
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    for i in range(num_test):
      distances = np.sum(np.abs(self.Xtr-X[i, :]), axis=1)
      min_index = np.argmin(distances)
      Ypred[i] = self.ytr[min_index]
      print(i)

    return Ypred
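
One plausible reading of the LoweringError above, offered as an assumption because the code that calls train is not shown: the spec declares Xtr as float32, while the array passed to train was float64 (NumPy's default for floating-point data), which would match "Can only insert float* ... got double*". A minimal sketch of casting the inputs to the dtypes named in the spec before handing them to train; the names X_train, y_train, X32 and y32 are hypothetical:

```python
import numpy as np

# Hypothetical training data; np.random.rand returns float64 by default.
X_train = np.random.rand(5, 3)
y_train = np.array([0, 1, 2, 1, 0])

print(X_train.dtype)  # float64, which does not match float32[:, :] in the spec

# Cast to the dtypes the spec declares: float32 for Xtr, int32 for ytr.
X32 = X_train.astype(np.float32)
y32 = y_train.astype(np.int32)

# These are the arrays one would then pass on, e.g. nn.train(X32, y32),
# where nn is an instance of the jitted class (not constructed here).
print(X32.dtype, y32.dtype)
```

The same cast would apply to the array passed to predict, so that every array matches the declared field types.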

UPDATE 2: I gave up on jitting the class and tried linking predict to an external clone of it. Using an empty jit seemed to work, but targeting CUDA (for the speed) led to all sorts of fancy errors. I'll give it a rest for today and answer my own question if I somehow manage to resolve it. Until a few hours ago, I thought that GPU acceleration would be as simple as adding an additional library or switching to a different compiler or something... but man... I had no idea I'd be in for such a bumpy ride.

As far as I can see, your function only relies on X, so there is no reason to have it as a method of a class. Either declare it static with @staticmethod or take it out of the class scope.

Presto: only 1 function parameter left.
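
A minimal sketch of the "take it out of the class" option. Since predict in the question also reads self.Xtr and self.ytr, this version passes them in as explicit arguments rather than relying on X alone. The name predict_l1 is hypothetical, and the code assumes numba is installed; if it is not, a no-op stand-in for njit is used so the function still runs as plain NumPy. This is CPU compilation with numba's njit, not the CUDA target from the question, which would likely need the loop rewritten as a kernel.

```python
import numpy as np

try:
    from numba import njit  # assumption: numba is available
except ImportError:
    def njit(func):
        # Fallback stand-in: leave the function uncompiled.
        return func


@njit
def predict_l1(Xtr, ytr, X):
    # Nearest neighbour under L1 distance, mirroring the question's predict,
    # with the training arrays passed in instead of read from self.
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=ytr.dtype)
    for i in range(num_test):
        distances = np.sum(np.abs(Xtr - X[i, :]), axis=1)
        Ypred[i] = ytr[np.argmin(distances)]
    return Ypred


# Tiny hypothetical data set, using the dtypes stated in the question.
Xtr = np.array([[0, 0], [10, 10]], dtype=np.float32)
ytr = np.array([3, 7], dtype=np.int32)
X = np.array([[1, 1], [9, 8]], dtype=np.float32)

print(predict_l1(Xtr, ytr, X))  # [3 7]
```

Inside the class, predict could then simply call this function with self.Xtr, self.ytr and X, which keeps the class itself as ordinary Python.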


 