
CUDA-Python: How to launch CUDA kernel in Python (Numba 0.25)?

Could you please help me understand how to write CUDA kernels in Python? AFAIK, numba.vectorize can run on cuda, cpu, or parallel (multiple CPUs), depending on target. But target='cuda' requires setting up CUDA kernels.

The main issue is that many examples and answers on the Internet refer to the deprecated NumbaPro library, so they are hard to follow, as are the outdated wikis, especially if you're a newbie.

I have:

  • latest Anaconda (v2)
  • latest Numba (v0.25)
  • latest CUDA toolkit (v7)

Here is the error I'm getting:

numba.cuda.cudadrv.driver.CudaAPIError: 1 Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

import numpy as np
import time

from numba import vectorize, cuda

@vectorize(['float32(float32, float32)'], target='cuda')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    start = time.time()
    C = VectorAdd(A, B)
    vector_add_time = time.time() - start

    print "C[:5] = " + str(C[:5])
    print "C[-5:] = " + str(C[-5:])

    print "VectorAdd took %s seconds" % vector_add_time

if __name__ == '__main__':
    main()

The code, as posted, is correct and will run on a Python 2 Numbapro/Accelerate system without error.

Most likely the particular system used to run the code was limited in capacity and, with 32-million-element vectors, hit either a display driver watchdog timeout or an insufficient free memory error. Reducing the size of the input data allowed the code to run correctly.
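As a rough illustration of that point, here is a minimal sketch that estimates how much memory the vectors need and, as one possible workaround, feeds the data to the function in smaller pieces. The helper names footprint_mb and apply_in_chunks and the default chunk size are hypothetical and not part of Numba. Chunking is also an assumption on my part: the fix reported above was simply to use a smaller N.

```python
import numpy as np


def footprint_mb(n, dtype=np.float32, n_arrays=3):
    """Approximate memory, in MiB, for n_arrays vectors of n elements each.

    n_arrays=3 covers the two inputs and one output of VectorAdd.
    """
    return n * np.dtype(dtype).itemsize * n_arrays / (1024.0 ** 2)


def apply_in_chunks(func, a, b, chunk=1000000):
    """Call func on slices of a and b, at most `chunk` elements per call.

    Hypothetical helper: each call handles a smaller array, so each launch
    is shorter and needs less memory at once.
    """
    out = np.empty_like(a)
    for start in range(0, a.size, chunk):
        stop = start + chunk
        out[start:stop] = func(a[start:stop], b[start:stop])
    return out
```

With N = 32000000, footprint_mb(N) comes to roughly 366 MiB for the three float32 arrays, which can be a lot for a small GPU that also drives the display. In the question's code, `C = apply_in_chunks(VectorAdd, A, B)` would replace the direct call. Passing np.add instead of VectorAdd lets you check the helper on a machine without a GPU.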

[This answer was assembled from comments and added as a community wiki entry to get this question off the unanswered list]
