
Cuda out of resources error when running python numbapro

I am trying to run a CUDA kernel with NumbaPro in Python, but I keep getting an out-of-resources error. I then tried executing the kernel in a loop and sending smaller arrays, but that still gives me the same error.

Here is my error message:

Traceback (most recent call last):
File "./predict.py", line 418, in <module>
predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/compiler.py", line 228, in __call__
sharedmem=self.sharedmem)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/compiler.py", line 268, in _kernel_call
cu_func(*args)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 1044, in __call__
self.sharedmem, streamhandle, args)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 1088, in launch_kernel
None)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 215, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/home/mhagen/Developer/anaconda/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 245, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: Call to cuLaunchKernel results in CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES

Here is my source code:

import math
import time

import numpy as np

from numbapro.cudalib import cusparse
from numba import *
from numbapro import cuda

@cuda.jit(argtypes=(double[:], int64, int64, double[:], int64[:], int64[:], double[:,:], double[:] ))
def predict( callResult, catCount, wordCount, counts, indptr, indices, probtcArray, priorC ):

    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x

    correct = 0
    wrong = 0
    lastDocIndex = -1
    maxProb = -1e6
    picked = -1

    for cat in range(catCount):

        probSum = 0.0

        for j in range(indptr[i],indptr[i+1]):
            wordIndex = indices[j]
            probSum += (counts[j]*math.log(probtcArray[cat,wordIndex]))

        probSum += math.log(priorC[cat])
        if probSum > maxProb:
            maxProb = probSum
            picked = cat

    callResult[i] = picked

predictions = []
counter = 1000
for i in range(int(math.ceil(numDocs/(counter*1.0)))):
    docTestSliceList = docTestList[i*counter:(i+1)*counter]
    numDocsSlice = len(docTestSliceList)
    docTestArray = np.zeros((numDocsSlice,numWords))
    for j,doc in enumerate(docTestSliceList):
        for ind in doc:
            docTestArray[j,ind['term']] = ind['count']
    docTestArraySparse = cusparse.ss.csr_matrix(docTestArray)

    start = time.time()
    OPT_N = numDocsSlice
    blockdim = 1024, 1
    griddim = int(math.ceil(float(OPT_N)/blockdim[0])), 1 

    catCount = len(music_categories)
    callResult = np.zeros(numDocsSlice)
    stream = cuda.stream()
    with stream.auto_synchronize():
        probtcArray_d = cuda.to_device(np.asarray(probtcArray),stream)
        priorC_d = cuda.to_device(np.asarray(priorC),stream)
        callResult_d = cuda.to_device(callResult, stream)
        counts_d = cuda.to_device(docTestArraySparse.data, stream)
        indptr_d = cuda.to_device(docTestArraySparse.indptr, stream)
        indices_d = cuda.to_device(docTestArraySparse.indices, stream)
        predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)

        callResult_d.to_host(stream)
    #stream.synchronize()
    predictions += list(callResult)

    print "prediction %d: %f" % (i,time.time()-start)

I found out the problem was in the CUDA procedure.

When you call predict, blockdim is set to 1024: predict[griddim, blockdim, stream](callResult_d, catCount, numWords, counts_d, indptr_d, indices_d, probtcArray_d, priorC_d)

But the procedure is called iteratively with slices of 1000 elements, not 1024. So inside the procedure, 24 threads will attempt to write to out-of-range positions in the return array.
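A quick sanity check of the launch arithmetic (plain Python, using the slice size of 1000 and block size of 1024 from the question) shows where those 24 out-of-range threads come from:

```python
import math

OPT_N = 1000     # elements in one slice ("counter" in the question)
blockdim = 1024  # threads per block

# griddim as computed in the question: ceil(1000 / 1024) = 1 block
griddim = int(math.ceil(float(OPT_N) / blockdim))

threads_launched = griddim * blockdim     # 1 * 1024 = 1024 threads in total
out_of_range = threads_launched - OPT_N   # 24 threads with i >= OPT_N
```

Each of those 24 extra threads computes an index i in [1000, 1023] and writes past the end of callResult.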

Sending a number-of-elements parameter (n_el) and placing a bounds check in the CUDA procedure solved it:

@cuda.jit(argtypes=(double[:], int64, int64, int64, double[:], int64[:], int64[:], double[:,:], double[:] ))
def predict( callResult, n_el, catCount, wordCount, counts, indptr, indices, probtcArray, priorC ):

    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x

    if i < n_el:

        ....
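The guard is easy to verify without a GPU. Below is a pure-Python stand-in (not Numba code) that loops over every "thread" index the launch would create; only indices below n_el touch the output, so the 24 surplus threads do nothing. The function name and the placeholder body are hypothetical, for illustration only:

```python
import math

def launch_guarded(n_el, blockdim, out):
    # Simulate all thread indices the CUDA launch would produce.
    griddim = int(math.ceil(float(n_el) / blockdim))
    for i in range(griddim * blockdim):  # every launched thread index
        if i < n_el:                     # the bounds guard from the answer
            out[i] = i                   # placeholder for the real kernel body
        # threads with i >= n_el fall through and write nothing

out = [-1] * 1000
launch_guarded(1000, 1024, out)
# all 1000 slots written; no write past index 999
```

The same pattern (pass n and guard with `if i < n:`) is the standard way to launch a CUDA grid that is larger than the data it covers.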

