CUDA code error within NumbaPro
import numpy
import numpy as np
from numbapro import cuda
@cuda.autojit
def foo(aryA, aryB, out):
    d_ary1 = cuda.to_device(aryA)
    d_ary2 = cuda.to_device(aryB)
    #dd = numpy.empty(10, dtype=np.int32)
    d_ary1.copy_to_host(out)
griddim = 1, 2
blockdim = 3, 4
aryA = numpy.arange(10, dtype=np.int32)
aryB = numpy.arange(10, dtype=np.int32)
out = numpy.empty(10, dtype=np.int32)
foo[griddim, blockdim](aryA, aryB, out)
Exception: Caused by input line 11: can only get attribute from globals, complex numbers or arrays
I am new to NumbaPro; any hints would be appreciated!
The @cuda.autojit decorator marks foo() and compiles it as a CUDA kernel. The memory transfer operations should be placed outside of the kernel. It should look like the following code:
import numpy
from numbapro import cuda
@cuda.autojit
def foo(aryA, aryB, out):
    # compute this thread's global index along x
    i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    if i < out.shape[0]:  # guard threads that fall past the end of the arrays
        out[i] = aryA[i] + aryB[i]

griddim = 4
blockdim = 3
aryA = numpy.arange(10, dtype=numpy.int32)
aryB = numpy.arange(10, dtype=numpy.int32)
out = numpy.empty(10, dtype=numpy.int32)
# transfer memory
d_ary1 = cuda.to_device(aryA)
d_ary2 = cuda.to_device(aryB)
d_out = cuda.device_array_like(aryA) # like numpy.empty_like() but for GPU
# launch kernel
foo[griddim, blockdim](d_ary1, d_ary2, d_out)
# transfer memory device to host
d_out.copy_to_host(out)
print(out)
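For reference (this check is not part of the original answer), the kernel is just an element-wise addition, so the expected result can be computed on the host with plain NumPy and compared against the GPU output:

```python
import numpy

# same inputs as the CUDA example above
aryA = numpy.arange(10, dtype=numpy.int32)
aryB = numpy.arange(10, dtype=numpy.int32)

# element-wise add: the host-side equivalent of what each kernel thread computes
expected = aryA + aryB
print(expected)  # [ 0  2  4  6  8 10 12 14 16 18]
```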
I recommend that new NumbaPro users look at the examples in https://github.com/ContinuumIO/numbapro-examples .