CUDA Function Won't Execute For Loop on Python with Numba
I'm trying to run a simple update loop of a simulation on the GPU. Basically there are a bunch of "creatures" represented by circles; in each update loop they move, and then there is a check of whether any of them intersect.
import numpy as np
import math
from numba import cuda


@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass


acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty((max_num_entities, 1), dtype=np.float32)
p_y = np.empty((max_num_entities, 1), dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty((max_num_entities, 1), dtype=np.uint8)

velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
max_velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
# types:
# male - 0
# female - 1
# food - 2
for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)
update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)
print(device_p_x.copy_to_host()[0])
The 1.0 in math.cos and math.sin is just a placeholder for the directions of the individual creatures. There is a surrounding loop that is executed cycles times. If I remove it and leave only the block of code that moves the creatures, neither p_x, p_y, nor velocities change, even if I add a constant to them. Why not?
There are at least two problems:

You aren't initializing velocities:
velocities = np.empty((max_num_creatures, 1), dtype=np.float32)
We can fix that for a trivial test with:
velocities = np.ones((max_num_creatures, 1), dtype=np.float32)
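To see why the uninitialized array matters, here is a small NumPy-only sketch. `np.empty` only allocates memory, so its contents are whatever happened to be there, while `np.zeros` and `np.ones` write a defined value into every element. The size `n` and the variable names are assumptions chosen for illustration.

```python
import numpy as np

n = 4  # hypothetical small size, for illustration only

# np.empty allocates without writing: the values are arbitrary leftovers,
# so adding an acceleration to them gives arbitrary results.
uninitialized = np.empty(n, dtype=np.float32)

# np.zeros / np.ones write a defined value into every element.
at_rest = np.zeros(n, dtype=np.float32)
unit_speed = np.ones(n, dtype=np.float32)

acceleration = np.float32(0.1)
print(at_rest + acceleration)      # every element starts from a known value
print(unit_speed + acceleration)
```

Whether a creature should start at rest (zeros) or at some other speed is a modelling choice; the point is only that the starting value has to be set somewhere before the kernel reads it.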
This isn't the correct array shape:

p_x = np.empty((max_num_entities, 1), dtype=np.float32)
               ^^^^^^^^^^^^^^^^^^^^^

to match your kernel signature:

@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
                ^^^^^^^^^^

We can fix that with:
p_x = np.empty(max_num_entities, dtype=np.float32)
and likewise for p_y, types, velocities, and max_velocities. (I imagine some change may also be in order for radii, but it's not entirely clear what you intend with that, since it appears you want a multi-dimensional array but are accessing it in-kernel as a single-dimensional array, AFAICT. Furthermore, that section of your kernel code is a do-nothing, so it is more or less irrelevant to the problem at hand.)
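The shape mismatch can be checked on the host before launching anything. A minimal NumPy-only sketch, assuming a hypothetical size `n`: an array created with shape `(n, 1)` is two-dimensional, whereas `float32[:]` in the signature describes a one-dimensional array. An existing `(n, 1)` array can be flattened with `reshape(-1)`.

```python
import numpy as np

n = 5  # hypothetical size, for illustration only

column = np.zeros((n, 1), dtype=np.float32)  # 2-D: would correspond to float32[:, :]
flat = np.zeros(n, dtype=np.float32)         # 1-D: corresponds to float32[:]

print(column.ndim, column.shape)  # 2 (5, 1)
print(flat.ndim, flat.shape)      # 1 (5,)

# Indexing differs too: column[i] is a length-1 array, flat[i] is a scalar.
print(column[0].shape, flat[0].shape)  # (1,) ()

# reshape(-1) turns the existing (n, 1) array into a 1-D one.
flattened = column.reshape(-1)
print(flattened.shape)  # (5,)
```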
When I make those changes, I get what appears to be rational output:
$ cat t9.py
import numpy as np
import math
from numba import cuda


@cuda.jit('void(float32[:], float32[:], float32[:], uint8[:], float32[:], float32[:], float32, uint32, uint32)')
def update(p_x, p_y, radii, types, velocities, max_velocities, acceleration, num_creatures, cycles):
    for c in range(cycles):
        for i in range(num_creatures):
            velocities[i] = velocities[i] + acceleration
            if velocities[i] > max_velocities[i]:
                velocities[i] = max_velocities[i]
            p_x[i] = p_x[i] + (math.cos(1.0) * velocities[i])
            p_y[i] = p_y[i] + (math.sin(1.0) * velocities[i])
        for i in range(num_creatures):
            for j in range(i, num_creatures):
                delta_x = p_x[j] - p_x[i]
                delta_y = p_y[j] - p_y[i]
                distance_squared = (delta_x * delta_x) + (delta_y * delta_y)
                sum_of_radii = radii[types[i]] + radii[types[i]]
                if distance_squared < sum_of_radii * sum_of_radii:
                    pass


acceleration = .1
creature_radius = 10
spacing = 20
food_radius = 3

max_num_creatures = 1500
num_creatures = 0
max_num_food = 500
num_food = 0
max_num_entities = max_num_creatures + max_num_food
num_entities = 0
cycles = 1

p_x = np.empty(max_num_entities, dtype=np.float32)
p_y = np.empty(max_num_entities, dtype=np.float32)
radii = np.array([creature_radius, creature_radius, food_radius], dtype=np.float32)
types = np.empty(max_num_entities, dtype=np.uint8)

velocities = np.ones(max_num_creatures, dtype=np.float32)
max_velocities = np.empty(max_num_creatures, dtype=np.float32)
# types:
# male - 0
# female - 1
# food - 2
for x in range(1, 800 // spacing):
    for y in range(1, 600 // spacing):
        if num_creatures % 2 == 0:
            types[num_creatures] = 0
        else:
            types[num_creatures] = 1
        p_x[num_creatures] = x * spacing
        p_y[num_creatures] = y * spacing
        max_velocities[num_creatures] = 5
        num_creatures += 1

device_p_x = cuda.to_device(p_x)
device_p_y = cuda.to_device(p_y)
device_radii = cuda.to_device(radii)
device_types = cuda.to_device(types)
device_velocities = cuda.to_device(velocities)
device_max_velocities = cuda.to_device(max_velocities)
update(device_p_x, device_p_y, device_radii, device_types, device_velocities, device_max_velocities,
       acceleration, num_creatures, cycles)
print(device_p_x.copy_to_host())
$ python t9.py
[ 2.05943317e+01 2.05943317e+01 2.05943317e+01 ..., 3.64769361e-11
1.52645868e-19 1.80563260e+28]
$
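The leading value can be checked by hand. With velocities starting at 1, an acceleration of 0.1, and the first creature placed at x = 1 * spacing = 20, one cycle should give 20 + cos(1.0) * 1.1. The values at the end of the printout belong to slots past num_creatures in the np.empty array; nothing ever writes to them, so they hold arbitrary data. A quick sketch of the arithmetic in plain Python (computed in float64, so it should agree with the float32 output only to roughly six or seven digits):

```python
import math

spacing = 20
acceleration = 0.1
start_velocity = 1.0       # from np.ones in the modified script
start_x = 1 * spacing      # first creature is placed at x index 1

velocity = start_velocity + acceleration   # stays below the max velocity of 5
expected_x = start_x + math.cos(1.0) * velocity
print(expected_x)

# Number of creatures actually placed by the nested host loops,
# compared with the 2000 slots allocated for p_x.
num_creatures = len(range(1, 800 // spacing)) * len(range(1, 600 // spacing))
print(num_creatures, "of", 1500 + 500, "slots initialized")
```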
Also note that you are currently launching only one block of one thread, but I assume that is not pertinent to your question at the moment.
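If you later want to use more than one thread, the kernel would be launched as `update[blocks, threads_per_block](...)`, and each thread would pick its own index with `cuda.grid(1)` behind a bounds check. (Recent Numba releases may also refuse a launch that has no explicit configuration.) The sketch below only works out the launch arithmetic on the CPU, so it runs without a GPU; `threads_per_block = 128` and the helper names are assumptions, not part of the original code.

```python
def blocks_needed(n, threads_per_block):
    """Smallest number of blocks whose threads cover n items."""
    return (n + threads_per_block - 1) // threads_per_block


def global_indices(blocks, threads_per_block):
    """Indices that cuda.grid(1) would yield across a 1-D launch."""
    for block in range(blocks):
        for thread in range(threads_per_block):
            yield block * threads_per_block + thread


num_creatures = 1131       # creatures placed by the script above
threads_per_block = 128    # assumed value; a multiple of the warp size (32)

blocks = blocks_needed(num_creatures, threads_per_block)
print(blocks)  # 9

# Threads whose index is >= num_creatures must do nothing,
# hence the usual `if i < num_creatures:` guard inside a kernel.
active = [i for i in global_indices(blocks, threads_per_block) if i < num_creatures]
print(len(active))  # 1131
```

The per-creature movement step maps onto this one-index-per-thread scheme directly; the pairwise intersection check would need its own indexing scheme.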