Shared counter variable in a parallelized CUDA kernel

Question

Is there a way to have an integer counter variable that can be incremented or decremented by all threads in a parallelized CUDA kernel? The code below prints "[1]", because the modification one thread makes to the counter array is not applied in the others.

import numpy as np
from numba import cuda


@cuda.jit('void(int32[:])')
def func(counter):
    counter[0] = counter[0] + 1


counter = cuda.to_device(np.zeros(1, dtype=np.int32))
threadsperblock = 64
blockspergrid = 18
func[blockspergrid, threadsperblock](counter)
print(counter.copy_to_host())

Answer 1

One approach would be to use numba cuda atomics:

$ cat t18.py
import numpy as np
from numba import cuda


@cuda.jit('void(int32[:])')
def func(counter):
    cuda.atomic.add(counter, 0, 1)


counter = cuda.to_device(np.zeros(1, dtype=np.int32))
threadsperblock = 64
blockspergrid = 18
print(blockspergrid * threadsperblock)
func[blockspergrid, threadsperblock](counter)
print(counter.copy_to_host())
$ python t18.py
1152
[1152]
$

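An atomic operation performs an indivisible read-modify-write operation on the target, so threads do not interfere with each other when they update the target variable. By contrast, the plain assignment counter[0] = counter[0] + 1 is a separate read followed by a separate write, so many threads can read the same old value and overwrite each other's results.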
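
To make the difference concrete, here is a minimal CPU-side sketch in plain Python. It is not GPU code and does not use numba. It models one worst-case interleaving, in which every thread reads the counter before any thread writes it back, and compares that with an update where each read-modify-write completes as a single step. The function names are made up for illustration, and a real GPU does not guarantee this particular ordering for the non-atomic version.

```python
def simulate_plain_increment(n_threads):
    """Model counter[0] = counter[0] + 1 under a worst-case interleaving."""
    counter = [0]
    # Phase 1: every thread loads the counter before any thread stores to it.
    loaded = [counter[0] for _ in range(n_threads)]
    # Phase 2: every thread stores its own stale value plus one.
    for value in loaded:
        counter[0] = value + 1
    return counter[0]


def simulate_atomic_increment(n_threads):
    """Model an atomic add: each read-modify-write finishes before the next."""
    counter = [0]
    for _ in range(n_threads):
        counter[0] += 1  # indivisible in this model, no other thread in between
    return counter[0]


n_threads = 18 * 64  # blockspergrid * threadsperblock from the question
plain_result = simulate_plain_increment(n_threads)
atomic_result = simulate_atomic_increment(n_threads)
print(plain_result, atomic_result)  # prints: 1 1152
```

The first number matches the "[1]" seen in the question, and the second matches the "[1152]" produced by the cuda.atomic.add version.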

Other methods are certainly possible, depending on your actual needs, such as a classical parallel reduction. numba also provides some reduction sugar.
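
As a rough illustration of the reduction idea, the sketch below models it on the CPU in plain Python; it is not numba code. Each block sums its threads' contributions with the stride-halving pattern, and the per-block partial sums are then combined. The function names are hypothetical, and the block size is assumed to be a power of two. In a real kernel the inner loop would run in parallel on shared memory with a barrier between steps.

```python
def tree_reduce_block(block):
    """Sum one block's values with the stride-halving reduction pattern."""
    data = list(block)
    stride = len(data) // 2  # assumes len(block) is a power of two
    while stride > 0:
        # On a GPU these iterations run in parallel, one per thread,
        # with a barrier before the stride is halved.
        for tid in range(stride):
            data[tid] += data[tid + stride]
        stride //= 2
    return data[0]


def count_with_reduction(n_blocks, threads_per_block):
    """Every thread contributes 1; blocks reduce locally, then partials are summed."""
    partials = [tree_reduce_block([1] * threads_per_block) for _ in range(n_blocks)]
    return sum(partials)


total = count_with_reduction(18, 64)
print(total)  # prints: 1152
```

Compared with a single atomic counter, this pattern touches the shared result once per block instead of once per thread, which is why it tends to be preferred when many threads contribute.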

Answer 1: 3, accepted, 2018-09-06 06:14:48
