Numba cuda: Using shared memory to add numbers results in overwriting

I have been trying to add numbers using shared memory, so that it works as follows:

Thread 0: Add 1 to shared memory variable sharedMemT[0]

Thread 1: Add 1 to shared memory variable sharedMemT[0]

Synchronize threads and store sharedMemT[0] into output[0]

But the result was... 1??

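import numpy as np
from numba import cuda, int32

@cuda.jit()
def add(output):
    sharedMemT = cuda.shared.array(shape=(1), dtype=int32)
    sharedMemT[0] = 0
    cuda.syncthreads()

    sharedMemT[0] += 1  # both threads read, add, and write without coordination
    cuda.syncthreads()
    output[0] = sharedMemT[0]

out = np.array([0])
add[1, 2](out)  # 1 block, 2 threads
print(out) # results in [1]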

Congratulations, you have a memory race. Threads 0 and 1 run at the same time, so the results are undefined, both in the operation on the shared memory variable and in the write back to global memory. The statement sharedMemT[0] += 1 is a separate read, add, and write, so both threads can read 0 and both write back 1.
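The lost update can be modelled on the host without a GPU. The following is only a simplified sketch: it assumes the increment splits into one load and one store per thread and enumerates every ordering of those steps. The names `run_schedule` and `possible_results` are made up for illustration, and real hardware gives no guarantee that a data race stays within these outcomes.

```python
from itertools import permutations


def run_schedule(schedule):
    """Run one interleaving of (thread_id, step) pairs and return the final value."""
    shared = 0        # stands in for sharedMemT[0]
    registers = {}    # each thread's private copy of the value it loaded
    for thread_id, step in schedule:
        if step == "load":
            registers[thread_id] = shared
        else:  # "store": write back the privately incremented value
            shared = registers[thread_id] + 1
    return shared


def possible_results(num_threads=2):
    """Collect the final values over all orderings that respect program order."""
    steps = [(t, s) for t in range(num_threads) for s in ("load", "store")]
    results = set()
    for order in permutations(steps):
        # A thread must load before it stores; other orderings are invalid.
        in_program_order = all(
            order.index((t, "load")) < order.index((t, "store"))
            for t in range(num_threads)
        )
        if in_program_order:
            results.add(run_schedule(order))
    return results


print(sorted(possible_results()))  # [1, 2]
```

The value 1 appears whenever both loads happen before either store, which matches the output seen in the question.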

For this to work correctly, you would need to serialize access to the shared memory variable using an atomic memory operation, and then have only one thread write back to global memory:

$ cat atomic.py

import numpy as np
from numba import cuda, int32

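@cuda.jit()
def add(output):
    sharedMemT = cuda.shared.array(shape=(1), dtype=int32)
    pos = cuda.grid(1)  # global thread index; equals the in-block index for a single block
    if pos == 0:
        sharedMemT[0] = 0  # only one thread initializes the counter

    cuda.syncthreads()  # make the initialization visible before anyone adds

    cuda.atomic.add(sharedMemT, 0, 1)  # indivisible read-modify-write on sharedMemT[0]
    cuda.syncthreads()  # wait until every thread has added

    if pos == 0:
        output[0] = sharedMemT[0]  # a single thread writes the result back

out = np.array([0])
add[1, 2](out)  # 1 block, 2 threads
print(out)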

$ python atomic.py
[2]
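As a rough host-side analogy for what serializing the update means, the sketch below uses Python's `threading.Lock` so that only one thread at a time performs the read-modify-write. This is an assumption-laden illustration, not how the GPU implements `cuda.atomic.add`: hardware atomics are not locks, and `count_with_lock` is a hypothetical helper name.

```python
import threading


def count_with_lock(num_threads, increments_per_thread=1):
    """Increment a shared counter from several threads, one update at a time."""
    total = [0]                # one-element list standing in for sharedMemT
    lock = threading.Lock()

    def worker():
        for _ in range(increments_per_thread):
            with lock:         # the read, add, and write happen as one unit
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for all updates before reading the result
    return total[0]


print(count_with_lock(2))  # 2
```

Because every update is serialized, the result always equals the number of increments, whichever thread runs first.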
