
CUDA atomic operations and concurrent kernel launch

I am currently developing a GPU-based program that uses multiple kernels, launched concurrently on multiple streams.

In my application, multiple kernels need to access a shared queue/stack, and I plan to use atomic operations for this.

But I do not know whether atomic operations work across multiple kernels that are running concurrently. Could anyone who knows the exact mechanism of atomic operations on the GPU, or who has experience with this issue, please help?

Atomics are implemented in the L2 cache hardware of the GPU, through which all memory operations must pass. There is no hardware to ensure coherency between host and device memory, or between different GPUs; but as long as the kernels are running on the same GPU and use device memory on that GPU to synchronize, atomics will work as expected.
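To make this concrete, here is a minimal sketch of two kernels launched on separate streams that push into one stack in device memory, reserving slots with atomicAdd. The names g_top, g_items, pushKernel, runDemo and the capacity are made up for illustration and are not from the question. The kernels may or may not actually overlap on a given GPU, but the slot reservation stays correct either way.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kCapacity = 1 << 16;   // hypothetical stack capacity

// Stack shared by every kernel running on this GPU (device memory).
__device__ int g_top;                // index of the next free slot
__device__ int g_items[kCapacity];

// Each thread pushes one item; atomicAdd hands out a unique slot index.
__global__ void pushKernel(int tag, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int slot = atomicAdd(&g_top, 1);
    if (slot < kCapacity) g_items[slot] = tag;
}

// Launches two pushKernel instances on different streams.
// Returns the final stack size, or -1 on a CUDA error or a bad tag count.
int runDemo(int n)
{
    int zero = 0;
    if (cudaMemcpyToSymbol(g_top, &zero, sizeof(int)) != cudaSuccess) return -1;

    cudaStream_t s0, s1;
    if (cudaStreamCreate(&s0) != cudaSuccess) return -1;
    if (cudaStreamCreate(&s1) != cudaSuccess) return -1;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    pushKernel<<<blocks, threads, 0, s0>>>(1, n);   // tag 1 on stream s0
    pushKernel<<<blocks, threads, 0, s1>>>(2, n);   // tag 2 on stream s1
    cudaError_t err = cudaDeviceSynchronize();      // wait for both kernels

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    if (err != cudaSuccess) return -1;

    int top = 0;
    if (cudaMemcpyFromSymbol(&top, g_top, sizeof(int)) != cudaSuccess) return -1;
    if (top < 0 || top > kCapacity) return -1;

    // Copy the used part of the stack back and count items per kernel.
    std::vector<int> items(top);
    if (top > 0 &&
        cudaMemcpyFromSymbol(items.data(), g_items, top * sizeof(int)) != cudaSuccess)
        return -1;
    int fromFirst = 0, fromSecond = 0;
    for (int v : items) {
        if (v == 1) ++fromFirst;
        else if (v == 2) ++fromSecond;
    }
    return (fromFirst == n && fromSecond == n) ? top : -1;
}

int main()
{
    const int n = 10000;             // 2 * n must stay below kCapacity
    int top = runDemo(n);
    std::printf("stack size = %d, expected %d\n", top, 2 * n);
    return top == 2 * n ? 0 : 1;
}
```

Note that atomicAdd only guarantees that no two threads receive the same slot. If one kernel pops items while another is still pushing, the consumer also needs some way to know that the item has been written to its slot, for example a per-slot flag combined with __threadfence().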

