
Concurrent kernel execution

Is it possible to launch kernels from different threads of a (host) application and have them run concurrently on the same GPGPU device? If not, do you know of any plans (on Nvidia's part) to provide this capability in the future?

The programming guide http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf says:

3.2.7.3 Concurrent Kernel Execution: Some devices of compute capability 2.0 can execute multiple kernels concurrently. Applications may query this capability by calling cudaGetDeviceProperties() and checking the concurrentKernels property. The maximum number of kernel launches that a device can execute concurrently is sixteen.
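A minimal sketch of the query the guide describes, using the standard CUDA runtime API (checks device 0; adjust the device index as needed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Device: %s\n", prop.name);
    printf("Concurrent kernel execution supported: %s\n",
           prop.concurrentKernels ? "yes" : "no");
    return 0;
}
```

Compile with `nvcc` and run on the target machine; the `concurrentKernels` field is nonzero on devices (compute capability 2.0 and later) that support the feature.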

So the answer is: it depends. It actually depends only on the device; host threads won't make a difference either way. If the device doesn't support concurrent kernel execution, concurrent kernel launches are serialized; if it does, kernels launched on different streams can execute concurrently.
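To illustrate the streams point, here is a hedged sketch (the kernel name `busyKernel` and the array sizes are made up for the example): two launches of the same kernel are placed on different non-default streams, which is what allows the hardware to overlap them on a device that supports concurrent kernel execution.

```cuda
#include <cuda_runtime.h>

// An artificially long-running kernel so that overlap is observable
// in a profiler (e.g. as overlapping bars on the two streams).
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 10000; ++k)
            v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    const int n = 1 << 16;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    // Two distinct non-default streams: work on different streams
    // has no implicit ordering between the streams.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // On a device with concurrentKernels == 1 these two launches
    // may run at the same time; otherwise they are serialized.
    busyKernel<<<(n + 255) / 256, 256, 0, s1>>>(d_a, n);
    busyKernel<<<(n + 255) / 256, 256, 0, s2>>>(d_b, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

Note that launching both kernels on the same stream (or on the default stream) would serialize them regardless of device capability; the separate streams are what make concurrency possible.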
