What is the difference between Nvidia Hyper-Q and Nvidia Streams?
I always thought that Hyper-Q technology is nothing more than streams on the GPU. Later I found out I was wrong (am I?). So I did some reading about Hyper-Q and got even more confused. I was going through one article, and it made these two statements:
A. Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process.
B. Hyper-Q increases the total number of connections (work queues) between the host and the GK110 GPU by allowing 32 simultaneous, hardware-managed connections (compared to the single connection available with Fermi).
Point B above says that multiple connections can be created from the host to a single GPU. Does that mean I can create multiple contexts on a single GPU through different applications? Does it mean I will have to execute all applications on different streams? And if all my connections consume memory and compute resources, who manages the scheduling of those resources (memory/cores)?
Think of HyperQ as streams implemented in hardware on the device side.
Before the arrival of HyperQ, e.g. on Fermi, commands (kernel launches, memory transfers, etc.) from all streams were placed in a single work queue by the driver on the host. That meant that commands could not overtake each other, and you had to be careful to issue them in the right order on the host to achieve the best overlap.
On the GK110 GPU and later devices with HyperQ, there are (at least) 32 work queues on the device. This means that commands from different queues can be reordered relative to each other until they start execution. So both orderings in the example linked above lead to good overlap on a GK110 device.
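As a sketch of the issue-order point (not code from the original answer), the depth-first pattern below issues all work for one stream before moving to the next. In a single hardware work queue (Fermi) this creates false dependencies between consecutive commands from different streams; with Hyper-Q each stream can map to its own hardware queue, so the kernels can overlap regardless of issue order. `busy_kernel` here is just a hypothetical placeholder workload:

```cuda
#include <cuda_runtime.h>

__global__ void busy_kernel(int iters) {
    // Spin to give the kernel measurable duration (illustrative only)
    for (volatile int i = 0; i < iters; ++i) { }
}

int main() {
    const int nStreams = 4;
    cudaStream_t streams[nStreams];
    for (int i = 0; i < nStreams; ++i)
        cudaStreamCreate(&streams[i]);

    // Depth-first issue order: all commands for stream 0, then stream 1, ...
    // On Fermi's single work queue, the second kernel of stream i blocks the
    // first kernel of stream i+1 behind it, preventing overlap.
    // With Hyper-Q, each stream feeds its own hardware queue, so this order
    // overlaps just as well as a breadth-first one.
    for (int i = 0; i < nStreams; ++i) {
        busy_kernel<<<1, 1, 0, streams[i]>>>(1 << 20);
        busy_kernel<<<1, 1, 0, streams[i]>>>(1 << 20);
    }

    cudaDeviceSynchronize();
    for (int i = 0; i < nStreams; ++i)
        cudaStreamDestroy(streams[i]);
    return 0;
}
```

Profiling this with Nsight Systems (or the old visual profiler) on a pre-Kepler vs. a Hyper-Q capable device makes the difference in the timeline visible.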
This is particularly important for multithreaded host code, where you can't control the order without additional synchronization between threads.
Note that of the 32 hardware queues, only 8 are used by default to save resources. Set the CUDA_DEVICE_MAX_CONNECTIONS environment variable to a higher value if you need more.
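For example, the variable is set in the environment before launching the application (the value 32 assumes a GK110-class device; the hardware maximum can differ on other GPUs):

```shell
# Raise the number of host->device hardware connections from the
# default of 8 up to 32 (the GK110 maximum), then launch the app.
export CUDA_DEVICE_MAX_CONNECTIONS=32
echo "$CUDA_DEVICE_MAX_CONNECTIONS"
```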