
What is the relationship between NVIDIA MPS (Multi-Process Service) and CUDA Streams?

Glancing at the official NVIDIA Multi-Process Service docs, it is unclear to me how it interacts with CUDA streams.

Here's an example:

App 0: issues kernels to logical stream 0;

App 1: issues kernels to (its own) logical stream 0.
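For concreteness, each of the two apps in this scenario might look like the following minimal sketch (the kernel, buffer size, and launch configuration are placeholders, not part of the original question):

```cuda
#include <cstdio>

// Placeholder kernel; each app launches its own copy of something like this.
__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // Each process creates its own "logical stream 0". Under MPS, this
    // stream is distinct from any other client process's stream 0.
    cudaStream_t s0;
    cudaStreamCreate(&s0);

    work<<<(n + 255) / 256, 256, 0, s0>>>(d, n);

    cudaStreamSynchronize(s0);
    cudaStreamDestroy(s0);
    cudaFree(d);
    return 0;
}
```

Running two instances of this binary as separate processes, with MPS active, is exactly the App 0 / App 1 situation described above. (Requires an NVIDIA GPU and driver to run.)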

In this case,

1) Does MPS "hijack" these CUDA calls, and if so, how? Does it have full knowledge, for each application, of which streams are used and which kernels are in which streams?

2) Does MPS create its own two streams and place the respective kernels into the right streams? Or does MPS potentially enable kernel concurrency via mechanisms other than streams?

If it helps, I'm primarily interested in how MPS works on Volta, but information about older architectures is appreciated as well.

A way to think about MPS is that it acts as a funnel for CUDA activity emanating from multiple processes, allowing that activity to take place on the GPU as if it emanated from a single process. One of the specific benefits of MPS is that kernel concurrency is theoretically possible even when the kernels come from separate processes. The "ordinary" CUDA multi-process execution model would serialize such kernel executions.
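As a practical aside, this funnel is a daemon that client processes attach to. A typical setup looks like the following (requires an NVIDIA driver; the directory paths are arbitrary choices, not requirements):

```shell
# Restrict MPS to one GPU and choose pipe/log locations.
export CUDA_VISIBLE_DEVICES=0
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log

# Start the MPS control daemon in the background.
nvidia-cuda-mps-control -d

# Any CUDA process started with the same CUDA_MPS_PIPE_DIRECTORY now
# funnels its work through the single MPS server process.

# Shut the daemon down when finished:
echo quit | nvidia-cuda-mps-control
```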

Since kernel concurrency in a single process implies that the kernels in question are issued to separate streams, it stands to reason that, conceptually, MPS treats the streams from the various client processes as completely separate. Naturally, then, if you profile such an MPS setup, the streams will show up as separate from each other, whether they are separate streams associated with a single client process or streams across several client processes.

In the pre-Volta case, MPS did not guarantee process isolation between kernel activity from separate processes. In this respect, it was very much like a funnel, taking activity from several processes and issuing it to the GPU as if it were issued from a single process.

In the Volta case, activity from separate processes behaves, from an execution standpoint (e.g. concurrency), as if it were from a single process, but activity from separate processes still carries process isolation (e.g. independent address spaces).

1) Does MPS "hijack" these CUDA calls, and if so, how? Does it have full knowledge, for each application, of which streams are used and which kernels are in which streams?

Yes, CUDA MPS understands the separate streams from a given process, as well as the activity issued to each, and maintains those stream semantics when issuing work to the GPU. The exact details of how CUDA calls are handled by MPS are unpublished, to my knowledge.

2) Does MPS create its own two streams and place the respective kernels into the right streams? Or does MPS potentially enable kernel concurrency via mechanisms other than streams?

MPS maintains all stream activity, as well as CUDA stream semantics, across all clients. Activity issued into a particular CUDA stream will be serialized. Activity issued to independent streams may run concurrently. This is true regardless of the origin of the streams in question, whether they come from one process or several.
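Those stream semantics are the standard CUDA ones, sketched below with a busy-wait kernel (a hypothetical example, not from the original answer) so that overlap, or the lack of it, is visible in a profiler:

```cuda
#include <cstdio>

// Busy-wait for roughly the given number of device clock cycles, so that
// kernel overlap (or serialization) is easy to see in a profiler timeline.
__global__ void spin(clock_t cycles) {
    clock_t start = clock();
    while (clock() - start < cycles) { }
}

int main() {
    cudaStream_t a, b;
    cudaStreamCreate(&a);
    cudaStreamCreate(&b);
    const clock_t cycles = 100000000;

    // Same stream: the second launch cannot begin until the first finishes.
    spin<<<1, 1, 0, a>>>(cycles);
    spin<<<1, 1, 0, a>>>(cycles);

    // Independent streams: the two launches may execute concurrently,
    // resources permitting. Under MPS, the same holds when the two
    // streams belong to two different client processes.
    spin<<<1, 1, 0, a>>>(cycles);
    spin<<<1, 1, 0, b>>>(cycles);

    cudaDeviceSynchronize();
    cudaStreamDestroy(a);
    cudaStreamDestroy(b);
    return 0;
}
```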

