
Parallel compute shaders execution in Vulkan?

Original post: 2018-06-19 11:32:53 | tags: c++ / parallel-processing / glsl / vulkan / compute-shader

I have several compute shaders (let's call them compute1, compute2 and so on) that have several input bindings (defined in shader code as layout (...) readonly buffer) and several output bindings (defined as layout (...) writeonly buffer). I bind buffers with data to their descriptor sets and then try to execute these shaders in parallel.

What I've tried:

vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding several primary command buffers (one per compute shader);
vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding one primary command buffer that was recorded using vkCmdExecuteCommands() with pCommandBuffers holding several secondary command buffers (one per compute shader);
Separate vkQueueSubmit() + vkQueueWaitIdle() calls from different std::thread objects (one per compute shader) - each command buffer is allocated from a separate VkCommandPool and is submitted to its own VkQueue with its own VkFence; the main thread waits using threads[0].join(); threads[1].join(); and so on;
Separate vkQueueSubmit() calls from different detached std::thread objects (one per compute shader) - each command buffer is allocated from a separate VkCommandPool and is submitted to its own VkQueue with its own VkFence; the main thread waits using vkWaitForFences() with pFences holding the fences that were used in vkQueueSubmit() and with waitAll set to true.

What I've got:

In all cases the resulting time is almost the same (the difference is less than 1%) as when calling vkQueueSubmit() + vkQueueWaitIdle() for compute1, then for compute2 and so on.
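One way to put a number on "almost the same" is to compare the combined time against the two extremes: the sum of the individual times (fully serialized) and the longest individual time (best possible overlap). The following is only a minimal sketch; overlapFraction and its parameter names are hypothetical and not part of Vulkan, and it assumes the timings were measured separately, for example around vkQueueSubmit() + vkQueueWaitIdle().

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Vulkan.
// individualMs: wall-clock time of each shader when run on its own.
// combinedMs:   wall-clock time of the supposedly parallel submission.
// Returns 0.0 when combinedMs equals the sum of the individual times
// (fully serialized) and 1.0 when it equals the longest individual time
// (best possible overlap). The result is clamped to [0, 1].
double overlapFraction(const std::vector<double>& individualMs, double combinedMs) {
    assert(combinedMs >= 0.0);
    if (individualMs.empty()) return 0.0;
    const double sum = std::accumulate(individualMs.begin(), individualMs.end(), 0.0);
    const double longest = *std::max_element(individualMs.begin(), individualMs.end());
    const double achievable = sum - longest;  // most time that overlap could save
    if (achievable <= 0.0) return 0.0;        // a single shader: nothing to overlap
    const double saved = sum - combinedMs;    // time actually saved
    return std::min(1.0, std::max(0.0, saved / achievable));
}
```

With two shaders that each take 10 ms alone and a combined time of 19.9 ms, the fraction comes out at roughly 0.01, which corresponds to the "less than 1%" situation described above.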

I want to bind the same buffers as inputs for several shaders, but judging by the timings, the result is the same even if each shader is executed with its own VkBuffer + VkDeviceMemory objects.

So my question is:

Is it possible to somehow execute several compute shaders simultaneously, or does command buffer parallelism work for graphics shaders only?

Update: The test application was compiled using LunarG Vulkan SDK 1.1.73.0 and runs on Windows 10 with an NVIDIA GeForce GTX 960.

1 answer

This depends on the hardware you are running your application on. Hardware exposes queues which process submitted commands. Each queue, as the name suggests, executes commands in order, one after another. So if you submit multiple command buffers to a single queue, they will be executed in the order of their submission. Internally, the GPU can try to parallelize execution of some parts of the submitted commands (for example, separate stages of the graphics pipeline can be processed at the same time). But in general, a single queue processes commands sequentially, and it doesn't matter whether you are submitting graphics or compute commands.

In order to execute multiple command buffers in parallel, you need to submit them to separate queues. But the hardware must support multiple queues - it must have separate, physical queues in order to be able to process them concurrently.
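To see what a device advertises, you can inspect the queue family list returned by vkGetPhysicalDeviceQueueFamilyProperties(). Below is a rough sketch of that inspection. To keep it compilable without the Vulkan SDK it uses a stand-in struct QueueFamily that mirrors only the queueFlags and queueCount fields of VkQueueFamilyProperties; the struct, the constants kGraphicsBit / kComputeBit and both function names are assumptions made for illustration. In a real application you would pass the VkQueueFamilyProperties array and use VK_QUEUE_GRAPHICS_BIT / VK_QUEUE_COMPUTE_BIT directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins so the sketch builds without <vulkan/vulkan.h>.
constexpr std::uint32_t kGraphicsBit = 0x1;  // numeric value of VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2;  // numeric value of VK_QUEUE_COMPUTE_BIT

// Mirrors the two VkQueueFamilyProperties fields used below.
struct QueueFamily {
    std::uint32_t queueFlags;
    std::uint32_t queueCount;
};

// Total number of compute-capable queues the device advertises.
std::uint32_t countComputeQueues(const std::vector<QueueFamily>& families) {
    std::uint32_t total = 0;
    for (const QueueFamily& f : families) {
        assert(f.queueCount > 0);  // Vulkan reports at least one queue per family
        if (f.queueFlags & kComputeBit) total += f.queueCount;
    }
    return total;
}

// Index of the first family that supports compute but not graphics,
// or -1 if the device exposes no such family.
int findDedicatedComputeFamily(const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const std::uint32_t flags = families[i].queueFlags;
        if ((flags & kComputeBit) && !(flags & kGraphicsBit)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Keep in mind that these numbers only describe what the driver exposes. An advertised queue count is an upper bound on possible concurrency, not a guarantee that the queues run on independent hardware.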

But, more importantly, I've read that some graphics hardware vendors simulate multiple queues in their graphics drivers. In other words, they expose multiple queues in Vulkan, but internally these are processed by a single physical queue. I think that's the case with your issue here, and the results of your experiments would seem to confirm this (though I can't be sure, of course).