In Vulkan, are dedicated fences/semaphores needed per swap chain image, per frame, or per command pool?

Question

I've read several articles on the CPU-GPU (using fences) and GPU-GPU (using semaphores) synchronization mechanisms, but I still have trouble understanding how I should implement a simple render loop.

Please take a look at the simple render() function below. If I got it right, the minimal requirement is that we ensure the GPU-GPU synchronization between vkAcquireNextImageKHR, vkQueueSubmit and vkQueuePresentKHR with a single set of semaphores, image_available and rendering_finished, as I've done in the example code below.

However, is this really safe? All operations are asynchronous. So is it really safe to "reuse" the image_available semaphore in a subsequent call of render(), even though the signal request from the previous call hasn't fired yet? I would think it's not. On the other hand, we're using the same queues (I don't know whether it matters that the graphics and presentation queues are actually the same), and operations inside a queue should be consumed sequentially... But if I got it right, they might not be consumed "as a whole" and could be reordered...

The second thing is that (again, unless I'm missing something) I clearly should use one fence per swap chain image to ensure that the operation on the image corresponding to the image_index of the call to render() has finished. But does that mean that I necessarily need to do a

if (vkWaitForFences(device(), 1, &fence[image_index_of_last_call], VK_FALSE, std::numeric_limits<std::uint64_t>::max()) != VK_SUCCESS)
    throw std::runtime_error("vkWaitForFences");
vkResetFences(device(), 1, &fence[image_index_of_last_call]);

before my call to vkAcquireNextImageKHR? And do I then need dedicated image_available and rendering_finished semaphores per swap chain image? Or maybe per frame? Or maybe per command buffer/pool? I'm really confused...

void render()
{
    std::uint32_t image_index;
    switch (vkAcquireNextImageKHR(device(), swap_chain().handle(),
        std::numeric_limits<std::uint64_t>::max(), m_image_available, VK_NULL_HANDLE, &image_index))
    {
    case VK_SUBOPTIMAL_KHR:
    case VK_SUCCESS:
        break;
    case VK_ERROR_OUT_OF_DATE_KHR:
        on_resized();
        return;
    default:
        throw std::runtime_error("vkAcquireNextImageKHR");
    }

    static VkPipelineStageFlags constexpr wait_destination_stage_mask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;

    VkSubmitInfo submit_info{};
    submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

    submit_info.waitSemaphoreCount = 1;
    submit_info.pWaitSemaphores = &m_image_available;
    submit_info.signalSemaphoreCount = 1;
    submit_info.pSignalSemaphores = &m_rendering_finished;

    submit_info.pWaitDstStageMask = &wait_destination_stage_mask;

    if (vkQueueSubmit(graphics_queue().handle, 1, &submit_info, VK_NULL_HANDLE) != VK_SUCCESS)
        throw std::runtime_error("vkQueueSubmit");

    VkPresentInfoKHR present_info{};
    present_info.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;

    present_info.waitSemaphoreCount = 1;
    present_info.pWaitSemaphores = &m_rendering_finished;

    present_info.swapchainCount = 1;
    present_info.pSwapchains = &swap_chain().handle();
    present_info.pImageIndices = &image_index;

    switch (vkQueuePresentKHR(presentation_queue().handle, &present_info))
    {
    case VK_SUCCESS:
        break;
    case VK_ERROR_OUT_OF_DATE_KHR:
    case VK_SUBOPTIMAL_KHR:
        on_resized();
        return;
    default:
        throw std::runtime_error("vkQueuePresentKHR");
    }
}

EDIT: As suggested in the answers below, assume we have k "frames in flight" and hence k instances of the semaphores and the fence used in the code above, which I will denote by m_image_available[i], m_rendering_finished[i] and m_fence[i] for i = 0, ..., k - 1. Let i denote the current index of the frame in flight, which is increased by 1 after each invocation of render(), and let j denote the number of invocations of render(), starting from j = 0.

Now, assume the swap chain contains three images.

- If j = 0, then i = 0 and the first frame in flight is using swap chain image 0.
- In the same way, if j = a, then i = a and the corresponding frame in flight is using swap chain image a, for a = 1, 2.
- Now, if j = 3, then i = 3, but since the swap chain only has three images, the fourth frame in flight is using swap chain image 0 again. I wonder whether this is problematic or not. I guess it's not, since the wait/signal semaphores m_image_available[3] / m_rendering_finished[3], used in the calls of vkAcquireNextImageKHR, vkQueueSubmit and vkQueuePresentKHR in this invocation of render(), are dedicated to this particular frame in flight.
- If we reach j = k, then i = 0 again, since there are only k frames in flight. Now we potentially wait at the beginning of render(), if the call to vkQueuePresentKHR from the first invocation (i = 0) of render() hasn't signaled m_fence[0] yet.

So, besides my doubts described in the third bullet point above, the only question that remains is why I shouldn't take k as large as possible. What I could theoretically imagine is that if we submit work to the GPU faster than the GPU is able to consume it, the queue(s) used might continually grow and eventually overflow (is there some kind of "max commands in queue" limit?).

Answer 1

If I got it right, the minimal requirement is that we ensure the GPU-GPU synchronization between vkAcquireNextImageKHR, vkQueueSubmit and vkQueuePresentKHR by a single set of semaphores image_available and rendering_finished as I've done in the example code below.

Yes, you got it right. You submit a request for a new image to render into via vkAcquireNextImageKHR. The presentation engine will signal the m_image_available semaphore as soon as an image to render into has become available. But you have already submitted the instruction.

Next, you submit some commands to the graphics queue via submit_info. I.e., they are also already submitted to the GPU and wait there until the m_image_available semaphore receives its signal.

Furthermore, a presentation instruction is submitted to the presentation engine. It expresses the dependency that presentation must wait until the submit_info commands have completed, by waiting on the m_rendering_finished semaphore.

I.e., everything has been submitted. If nothing has been signalled yet, everything just sits there in some GPU buffers and waits for signals.

Now, if your code loops right back into the render() function and re-uses the same m_image_available and m_rendering_finished semaphores, it will only work if you are very lucky, namely if all pending signal and wait operations on those semaphores have already completed before you use them again.

The specification says the following for vkAcquireNextImageKHR:

If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending

and furthermore, it says under 7.4.2. Semaphore Waiting:

the act of waiting for a binary semaphore also unsignals that semaphore.

I.e., indeed, you need to wait on the CPU until you know for sure that the previous vkAcquireNextImageKHR that uses the same m_image_available semaphore has completed.
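To make the two quoted rules concrete, here is a toy state machine for a binary semaphore. It is a sketch under stated assumptions, not Vulkan code: the class BinarySemaphoreModel and all of its member names are hypothetical, and it only tracks the three conditions the quotes mention (signaled state, pending signal, pending wait).

```cpp
#include <cassert>

// Toy model of a binary semaphore (hypothetical, NOT the Vulkan API).
class BinarySemaphoreModel {
public:
    // True if the semaphore could be handed to an acquire call again:
    // unsignaled and no signal or wait operation still pending.
    bool reusable() const {
        return !signaled_ && !signal_pending_ && !wait_pending_;
    }

    // Models passing the semaphore to vkAcquireNextImageKHR.
    void enqueue_signal() {
        assert(reusable());
        signal_pending_ = true;
    }

    // Models listing the semaphore in pWaitSemaphores of a submit.
    void enqueue_wait() {
        assert(!wait_pending_);
        wait_pending_ = true;
    }

    // Models the presentation engine actually signaling the semaphore.
    void signal_executes() {
        assert(signal_pending_);
        signal_pending_ = false;
        signaled_ = true;
    }

    // Models the queue actually executing the wait, which unsignals it.
    void wait_executes() {
        assert(signaled_ && wait_pending_);
        signaled_ = false;
        wait_pending_ = false;
    }

private:
    bool signaled_ = false;
    bool signal_pending_ = false;
    bool wait_pending_ = false;
};
```

In this model, looping straight back into render() corresponds to calling enqueue_signal() again while reusable() is still false, which is exactly the situation the quoted rule forbids.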

And yes, you already got it right: you need to use a fence for that, which you pass to vkQueueSubmit. If you do not synchronize on the CPU, you'll shovel ever more work to the GPU (which is a problem), and the semaphores that you are re-using might not get properly unsignalled in time (which is also a problem).

What is often done is that the semaphores and fences are multiplied, e.g. to 3 each, and these sets of synchronization objects are used in sequence, so that more work can be parallelized on the GPU. The Vulkan Tutorial describes this quite nicely in its Rendering and presentation chapter. It is also explained with animation in this lecture starting at 7:59.
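The "use the sets in sequence" idea can be sketched as a small ring of slots. This is a toy model rather than Vulkan code: FrameSlot, FrameRing and their members are hypothetical names, and each slot merely stands in for one set of {image_available, rendering_finished, fence}. The comments note which real call each step would correspond to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One slot stands in for one set of semaphores plus its fence (hypothetical).
struct FrameSlot {
    bool gpu_busy = false; // true from "vkQueueSubmit" until the fence signals
};

class FrameRing {
public:
    explicit FrameRing(std::size_t frames_in_flight) : slots_(frames_in_flight) {
        assert(frames_in_flight > 0);
    }

    // True if the CPU would have to block in vkWaitForFences before it
    // may reuse the current slot's semaphores.
    bool must_wait() const { return slots_[current_].gpu_busy; }

    // Models acquire + submit (with the slot's fence) + present for the
    // current slot, then advances to the next slot.
    // Returns the index of the slot that was used.
    std::size_t submit_frame() {
        assert(!must_wait() && "slot is still in use by the GPU");
        const std::size_t used = current_;
        slots_[used].gpu_busy = true;
        current_ = (current_ + 1) % slots_.size();
        return used;
    }

    // Models the GPU finishing the work of `slot`, which signals its fence.
    void gpu_finished(std::size_t slot) { slots_[slot].gpu_busy = false; }

private:
    std::vector<FrameSlot> slots_;
    std::size_t current_ = 0;
};
```

With two slots, the CPU can submit two frames back to back and only has to wait when it wraps around to a slot whose work the GPU has not finished yet.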

Answer 2

So first of all, as you mentioned correctly, semaphores are strictly for GPU-GPU synchronization, e.g. to make sure that one batch of commands (one submit) has finished before another one starts. Here this is used to synchronize the rendering commands with the present command, so that the presentation engine knows when to present the rendered image.

Fences are the main utility for CPU-GPU synchronization. You place a fence in a queue submit and then wait for it on the CPU side before you proceed. This is usually done so that we do not queue any new rendering/present commands while the previous frame hasn't finished.

But does that mean that I necessarily need to do a

if (vkWaitForFences(device(), 1, &fence[image_index_of_last_call], VK_FALSE, std::numeric_limits<std::uint64_t>::max()) != VK_SUCCESS)
    throw std::runtime_error("vkWaitForFences");
vkResetFences(device(), 1, &fence[image_index_of_last_call]);

before my call to vkAcquireNextImageKHR?

Yes, you definitely need this in your code, otherwise your semaphores would not be safe and you would probably get validation errors.

In general, if you want your CPU to wait until your GPU has finished rendering the previous frame, you would have only a single fence and a single pair of semaphores. You could also replace the fence with a wait-idle call on the queue or device (vkQueueWaitIdle or vkDeviceWaitIdle). However, in practice you do not want to stall the CPU; you want it to record commands for the next frame in the meantime. This is done via frames in flight. This simply means that for every frame in flight (i.e. the number of frames that can be recorded in parallel to the execution on the GPU), you have one fence and one pair of semaphores which synchronize that particular frame.

So in essence, for your render loop to work properly you need a pair of semaphores plus a fence per frame in flight, independent of the number of swapchain images. However, do note that the current frame index (frame in flight) and the image index (swapchain) will generally not be the same unless you use the same number of swapchain images as frames in flight. This is because the presentation engine might give you swapchain images out of order, depending on your present mode.
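Because the frame index and the image index can diverge, one bookkeeping approach is to remember which frame in flight last used each swapchain image. The sketch below is a toy model rather than Vulkan code: ImageTracker, claim and kNoFrame are hypothetical names, and frame slots are plain integers standing in for the per-frame fences.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoFrame = -1; // hypothetical sentinel: image not used yet

class ImageTracker {
public:
    explicit ImageTracker(std::size_t image_count)
        : last_frame_(image_count, kNoFrame) {}

    // Call after the acquire step returned `image_index` while recording
    // frame slot `frame`. Returns the frame slot whose fence should be
    // waited on before rendering into this image, or kNoFrame if no extra
    // wait is needed (unused image, or it was last used by this same slot,
    // whose fence the render loop already waits on at the start of a frame).
    int claim(std::size_t image_index, int frame) {
        assert(image_index < last_frame_.size());
        const int previous = last_frame_[image_index];
        last_frame_[image_index] = frame;
        return (previous != kNoFrame && previous != frame) ? previous : kNoFrame;
    }

private:
    std::vector<int> last_frame_;
};
```

With three images and two frames in flight, the fourth call to render() uses frame slot 1 but may receive image 0 again, which was last rendered by frame slot 0, so that slot's fence is the one to check.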

Answer 1
4 2020-11-29 09:38:47

Answer 2
2 2020-11-29 09:37:51