The meaning and implications of VK_DEPENDENCY_BY_REGION_BIT

An input attachment can be accessed with the GLSL function subpassLoad, which reads the input attachment at the current fragment position; the interface does not provide random access. As a consequence, input attachments cannot be read at arbitrary fragment locations.

In practice this means [1]:

If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.

Then, about VK_DEPENDENCY_BY_REGION_BIT, the specification says [2]:

If a synchronization command includes a dependencyFlags parameter, and specifies the VK_DEPENDENCY_BY_REGION_BIT flag, then it defines framebuffer-local dependencies for the framebuffer-space pipeline stages in that synchronization command, for all framebuffer regions. If no dependencyFlags parameter is included, or the VK_DEPENDENCY_BY_REGION_BIT flag is not specified, then a framebuffer-global dependency is specified for those stages.

Hans-Kristian Arntzen from ARM [3] suggests that on tiled architectures multi-subpass render passes should be used only in conjunction with VK_DEPENDENCY_BY_REGION_BIT:

Next, we try to merge adjacent render passes together. This is particularly important on tile-based renderers. We try to merge passes together if:

  • They are both graphics passes
  • They share some color/depth/input attachments
  • Not more than one unique depth/stencil attachment exists
  • Their dependencies can be implemented with BY_REGION_BIT, i.e. no “texture” dependency, which allows sampling for arbitrary locations.
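
The four merge criteria quoted above can be read as a simple predicate over two adjacent passes. The following is only an illustrative sketch in plain C, not the code from [3]: the PassInfo struct, its fields, and the function names are hypothetical, and a real render graph would track far more state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ATTACHMENT   UINT32_MAX
#define MAX_ATTACHMENTS 8

/* Hypothetical description of one render-graph pass (not a Vulkan type). */
typedef struct {
    bool     is_graphics;                  /* false for compute passes */
    uint32_t attachments[MAX_ATTACHMENTS]; /* color/depth/input attachment ids */
    size_t   attachment_count;
    uint32_t depth_stencil;                /* NO_ATTACHMENT if the pass has none */
    bool     samples_prev_as_texture;      /* reads earlier output at arbitrary locations */
} PassInfo;

/* True if the two passes have at least one attachment id in common. */
static bool shares_attachment(const PassInfo *a, const PassInfo *b)
{
    for (size_t i = 0; i < a->attachment_count; ++i)
        for (size_t j = 0; j < b->attachment_count; ++j)
            if (a->attachments[i] == b->attachments[j])
                return true;
    return false;
}

/* Applies the four quoted criteria to a pair of adjacent passes. */
static bool can_merge(const PassInfo *prev, const PassInfo *next)
{
    /* Both must be graphics passes. */
    if (!prev->is_graphics || !next->is_graphics)
        return false;

    /* They must share some color/depth/input attachment. */
    if (!shares_attachment(prev, next))
        return false;

    /* At most one unique depth/stencil attachment across both passes. */
    if (prev->depth_stencil != NO_ATTACHMENT &&
        next->depth_stencil != NO_ATTACHMENT &&
        prev->depth_stencil != next->depth_stencil)
        return false;

    /* A "texture" dependency needs arbitrary-location reads, which a
       BY_REGION dependency cannot guarantee. */
    if (next->samples_prev_as_texture)
        return false;

    return true;
}
```

For instance, a G-buffer pass followed by a lighting pass that only reads the G-buffer through input attachments would pass this check, while a blur pass that samples the previous output as a texture would not.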

Now the questions are:

  1. If you cannot access fragments outside of the current fragment location anyway, what is the point of VK_DEPENDENCY_BY_REGION_BIT?

  2. On tiled architectures, does a multi-subpass render pass whose subpass dependencies cannot be declared with VK_DEPENDENCY_BY_REGION_BIT provide any performance advantage over a functionally equivalent, properly synchronized series of separate single-subpass render passes?

Well, the specification gives one example. If you want to access a sample of the input attachment that is not covered by the fragment, then you have to use a framebuffer-global dependency (i.e. dependencyFlags = 0, or use one of the vendor extensions that addresses this).

Though the most obvious example is non-attachment resources, which are naturally random access (you can access any pixel). With VK_DEPENDENCY_BY_REGION_BIT, only the part that was written for the same fragment can be certain to be visible. With a framebuffer-global dependency (dependencyFlags = 0), by contrast, you could access a location in a storage buffer written by any fragment shader invocation of the previous subpass.

dependencyFlags = 0 is sort of a soft restart of the render pass. So, everything else being equal, I would rank the performance this way:
single subpass ≥ multiple subpasses with VK_DEPENDENCY_BY_REGION_BIT ≥ multiple subpasses without VK_DEPENDENCY_BY_REGION_BIT ≥ multiple render passes.

Whether framebuffer-global subpasses actually provide any performance advantage, I cannot say without measuring a particular implementation (and that would potentially be perishable information, changing with new GPUs, or perhaps even driver versions). It should not be worse than a separate render pass, though, since that would likely be the worst demotion the driver itself would do if it cannot do anything with those special subpasses.
