
Once upon a time, when > was faster than < … Wait, what?

Original: 2011-09-07 18:39:29 | Tags: c, optimization, opengl, cpu, gpu

I am reading an awesome OpenGL tutorial. It's really great, trust me. The topic I am currently at is the Z-buffer. Aside from explaining what it's all about, the author mentions that we can perform custom depth tests, such as GL_LESS, GL_ALWAYS, etc. He also explains that the actual meaning of depth values (which is top and which isn't) can also be customized. I understand so far. And then the author says something unbelievable:

The range zNear can be greater than the range zFar; if it is, then the window-space values will be reversed, in terms of what constitutes closest or farthest from the viewer.

Earlier, it was said that the window-space Z value of 0 is closest and 1 is farthest. However, if our clip-space Z values were negated, the depth of 1 would be closest to the view and the depth of 0 would be farthest. Yet, if we flip the direction of the depth test (GL_LESS to GL_GREATER, etc), we get the exact same result. So it's really just a convention. Indeed, flipping the sign of Z and the depth test was once a vital performance optimization for many games.

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

Is the author making things up, am I misunderstanding something, or is it indeed the case that < was once slower (vitally, as the author says) than >?

Thanks for clarifying this quite curious matter!

Disclaimer: I am fully aware that algorithm complexity is the primary source for optimizations. Furthermore, I suspect that nowadays it definitely wouldn't make any difference and I am not asking this to optimize anything. I am just extremely, painfully, maybe prohibitively curious.

4 answers

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.

However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.

What I was referring to was a specific old optimization where, on one frame, you would use GL_LESS with a range of [0, 0.5]. On the next frame, you render with GL_GREATER with a range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.

This loses one bit of depth precision, but you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.
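To make the alternation concrete, here is a minimal CPU-side sketch in C of a single depth-buffer cell under this scheme. It is a model of the idea, not real driver or GL code: the type and function names (`FrameDepthState`, `depth_state_for_frame`, `window_depth`, `depth_test_and_write`) are hypothetical, and the sketch assumes that the buffer is cleared once at startup, that every pixel is drawn at least once per frame, and that normalized depth `d` lies in [0, 1). In actual OpenGL the per-frame state would presumably be set with `glDepthFunc` and `glDepthRange`.

```c
#include <stdbool.h>

/* Hypothetical names; a software model of one depth-buffer cell. */
typedef enum { DEPTH_LESS, DEPTH_GREATER } DepthFunc;

typedef struct {
    DepthFunc func;  /* comparison used this frame                 */
    float near_val;  /* window depth assigned to the near plane    */
    float far_val;   /* window depth assigned to the far plane     */
} FrameDepthState;

/* Even frames: LESS over [0, 0.5]. Odd frames: GREATER over [1.0, 0.5]. */
FrameDepthState depth_state_for_frame(unsigned frame)
{
    FrameDepthState s;
    if (frame % 2 == 0) {
        s.func = DEPTH_LESS;
        s.near_val = 0.0f;
        s.far_val = 0.5f;
    } else {
        s.func = DEPTH_GREATER;
        s.near_val = 1.0f;
        s.far_val = 0.5f;
    }
    return s;
}

/* Map normalized depth d in [0, 1) (0 = near plane) into this frame's range. */
float window_depth(FrameDepthState s, float d)
{
    return s.near_val + (s.far_val - s.near_val) * d;
}

/* Depth-test one fragment against one cell; write the new depth on pass. */
bool depth_test_and_write(FrameDepthState s, float *cell, float d)
{
    float w = window_depth(s, d);
    bool pass = (s.func == DEPTH_LESS) ? (w < *cell) : (w > *cell);
    if (pass) {
        *cell = w;
    }
    return pass;
}
```

In this model, whatever the previous frame left in the cell always sits in the other half of [0, 1], so the first fragment of a new frame passes the test as if the cell had been cleared. Each frame only uses half of the range, which is where the lost bit of precision comes from.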

The answer is almost certainly that for whatever incarnation of chip+driver was used, the Hierarchical Z only worked in the one direction - this was a fairly common issue back in the day. Low level assembly/branching has nothing to do with it - Z-buffering is done in fixed function hardware, and is pipelined - there is no speculation and hence, no branch prediction.

An optimization like that will hurt performance on many embedded graphics solutions because it will make the framebuffer resolve less efficient. Clearing a buffer is a clear signal to the driver that it does not need to store and restore the buffer when binning.

A little background information: a tiling/binning rasterizer processes the screen in a number of very small tiles which fit into the on-chip memory. This reduces writes and reads to external memory, which reduces traffic on the memory bus. When a frame is complete (swap is called, or FIFOs are flushed because they are full, framebuffer bindings change, etc) the framebuffer must be resolved; this means every bin is processed in turn.

Without a clear, the driver must assume that the previous contents must be preserved. The preservation means that the bin has to be written out to the external memory and later restored from external memory when the bin is processed again. The clear operation tells the driver that the contents of the bin are well defined: the clear color. This is a situation which is trivial to optimize. There are also extensions to "discard" the buffer contents.
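As a rough illustration of the point above, the following toy model in C counts depth-buffer transfers between on-chip tile memory and external memory for one frame. Everything here is an assumption made for the sketch: the names `TileTraffic` and `depth_traffic_for_frame` are hypothetical, and real drivers and hardware make this decision in ways the model does not capture. It only encodes the rule described above, that a clear lets the driver skip both the restore and the store.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy traffic counter for a tiling renderer; all names are hypothetical. */
typedef struct {
    size_t tile_loads;   /* tiles restored from external memory   */
    size_t tile_stores;  /* tiles written out to external memory  */
} TileTraffic;

/* Count depth-buffer transfers for one frame made of tile_count tiles.
   app_clears_depth: the application clears the depth buffer every frame. */
TileTraffic depth_traffic_for_frame(size_t tile_count, bool app_clears_depth)
{
    TileTraffic t = {0, 0};
    for (size_t i = 0; i < tile_count; ++i) {
        if (!app_clears_depth) {
            t.tile_loads++;   /* restore the old depth before rasterizing the tile */
            t.tile_stores++;  /* save the depth afterwards for the next frame      */
        }
        /* With a clear, the tile's depth starts at the clear value on chip
           and, in this model, is simply dropped once the tile is done. */
    }
    return t;
}
```

Under these assumptions, a frame of 300 tiles costs 300 loads and 300 stores of depth data when the application relies on the flipping trick, and none when it clears.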