GLSL / OpenGL: reusing the output of the vertex shader

Question

I am rendering sprites in 3D space, where each quad is formed from two triangles. I draw them as GL_TRIANGLES (see below). Since 2 vertices are repeated in this formation, vertex shader does two times the same computation.

    5    3, 4
     *---*
     |  /|
     |/  |
     *---* 
  1, 6    2

I wanted to optimize this by using a geometry shader to repeat the two vertices. The reason is that the vertex shader is expensive and there is a high number of triangles in the scene. After a lot of hackery, I managed to pull it off. It turned out to be very inefficient: it is actually 45% slower on my machine. I assume this comes from the fact that primitive assembly is performed twice and a lot of unnecessary data copying happens in the geometry shader. I can't view the assembly code, so I can only guess.

Now to my question: is there a better way of doing this that would actually be faster than doing all the extra vertex shader operations?

Answer 1

A geometry shader is not needed for that.

What you need is indexed rendering: every vertex is stored in the VBO only once. Then you create an additional buffer object (bound to GL_ELEMENT_ARRAY_BUFFER) that stores the indices of the vertices stored in the actual VBO.
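As a rough illustration of the index data for the sprite quads in the question, here is a minimal sketch. It assumes each quad's 4 unique corners are stored consecutively in the VBO in the order bottom-left, bottom-right, top-right, top-left; that ordering and the function name `buildQuadIndices` are assumptions made for this example, not something taken from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the index list for `quadCount` quads.
// Assumed vertex order per quad in the VBO:
//   0 = bottom-left, 1 = bottom-right, 2 = top-right, 3 = top-left.
// Each quad becomes the triangles (0,1,2) and (2,3,0), which corresponds to
// vertices 1,2,3 and 4,5,6 in the question's diagram (where 1 == 6 and 3 == 4).
std::vector<std::uint32_t> buildQuadIndices(std::size_t quadCount) {
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        // First triangle: bottom-left, bottom-right, top-right.
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        // Second triangle: top-right, top-left, bottom-left.
        indices.push_back(base + 2);
        indices.push_back(base + 3);
        indices.push_back(base + 0);
    }
    return indices;
}
```

The resulting array would be uploaded with glBufferData to the buffer bound to GL_ELEMENT_ARRAY_BUFFER, and the quads drawn with glDrawElements(GL_TRIANGLES, ...) using GL_UNSIGNED_INT as the index type. With this layout the VBO holds 4 vertices per quad instead of 6.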

[Figure: visualization of indexed rendering (source: in2gpu.com)]

Note that your case is not that bad. For example, consider drawing a circle: let's say you draw it using 360 triangles (seems reasonable). In this case, the center vertex would be duplicated for every triangle. That would cause 359 * 4 (number of components + alignment) * 4 (usual value of sizeof(float)) = 5744 bytes of unnecessary data.
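The arithmetic above can be checked with a small sketch. The function name `redundantBytes` and its parameters are made up for this example; it only counts the extra copies of one shared vertex (the circle's center), as the paragraph above does.

```cpp
#include <cstddef>

// Hypothetical helper: bytes spent on redundant copies of ONE vertex that is
// shared by `triangleCount` triangles when it is stored once per triangle
// instead of once overall.
constexpr std::size_t redundantBytes(std::size_t triangleCount,
                                     std::size_t componentsPerVertex,
                                     std::size_t bytesPerComponent) {
    // One copy is always needed; every copy beyond the first is redundant.
    return triangleCount == 0
               ? 0
               : (triangleCount - 1) * componentsPerVertex * bytesPerComponent;
}

// 360 triangles, 4 components (including alignment), 4-byte floats.
static_assert(redundantBytes(360, 4, 4) == 5744,
              "359 redundant copies * 4 components * 4 bytes");
```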

Further reading:

VBO indexing
Indexed draws

UPDATE

"Since 2 vertices are repeated in this formation, vertex shader does two times the same computation."

No, it surely does not. All repeated vertices will definitely hit the vertex cache (I guess that is what you meant by "caching"?) and will be reused. This is a very common usage pattern. Remember that sometimes indexed rendering is not a solution (for example, when you have different attributes for the same position; yes, you can move the position data to a separate VBO, but it's usually not worth it, so let's leave that aside), so GPUs must handle such situations efficiently. GPU vendors took care of that.

So do not optimize that. If you are aware of indexed rendering but either cannot use it or it does not give any improvement, let the GPU handle rendering the best way it can.

Answer 2

"Since 2 vertices are repeated in this formation, vertex shader does two times the same computation."

No, on practically all existing implementations (i.e. GPUs) it does not.

The repeated vertices will hit the vertex cache, and the existing results of the previous computation on the very same vertex are simply reused for the following steps in the pipeline.

Trying to optimize this is a moot point: GPUs have been optimized for exactly this usage pattern, and performance-wise that system has been squeezed dry.

Answer 1: score 5, accepted, 2015-09-01 17:55:51

Answer 2: score 1, 2015-09-01 18:17:45