
OpenGL performance: VBOs/vertex shader vs. glEnableClientState/glVertexPointer, and glMultMatrix vs. glUniformMatrix

I'm fairly new to OpenGL. I've just started learning about shaders, particularly vertex and fragment shaders. My understanding is that doing work in shaders can give a pretty significant performance increase, because the shader runs on the GPU.

However, when researching this topic I keep finding mixed opinions, at least with regard to the vertex shader.

What is the major difference between rendering an object as below, using calls like glMultMatrixd for my transformations:

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);

    glVertexPointer(3, GL_FLOAT, 0, &vertices[0]);
    glNormalPointer(GL_FLOAT, 0, &normals[0]);

    glDrawArrays(GL_TRIANGLES, 0, vertices.size() / 3);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);

vs. using a VAO/VBO setup like the one below, where I set my transformation matrices as uniform variables in the shader and do the transformation there?

    // Setup: upload positions and normals into two buffer objects
    glBindVertexArray(vaoHandle);
    glBindBuffer(GL_ARRAY_BUFFER, bufferHandle[0]);
    glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float), vertices.data(), GL_STATIC_DRAW);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0); // attribute 0: position
    glEnableVertexAttribArray(0);

    glBindBuffer(GL_ARRAY_BUFFER, bufferHandle[1]);
    glBufferData(GL_ARRAY_BUFFER, normals.size() * sizeof(float), normals.data(), GL_STATIC_DRAW);

    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0); // attribute 1: normal
    glEnableVertexAttribArray(1);

    ..... .....

    // Drawing: bind the VAO and issue the draw call
    glBindVertexArray(vaoHandle);
    glDrawArrays(GL_TRIANGLES, 0, vertices.size() / 3);

Just a heads-up: I don't care about what's wrong with the code itself. I just want to know whether there is in fact a performance difference, and why. What's going on under the hood in each approach? Why would one be faster or slower than the other? The same goes for the transformations: why would doing them in a vertex shader with a uniform be faster than using glMultMatrix?

On any GPU that is at least reasonably recent, what the GPU ends up executing is mostly the same in both cases. I don't think anybody has built GPUs with dedicated hardware for the fixed pipeline in quite some time. For desktop GPUs, I believe that transition happened more than 10 years ago (for a few years before that, they were already programmable but still also had fixed-function hardware). For mobile GPUs, the transition to purely programmable GPUs happened later, but also quite some time ago.

If you use the fixed pipeline, the driver generates shader code for you based on the fixed-function state you set. So what you're really comparing are shaders compiled from the GLSL you pass to the driver, and shaders generated by the driver from state values.

The shader will obviously run on the GPU in both cases, so beyond that there's really no fundamental difference.

Now, you may ask: which one is more efficient? There's no way to tell in general. Some considerations:

  • Shaders the driver generates for fixed-function state can potentially have an advantage, because they were heavily tuned, most likely written in shader assembly. This was done primarily for workstation-class GPUs, where a lot of software kept using legacy fixed-function OpenGL for much longer.

  • Shaders you write in GLSL have the advantage that they do exactly what you need, and nothing else. In that sense, they may be more streamlined for your precise use case. Of course, the shader the driver generates from fixed-function state could also be highly streamlined, but that is outside your control. And especially if you care about performance across various platforms, I frankly wouldn't trust every GPU vendor to generate highly efficient shader code for me.

Of course, writing your own shader code has major advantages beyond that. It lets you do things that are simply not possible with the fixed pipeline. And even where the fixed pipeline can do the job, using shaders is often easier once you get the hang of writing GLSL.

The major performance difference does not come from using shaders, but from using VBOs.

In the first example, `vertices` and `normals` reside in client-side memory (i.e. application memory). Whenever they are drawn, these arrays are copied to the graphics card, which can take significant time.

In contrast, the second example stores all relevant values in VBOs, which the driver can keep in graphics memory. The data is thus already in the optimal location, and no copying is required for drawing.
