OpenGL shading pipeline efficiency with FBOs

I've been working on implementing deferred shading, as I want to have at least 20 lights in my scene. I was having problems making it fast enough (and still am), but then I made a change that I expected to make it slower and that in fact almost doubled my frame rate.

Initial code:

geometryPassFBO = createFBO(); // position texture, normal texture, colour texture and depth buffer
while (1)
{
    bind geometryPassFBO
    allObjects.draw();

    bind systemFBO
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures

    blit depth buffer from geometryPassFBO to systemFBO

    for each light
        light.draw(); // draw a cube to represent the light

    2DObjects.draw(); // frame rate, etc.
}

I was in the process of setting up a stencil test so that the lighting pass only runs on pixels that were written during the geometry pass, i.e. skipping the background, where normal = (0,0,0), position = (0,0,0) and colour = (0,0,0).
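A minimal sketch of that stencil idea, written as a CPU model in plain C so it runs without an OpenGL context: the geometry pass marks covered pixels with stencil = 1, and the lighting loop skips every pixel whose stencil value is not 1. The function names, the single-channel albedo and the per-light intensity are hypothetical, and the model ignores light-sphere coverage (each light visits every pixel). The GL state in the comment is the usual way to get the same effect, assuming the FBO used for lighting carries the depth/stencil attachment the geometry pass wrote to.

```c
/* CPU model of stencil-masked deferred lighting (illustrative, not GL code).
 * Typical GL equivalent, assuming stencil is cleared to 0 each frame:
 *   geometry pass: glEnable(GL_STENCIL_TEST);
 *                  glStencilFunc(GL_ALWAYS, 1, 0xFF);
 *                  glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
 *   lighting pass: glStencilFunc(GL_EQUAL, 1, 0xFF);
 *                  glStencilMask(0x00);  // leave stencil untouched
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry pass: every pixel covered by geometry gets stencil = 1. */
void geometry_pass(uint8_t *stencil, const int *covered, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (covered[i])
            stencil[i] = 1; /* GL_REPLACE with ref = 1 */
}

/* Lighting pass: shade only where stencil == 1 (GL_EQUAL, ref = 1).
 * Returns how many "fragment shader" invocations actually ran. */
size_t lighting_pass(float *out, const uint8_t *stencil, const float *albedo,
                     size_t n, int num_lights, float light_intensity)
{
    assert(num_lights >= 0);
    size_t shaded = 0;
    for (int l = 0; l < num_lights; ++l)
        for (size_t i = 0; i < n; ++i) {
            if (stencil[i] != 1)
                continue; /* background: stencil test fails, no shading */
            out[i] += albedo[i] * light_intensity; /* additive blending */
            ++shaded;
        }
    return shaded;
}
```

With 20 lights, each background pixel that fails the stencil test saves up to 20 lighting-shader invocations, which is the saving the stencil test is meant to buy.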

However, I was having difficulty copying the combined depth/stencil buffer to the default framebuffer's depth/stencil buffer. Apparently this doesn't work well, because we don't know what format the system (window-provided) depth/stencil buffer uses. So I read that it is better to set up another FBO where we can specify the depth/stencil format, render into that, and then either blit it or render a screen quad to get the result onto the screen.
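To illustrate why the format matters, assume a GL_DEPTH24_STENCIL8 buffer read back with the GL_UNSIGNED_INT_24_8 type: each pixel is one 32-bit word with normalized depth in the top 24 bits and stencil in the bottom 8. A buffer in another format (for example GL_DEPTH32F_STENCIL8, which stores a 32-bit float depth) lays the same information out differently, so moving data between the two means unpacking and repacking every pixel. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define D24_MAX 16777215.0 /* 2^24 - 1: largest 24-bit normalized depth */

/* Pack depth in [0,1] and an 8-bit stencil value into the
 * GL_UNSIGNED_INT_24_8 layout: depth in bits 31..8, stencil in bits 7..0. */
uint32_t pack_d24s8(double depth, uint8_t stencil)
{
    assert(depth >= 0.0 && depth <= 1.0);
    uint32_t d24 = (uint32_t)(depth * D24_MAX + 0.5); /* round to nearest */
    return (d24 << 8) | stencil;
}

/* Recover the normalized depth from the top 24 bits. */
double unpack_depth_d24s8(uint32_t packed)
{
    return (double)(packed >> 8) / D24_MAX;
}

/* Recover the stencil value from the bottom 8 bits. */
uint8_t unpack_stencil_d24s8(uint32_t packed)
{
    return (uint8_t)(packed & 0xFFu);
}
```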

So before adding any stencil stuff, I simply added the new FBO to get that bit working.

My new code now looks like:

geometryPassFBO = createGeometryFBO(); // position texture, normal texture, colour texture and depth buffer
lightingPassFBO = createLightingFBO(); // colour texture and depth buffer
while (1)
{
    bind geometryPassFBO
    allObjects.draw();

    bind lightingPassFBO
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures

    blit depth buffer from geometryPassFBO to lightingPassFBO

    for each light
        light.draw(); // draw a cube to represent the light

    2DObjects.draw(); // frame rate, etc.

    bind systemFBO
    render screen quad sampling from lightingPassFBO's colour texture
}

This works as expected. What was not expected is that my frame rate jumped from 25 FPS to 45 FPS.

Why is this? How can an additional shader pass to draw a screen quad be more efficient than not doing it at all?

A quick follow-up question: which is more efficient, rendering a screen quad with a simple vertex and fragment shader that samples the colour texture based on gl_FragCoord, or blitting the colour attachment directly to the system FBO?

Well, it's probably this:

blit depth buffer from geometryPassFBO to lightingPassFBO

As you point out, format conversion can be slow. But since you're defining both the source and destination buffers for this blit, they're probably using the same depth format, so the blit can proceed much faster.
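For depth and stencil specifically, the OpenGL reference page for glBlitFramebuffer is stricter than "slow": it specifies GL_INVALID_OPERATION when the mask includes GL_DEPTH_BUFFER_BIT or GL_STENCIL_BUFFER_BIT and the source and destination depth/stencil formats do not match. So a depth/stencil blit that succeeds is, in effect, a same-format copy. The sketch below models that rule on the CPU; the enums, struct and function name are hypothetical stand-ins, not GL types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for internal formats and the GL error. */
enum ds_format { FMT_DEPTH24_STENCIL8, FMT_DEPTH32F_STENCIL8 };
enum blit_result { BLIT_OK, BLIT_INVALID_OPERATION };

struct ds_buffer {
    enum ds_format format;
    size_t texel_bytes;  /* 4 for D24S8; 8 for D32F_S8 read back as
                            GL_FLOAT_32_UNSIGNED_INT_24_8_REV */
    size_t texel_count;
    unsigned char *data;
};

/* Model of a depth/stencil blit: mismatched formats are rejected,
 * matching formats are a straight byte copy with no per-pixel work. */
enum blit_result blit_depth_stencil(struct ds_buffer *dst,
                                    const struct ds_buffer *src)
{
    if (dst->format != src->format)
        return BLIT_INVALID_OPERATION; /* nothing is written */
    assert(dst->texel_count == src->texel_count);
    assert(dst->texel_bytes == src->texel_bytes);
    memcpy(dst->data, src->data, src->texel_count * src->texel_bytes);
    return BLIT_OK;
}
```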

Also, you probably shouldn't do this blit at all. Just attach geometryPassFBO's depth/stencil buffer to lightingPassFBO before you render your light cubes. Remember to remove the attachment after rendering the lights; otherwise your deferred pass will have undefined behavior, assuming you're reading from the depth buffer in your deferred pass.
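A minimal model of the attach-instead-of-copy suggestion: attaching only makes lightingPassFBO refer to the same depth/stencil image, and the danger mentioned above is a feedback loop, i.e. sampling a texture that is also attached to the framebuffer being rendered to. In real code the attach would be something like glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0) (or glFramebufferRenderbuffer for a renderbuffer), and passing texture 0 detaches it. The struct, texture ids and helper names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of an FBO; 0 means "nothing attached". */
struct framebuffer {
    unsigned color_tex;
    unsigned depth_stencil_tex;
};

/* Stands in for glFramebufferTexture2D(GL_FRAMEBUFFER,
 * GL_DEPTH_STENCIL_ATTACHMENT, GL_TEXTURE_2D, tex, 0): no pixels are
 * copied, the FBO just references the existing texture. */
void attach_depth_stencil(struct framebuffer *fb, unsigned tex)
{
    assert(tex != 0);
    fb->depth_stencil_tex = tex;
}

/* The same call with texture 0: detach. */
void detach_depth_stencil(struct framebuffer *fb)
{
    fb->depth_stencil_tex = 0;
}

/* Returns 1 if sampling `tex` while `fb` is the draw framebuffer would read
 * a texture that is also being rendered to (undefined results in GL). */
int is_feedback_loop(const struct framebuffer *fb, unsigned tex)
{
    return tex != 0 &&
           (tex == fb->color_tex || tex == fb->depth_stencil_tex);
}
```

One frame in this model: the lighting pass may sample the G-buffer depth while it is not attached, the light-cube pass runs with it attached (so the cubes are depth-tested against the scene), and detaching afterwards keeps the next frame's lighting pass out of the feedback case.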

As for your question about blitting vs. a full-screen quad, I have a better question: why are you accumulating 20+ lights in a scene without using high dynamic range (HDR) lighting? If you were, the final pass to the screen would also need to apply tone mapping to convert the HDR image to LDR for display.
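A small sketch of why HDR matters with 20+ lights: if lighting accumulates into an 8-bit normalized target (for example GL_RGBA8), blending results are clamped to [0, 1], so bright regions saturate and become indistinguishable. Accumulating into a float target (GL_RGBA16F is a common choice, assumed here) keeps the full sum, and the final screen pass compresses it. The Reinhard operator L / (1 + L) is used below only as one simple example of tone mapping, since the answer does not name an operator; the helper names are hypothetical and gamma correction is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of equal light contributions, kept in float (the "HDR" value). */
float accumulate_lights(int num_lights, float per_light)
{
    assert(num_lights >= 0);
    float sum = 0.0f;
    for (int i = 0; i < num_lights; ++i)
        sum += per_light;
    return sum;
}

/* Quantize a value in [0, 1] to 8 bits, rounding to nearest. */
static uint8_t quantize8(float c)
{
    return (uint8_t)(c * 255.0f + 0.5f);
}

/* What an 8-bit normalized target effectively does: clamp, then quantize. */
uint8_t to_ldr_clamped(float radiance)
{
    float c = radiance < 0.0f ? 0.0f : (radiance > 1.0f ? 1.0f : radiance);
    return quantize8(c);
}

/* Reinhard tone mapping L / (1 + L): maps [0, inf) into [0, 1). */
float reinhard(float radiance)
{
    assert(radiance >= 0.0f);
    return radiance / (1.0f + radiance);
}

/* Tone map first, then quantize for an LDR display. */
uint8_t to_ldr_tonemapped(float radiance)
{
    return quantize8(reinhard(radiance));
}
```

Sums of 20 x 0.1 and 20 x 0.2 both clamp to 255 in the 8-bit path, while the tone-mapped path keeps them apart; that per-pixel tone-mapping step is work a final full-screen-quad pass would be doing anyway.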

But as for the exact question, a blit should be no slower than a full-screen quad (FSQ), assuming no format conversion is going on. If format conversion is happening, it could be more efficient to do the conversion yourself in the shader of a full-screen quad.
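For the same-size, same-format case, a full-screen-quad pass whose fragment shader does texelFetch(tex, ivec2(gl_FragCoord.xy), 0) reads exactly one texel per pixel and writes it unchanged, which is what a same-size blit does. The CPU model below shows that mapping: by default gl_FragCoord.xy holds the pixel centre (x + 0.5, y + 0.5), so converting it to ivec2 truncates back to (x, y). The function name is hypothetical, and the model says nothing about relative speed on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU model of a full-screen-quad pass whose fragment shader does
 *   out = texelFetch(src, ivec2(gl_FragCoord.xy), 0);
 * Source and destination are assumed to have the same size and format. */
void fsq_copy(uint32_t *dst, const uint32_t *src, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float frag_x = (float)x + 0.5f; /* gl_FragCoord.x: pixel centre */
            float frag_y = (float)y + 0.5f; /* gl_FragCoord.y: pixel centre */
            int tx = (int)frag_x;           /* ivec2(...) truncates */
            int ty = (int)frag_y;
            dst[(size_t)y * width + x] = src[(size_t)ty * width + tx];
        }
}
```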
