OpenGL glReadPixels Performance

I am trying to implement auto exposure for HDR tone mapping. I am trying to reduce the cost of finding the average brightness of my scene, and I seem to have hit a choke point with glReadPixels. Here is my setup:

1: I create a downsampled FBO to reduce the cost of reading with glReadPixels, using only the GL_RED channel in GL_BYTE format.

private void CreateDownSampleExposure() {
        // Low-resolution FBO (1/8th of 1600x1200) with a single-channel
        // GL_RED byte texture, used only for the brightness read-back.
        DownFrameBuffer = glGenFramebuffers();
        DownTexture = GL11.glGenTextures();
        glBindFramebuffer(GL_FRAMEBUFFER, DownFrameBuffer);
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, DownTexture);
        GL11.glTexImage2D(GL11.GL_TEXTURE_2D, 0, GL11.GL_RED, 1600 / 8, 1200 / 8,
                0, GL11.GL_RED, GL11.GL_BYTE, (ByteBuffer) null);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                GL11.GL_TEXTURE_2D, DownTexture, 0);
        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            System.err.println("error");
        } else {
            System.err.println("success");
        }
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, 0);
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }

2: Setting up the ByteBuffer and reading back the downsampled FBO texture from above.

Setup() {
    // Buffer for the read-back; it must hold Width/8 * Height/8 bytes to
    // match the glReadPixels call in MainLoop below.
    byte[] testByte = new byte[1600 / 8 * 1000 / 8];
    ByteBuffer testByteBuffer = BufferUtils.createByteBuffer(testByte.length);
    testByteBuffer.put(testByte);
    testByteBuffer.flip();
}

MainLoop() {
    // Render scene and store result into downSampledFBO texture.

    GL11.glBindTexture(GL11.GL_TEXTURE_2D, DeferredFBO.getDownTexture());

    // GL11.glGetTexImage(GL11.GL_TEXTURE_2D, 0, GL11.GL_RED, GL11.GL_BYTE,
    //         testByteBuffer); <- This is slower than glReadPixels.

    // Note: glReadPixels reads from the currently bound read framebuffer,
    // so the downsampled FBO must still be bound at this point.
    GL11.glReadPixels(0, 0, DisplayManager.Width / 8, DisplayManager.Height / 8,
            GL11.GL_RED, GL11.GL_BYTE, testByteBuffer);

    int x = 0;
    for (int i = 0; i < testByteBuffer.capacity(); i++) {
        x += testByteBuffer.get(i);
    }
    System.out.println(x); // <- Print out accumulated value of brightness.

    // Adjust exposure depending on brightness.
}

The problem is that I can downsample my FBO texture by a factor of 100, so glReadPixels reads only 16x10 pixels, yet there is little to no performance gain. There is a substantial gain over not downsampling at all, but once I get past dividing the width and height by about 8, the improvement falls off. It seems like there is a huge overhead in just calling this function. Is there something I am doing incorrectly, or not considering, when calling glReadPixels?

glReadPixels is slow because the CPU must wait until the GPU has finished all of its rendering before it can give you the results: the dreaded sync point.

One way to make glReadPixels fast is to use some sort of double/triple buffering scheme, so that you only call glReadPixels on render-to-textures that you expect the GPU has already finished with. This is only viable if waiting a couple of frames before receiving the result of glReadPixels is acceptable in your application. For example, in a video game the latency could be justified as a simulation of the pupil's response time to a change in lighting conditions.
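One common way to implement such a scheme is with pixel buffer objects (PBOs): with a buffer bound to GL_PIXEL_PACK_BUFFER, glReadPixels starts an asynchronous copy into the PBO and returns immediately, and you map a buffer filled a few frames earlier when the GPU is likely done with it. Below is a minimal sketch in the question's LWJGL style; the names (pbos, frame, setupPbos, readBrightness, w, h) are illustrative, not from the question, and the first NUM_PBOS - 1 frames will simply see zero-filled data.

    private static final int NUM_PBOS = 3; // triple buffering
    private int[] pbos = new int[NUM_PBOS];
    private int frame = 0;
    private int w = DisplayManager.Width / 8, h = DisplayManager.Height / 8;

    private void setupPbos() {
        for (int i = 0; i < NUM_PBOS; i++) {
            pbos[i] = GL15.glGenBuffers();
            GL15.glBindBuffer(GL21.GL_PIXEL_PACK_BUFFER, pbos[i]);
            GL15.glBufferData(GL21.GL_PIXEL_PACK_BUFFER, w * h, GL15.GL_STREAM_READ);
        }
        GL15.glBindBuffer(GL21.GL_PIXEL_PACK_BUFFER, 0);
    }

    private void readBrightness() {
        // Start an asynchronous read into this frame's PBO; with a pack
        // buffer bound, glReadPixels returns without waiting for the GPU.
        GL15.glBindBuffer(GL21.GL_PIXEL_PACK_BUFFER, pbos[frame % NUM_PBOS]);
        GL11.glReadPixels(0, 0, w, h, GL11.GL_RED, GL11.GL_BYTE, 0);

        // Map the PBO filled NUM_PBOS - 1 frames ago; the GPU should be
        // finished with it by now, so mapping does not stall the pipeline.
        GL15.glBindBuffer(GL21.GL_PIXEL_PACK_BUFFER, pbos[(frame + 1) % NUM_PBOS]);
        ByteBuffer data = GL15.glMapBuffer(GL21.GL_PIXEL_PACK_BUFFER,
                GL15.GL_READ_ONLY, null);
        if (data != null) {
            int sum = 0;
            for (int i = 0; i < data.capacity(); i++) {
                sum += data.get(i);
            }
            // ...use sum to drive the exposure adjustment...
            GL15.glUnmapBuffer(GL21.GL_PIXEL_PACK_BUFFER);
        }
        GL15.glBindBuffer(GL21.GL_PIXEL_PACK_BUFFER, 0);
        frame++;
    }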

However, for your particular tone-mapping example, presumably you want to calculate the average brightness only to feed that information back into the GPU for another rendering pass. Instead of glReadPixels, calculate the average by copying your image to successively half-sized render targets with linear filtering (a box filter), until you're down to a 1x1 target.

That 1x1 target is now a texture containing your average brightness, and you can use that texture in your tone-mapping rendering pass. No sync points.
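As a rough sketch of that reduction chain, assume an fboChain array of FBOs created like the one in the question, each attached to a texture half the size of the previous, down to 1x1 (all names here are illustrative). glBlitFramebuffer with GL_LINEAR filtering stands in for drawing textured quads: for an exact half-size blit, the bilinear footprint works out to a 2x2 box average on common implementations.

    // Halve the image repeatedly with linear filtering; each blit averages
    // roughly a 2x2 block of the previous level, ending at a 1x1 target.
    private void reduceToAverage(int[] fboChain, int baseW, int baseH) {
        int w = baseW, h = baseH;
        for (int i = 0; i + 1 < fboChain.length; i++) {
            int dw = Math.max(w / 2, 1), dh = Math.max(h / 2, 1);
            GL30.glBindFramebuffer(GL30.GL_READ_FRAMEBUFFER, fboChain[i]);
            GL30.glBindFramebuffer(GL30.GL_DRAW_FRAMEBUFFER, fboChain[i + 1]);
            GL30.glBlitFramebuffer(0, 0, w, h, 0, 0, dw, dh,
                    GL11.GL_COLOR_BUFFER_BIT, GL11.GL_LINEAR);
            w = dw;
            h = dh;
        }
        GL30.glBindFramebuffer(GL30.GL_FRAMEBUFFER, 0);
        // The texture attached to the last FBO in the chain is now 1x1 and can
        // be sampled directly in the tone-mapping shader; nothing is read back.
    }

If mipmaps are acceptable for your brightness texture, GL30.glGenerateMipmap achieves the same reduction with less code, and the tone-mapping shader can sample the top (1x1) mip level.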
