Tiled rendering compute shader light culling and shading

I'm trying to implement tiled-deferred rendering in OpenGL/GLSL and I'm stuck on light culling.

My GPU is fairly old (AMD Radeon 6490m), and for some reason compute shaders run into an infinite loop whenever atomic operations are called on shared variables, so I couldn't compute the minimum and maximum depth in a compute shader. It isn't a very time-consuming operation anyway, so I do it in a fragment shader instead.
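A per-tile min/max depth does not strictly need atomics: a tree reduction in shared memory only needs a `barrier()` between steps. The C++ below is a minimal CPU emulation of that idea, not GPU code. `tileMinMax` is a hypothetical name, and the sketch assumes the tile holds a power-of-two number of depth values (32 x 32 = 1024 here); partial tiles at the screen edge would need neutral padding, which it leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU emulation of a shared-memory tree reduction for one tile.
// On the GPU, each "thread" i < stride would do one inner-loop iteration and
// every outer-loop step would end with barrier(); no atomics are involved.
// Precondition: depth.size() is a power of two (e.g. 32 * 32 = 1024).
std::pair<float, float> tileMinMax(std::vector<float> depth) {
    const std::size_t n = depth.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    std::vector<float> mins = depth;             // shared array for the minimum
    std::vector<float> maxs = std::move(depth);  // shared array for the maximum
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {  // threads with index < stride
            mins[i] = std::min(mins[i], mins[i + stride]);
            maxs[i] = std::max(maxs[i], maxs[i + stride]);
        }
        // barrier() goes here on the GPU
    }
    return {mins[0], maxs[0]};  // (min depth, max depth) of the tile
}
```

Each outer step halves the number of active threads, so a 32 x 32 tile needs 10 steps. Thread `i` only reads entries at `i + stride` or above, which no thread writes in that step, so there is no race within a step.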

Then, for every visible point light (in view space), I compute a screen-space bounding quad. Now I want to use a single compute shader for both light culling and shading. The problem is that, as mentioned above, I can't use atomic operations on shared variables, so I can't build the tile's light list or store the tile's light count.
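The per-tile test implied here combines a 2D overlap between the light's screen-space quad and the tile rectangle with an overlap of depth intervals. A minimal sketch, assuming the quad is stored as (xmin, ymin, xmax, ymax) in pixels and that the tile's min/max depth and the light's z are in the same (view) space; `Quad` and `lightAffectsTile` are hypothetical names:

```cpp
#include <cassert>

// Screen-space bounding quad of a light in pixels (assumed layout:
// xmin, ymin, xmax, ymax, with the max corner exclusive).
struct Quad {
    float xmin, ymin, xmax, ymax;
};

// Returns true if a light should go into the list of tile (tx, ty).
// tileMinZ/tileMaxZ come from the per-tile min/max depth pass; lightZ must be
// in the same space (e.g. view-space z).
bool lightAffectsTile(const Quad& q, float lightZ, float radius,
                      int tx, int ty, int tileSize,
                      float tileMinZ, float tileMaxZ) {
    // Tile rectangle [x0, x1) x [y0, y1) in pixels.
    const float x0 = float(tx * tileSize), x1 = float((tx + 1) * tileSize);
    const float y0 = float(ty * tileSize), y1 = float((ty + 1) * tileSize);
    const bool overlaps2D = q.xmin < x1 && q.xmax > x0 &&
                            q.ymin < y1 && q.ymax > y0;
    // Depth intervals [lightZ - radius, lightZ + radius] and
    // [tileMinZ, tileMaxZ] overlap if each starts before the other ends.
    const bool overlapsDepth = lightZ - radius <= tileMaxZ &&
                               lightZ + radius >= tileMinZ;
    return overlaps2D && overlapsDepth;
}
```

The depth check is an interval-overlap test: a light is kept as soon as its depth range touches the tile's range, not only when it fully contains it.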

I can't find any other way to do this. Any ideas on how to cull lights and build per-tile light lists without atomics?

Here is the pseudocode of my compute shader:

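#version 430

#define MAX_LIGHTS  1024
#define TILE_SIZE   32
#define RX  1280
#define RY  720

struct Light {
    vec4 position;  // view-space position
    vec4 quad;      // screen-space bounding quad, assumed (xmin, ymin, xmax, ymax) in pixels
    vec3 color;
    float radius;
};

uint getTilesXCount(){
    return uint((RX + TILE_SIZE - 1) / TILE_SIZE);
}

uint getTilesYCount(){
    return uint((RY + TILE_SIZE - 1) / TILE_SIZE);
}

layout (binding = 0, rgba16f) uniform readonly image2D minMaxTex;  // one texel per tile: x = min depth, y = max depth
layout (binding = 1, rgba16f) uniform readonly image2D diffTex;
layout (binding = 2, rgba16f) uniform readonly image2D specTex;
layout (binding = 3, rgba16f) uniform writeonly image2D outTex;    // lit result

layout (std430, binding = 3) buffer pointLights {
    Light Lights[];
};

//tile light list & light count
shared uint lightIDs[MAX_LIGHTS];
shared uint lightCount;  // shared variables cannot have initializers

uniform uint totalLightCount;

layout (local_size_x = TILE_SIZE, local_size_y = TILE_SIZE) in;

// 2D overlap of the light's quad with the tile rectangle [tileMin, tileMax)
bool testLightBounds(ivec2 tileMin, ivec2 tileMax, vec4 quad){
    return quad.x < float(tileMax.x) && quad.z > float(tileMin.x)
        && quad.y < float(tileMax.y) && quad.w > float(tileMin.y);
}

// Placeholder shading (no attenuation, no normals): replace with your BRDF
vec3 doLight(Light l, vec3 diffuse, vec3 specular){
    return l.color * (diffuse + specular);
}

void main(void){
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    ivec2 tile = ivec2(gl_WorkGroupID.xy);
    ivec2 tileMin = tile * TILE_SIZE;
    ivec2 tileMax = tileMin + TILE_SIZE;

    // one thread resets the shared counter
    if (gl_LocalInvocationIndex == 0u)
        lightCount = 0u;
    memoryBarrierShared();
    barrier();

    //get minimum & maximum depth for tile (same space as Light.position.z)
    vec2 minMax = imageLoad(minMaxTex, tile).xy;

    uint threadCount = uint(TILE_SIZE * TILE_SIZE);
    uint passCount = (totalLightCount + threadCount - 1u) / threadCount;

    for (uint i = 0u; i < passCount; i++){
        uint lightIndex = i * threadCount + gl_LocalInvocationIndex;

        // skip threads past the last light (no "null" sentinel light needed)
        if (lightIndex >= totalLightCount)
            break;

        Light l = Lights[lightIndex];

        // tile/quad overlap, then depth-interval overlap
        if (testLightBounds(tileMin, tileMax, l.quad)
            && minMax.x < l.position.z + l.radius
            && minMax.y > l.position.z - l.radius){

            uint index = atomicAdd(lightCount, 1u);  // atomic on a shared variable (hangs on my GPU)
            if (index < uint(MAX_LIGHTS))
                lightIDs[index] = lightIndex;
        }
    }

    memoryBarrierShared();
    barrier();

    //do lighting for the current pixel using the tile's light list
    vec3 diffuse = imageLoad(diffTex, pixel).rgb;
    vec3 specular = imageLoad(specTex, pixel).rgb;
    vec3 color = vec3(0.0);
    uint count = min(lightCount, uint(MAX_LIGHTS));
    for (uint i = 0u; i < count; i++)
        color += doLight(Lights[lightIDs[i]], diffuse, specular);

    imageStore(outTex, pixel, vec4(color, 1.0));
}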
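Since the only atomic needed in the culling loop is the one that appends to `lightIDs`, one atomics-free alternative (a swapped-in technique, not part of the question or answer) is stream compaction: each thread writes a 0/1 visibility flag into a shared array, a prefix sum over the flags (one `barrier()` per step) gives every visible light its slot, and the last scan value is the tile's light count. The C++ below emulates one work group on the CPU for a single pass (at most one light per thread). `compactVisibleLights` is a hypothetical name; with more lights than threads you would repeat it per pass and add the running count to the slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates stream compaction inside one work group without atomics.
// flags[t] is 1 if the light tested by thread t survived culling;
// lightIndex[t] is the index of that light in the global light buffer.
// A Hillis-Steele inclusive scan (log2(n) steps, one barrier() each on the
// GPU) turns the flags into write positions.
std::vector<std::uint32_t> compactVisibleLights(
        const std::vector<std::uint32_t>& flags,
        const std::vector<std::uint32_t>& lightIndex) {
    const std::size_t n = flags.size();
    std::vector<std::uint32_t> scan = flags, tmp(n);
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        // Every "thread" reads scan and writes tmp (double buffering).
        for (std::size_t t = 0; t < n; ++t)
            tmp[t] = scan[t] + (t >= offset ? scan[t - offset] : 0u);
        scan.swap(tmp);  // barrier() between steps on the GPU
    }
    const std::uint32_t count = n ? scan[n - 1] : 0u;  // tile light count
    std::vector<std::uint32_t> lightIDs(count);
    for (std::size_t t = 0; t < n; ++t)
        if (flags[t])
            lightIDs[scan[t] - 1] = lightIndex[t];  // inclusive scan -> 0-based slot
    return lightIDs;
}
```

The double buffering (`scan`/`tmp`) matters: on the GPU each step reads one shared array and writes the other, otherwise a thread could read a value that another thread has already updated in the same step.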

I haven't actually implemented tiled deferred shading, but I think you can approach this the same way you would build particle neighbor lists in a simulation.

  • Have your compute shader build a (cell ID, light ID) tuple and store it in a buffer, using the current thread's index as the position.
  • Sort that buffer by cell ID using your favourite GPU sorting algorithm (e.g. radix sort or bitonic sort).
  • Once the buffer is sorted, build a histogram and run a prefix-sum scan to find where each cell starts within the buffer.

Example:

    (Cell, Light)
    1st pass: Cell Buffer  -> [ 23, 0 ] [ 7, 1 ] [ 9, 2 ] ...
    2nd pass: Cell Buffer  -> [ 7, 1 ] [ 9, 2 ] [ 23, 0 ] ...

    (Start, End)
    3rd pass: Index Buffer -> [0 0] [0 0] [0 0] [0 0] [0 0] [0 0] [0 1] [1 1] [1 2] ...
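The sort-and-scan passes described in this answer can be checked against a small CPU reference. This sketch assumes the tuples from the first pass are already in a buffer, uses `std::stable_sort` as a stand-in for a GPU radix or bitonic sort, and returns a half-open [start, end) range per cell; `CellLight` and `binLights` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct CellLight {
    std::uint32_t cell;   // tile (cell) ID, must be < cellCount
    std::uint32_t light;  // light index
};

// Pass 2: sort tuples by cell ID (in place).
// Pass 3: histogram of tuples per cell + exclusive prefix sum -> [start, end).
std::vector<std::pair<std::uint32_t, std::uint32_t>>
binLights(std::vector<CellLight>& tuples, std::uint32_t cellCount) {
    std::stable_sort(tuples.begin(), tuples.end(),
                     [](const CellLight& a, const CellLight& b) {
                         return a.cell < b.cell;
                     });
    std::vector<std::uint32_t> histogram(cellCount, 0);
    for (const CellLight& t : tuples)
        ++histogram[t.cell];
    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges(cellCount);
    std::uint32_t running = 0;  // exclusive prefix sum of the histogram
    for (std::uint32_t c = 0; c < cellCount; ++c) {
        ranges[c] = {running, running + histogram[c]};
        running += histogram[c];
    }
    return ranges;
}
```

With the tuples (23, 0), (7, 1), (9, 2) and 24 cells, cell 7 gets [0, 1), cell 8 gets the empty range [1, 1), cell 9 gets [1, 2) and cell 23 gets [2, 3). A tile's shader then loops over `tuples[start..end)` to fetch its lights.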

For more details, the method is described in Simon Green's "Particle simulation using CUDA": http://idav.ucdavis.edu/~dfalcant/downloads/dissertation.pdf

The original method assumes that a particle can only be placed within a single cell, but you should be able to work around this easily by using a bigger workload.
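One way to read the bigger workload is to emit one (cell, light) tuple for every tile a light overlaps. Because lights overlap different numbers of tiles, each light first counts its tiles, an exclusive prefix sum of those counts gives each light its first output slot, and then each light writes its tuples; no two lights share a slot, so no atomics are needed. A minimal CPU sketch, assuming the quad is (xmin, ymin, xmax, ymax) in pixels with an exclusive max corner and row-major tile IDs; `Quad`, `CellLight`, `TileRange`, `coveredTiles`, `tileCount` and `expandLights` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Quad { float xmin, ymin, xmax, ymax; };      // pixels (assumed layout)
struct CellLight { std::uint32_t cell, light; };
struct TileRange { int tx0, ty0, tx1, ty1; };       // inclusive tile indices

// Tiles covered by a quad, clamped to the screen (may be empty).
TileRange coveredTiles(const Quad& q, int tile, int tilesX, int tilesY) {
    TileRange r;
    r.tx0 = std::max(0, int(std::floor(q.xmin / tile)));
    r.ty0 = std::max(0, int(std::floor(q.ymin / tile)));
    r.tx1 = std::min(tilesX - 1, int(std::ceil(q.xmax / tile)) - 1);
    r.ty1 = std::min(tilesY - 1, int(std::ceil(q.ymax / tile)) - 1);
    return r;
}

std::uint32_t tileCount(const TileRange& r) {
    if (r.tx0 > r.tx1 || r.ty0 > r.ty1) return 0;   // quad is off screen
    return std::uint32_t((r.tx1 - r.tx0 + 1) * (r.ty1 - r.ty0 + 1));
}

std::vector<CellLight> expandLights(const std::vector<Quad>& quads, int tile,
                                    int tilesX, int tilesY) {
    const std::size_t n = quads.size();
    // Count tiles per light and take the exclusive prefix sum (serial here,
    // a parallel scan on the GPU).
    std::vector<std::uint32_t> offset(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i)
        offset[i + 1] = offset[i] + tileCount(coveredTiles(quads[i], tile, tilesX, tilesY));
    // Light i writes its tuples into [offset[i], offset[i + 1]).
    std::vector<CellLight> out(offset[n]);
    for (std::size_t i = 0; i < n; ++i) {
        const TileRange r = coveredTiles(quads[i], tile, tilesX, tilesY);
        std::uint32_t slot = offset[i];
        for (int ty = r.ty0; ty <= r.ty1; ++ty)
            for (int tx = r.tx0; tx <= r.tx1; ++tx)
                out[slot++] = {std::uint32_t(ty * tilesX + tx), std::uint32_t(i)};
    }
    return out;
}
```

At 1280 x 720 with 32-pixel tiles there are 40 x 23 tiles, and the resulting buffer can go straight into the sort-and-scan passes described above.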
