Is this a practical and sufficiently performant shader for blurring on a mobile device?

Question

I am trying to implement a blur effect in my game on mobile devices using a GLSL shader. I have no prior experience writing shaders, and I can't tell whether my shader is good enough. I copied the GLSL code from a tutorial, and I don't know whether that tutorial is only meant as a showy demo or whether the code can also be used in practice. Here is the code of the two-pass blur shader, which uses Gaussian weights ( http://www.cocos2d-x.org/wiki/User_Tutorial-RenderTexture_Plus_Blur ):

#ifdef GL_ES                                                                      
precision mediump float;
#endif                                                                            

varying vec4 v_fragmentColor;                                                     
varying vec2 v_texCoord;                                                          

uniform vec2 pixelSize;
uniform vec2 direction;
uniform int radius;
uniform float weights[64];

void main() 
{
    gl_FragColor = texture2D(CC_Texture0, v_texCoord)*weights[0];
    for (int i = 1; i < radius; i++) {
        vec2 offset = vec2(float(i)*pixelSize.x*direction.x, float(i)*pixelSize.y*direction.y);
        gl_FragColor += texture2D(CC_Texture0, v_texCoord + offset)*weights[i];
        gl_FragColor += texture2D(CC_Texture0, v_texCoord - offset)*weights[i];
    }
}

I run this shader on every frame update (60 times per second), and with only one pass my game's frame rate drops to 22 FPS on an iPhone 5S (not a bad device). I find this very strange, since the shader does not seem to contain many instructions. Why is it so heavy?

P.S. The blur radius is 50 and the step is 1.

Answer 1

Main reasons why your shader is heavy:

1: These two calculations: v_texCoord + offset and v_texCoord - offset. Because the UV coordinates are computed in the fragment shader, the texture data cannot be fetched ahead of time; it has to be loaded from memory on the spot, which causes cache misses.

See also: "What is a dependent texture read?"

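2: The radius is far too large. With radius = 50 the loop runs 49 times with two fetches each, so every fragment performs 99 texture reads per pass.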

How to make it faster/better:

1: Calculate as much as possible in the vertex shader. Ideally, if you calculate all the UVs in the vertex shader, the GPU can move the texture data into cache before the fragment shader runs, drastically improving performance.

2: Reduce the radius so that each pass makes, say, 8-16 texture2D calls. On its own this will probably not give the result you are expecting. To compensate, use two textures: blur texture A into B, then blur B back into A, and so on, as many times as you need. This gives very good results. I remember Crysis 1 used it for motion blur, but I can't find the paper.

3: Eliminate those 64 uniforms and hardcode the data in the shader. I know this is not very nice, but you will gain some extra performance.

4: If you calculate the UV coordinates carefully, you can take great advantage of texture interpolation. Basically, never sample a pixel at its center; always sample between pixels, and the hardware will average the 4 nearest pixels:

https://en.wikipedia.org/wiki/Bilinear_filtering

5: This line: precision mediump float; Does everything have to be mediump? I would suggest removing it and testing with lowp on as much as you can.

Edit: For your shader, here is a simplified version of what you need to do:

Vertex shader:

attribute highp vec4 Position;
attribute mediump vec2 texture0UV;

varying mediump vec2 v_texCoord0;
varying mediump vec2 v_texCoord1;
varying mediump vec2 v_texCoord2;
varying mediump vec2 v_texCoord3;
varying mediump vec2 v_texCoord4;

uniform mediump vec2 texture_size;

void main() 
{
    gl_Position = Position;
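    // Size of one texel in UV space.
    vec2 pixel_size = vec2(1.0) / texture_size;

    // Each neighbour tap is nudged half a texel off-center so that
    // bilinear filtering blends adjacent texels.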
    v_texCoord0 = texture0UV;
    v_texCoord1 = texture0UV + vec2(-1.0,0.0) / texture_size + pixel_size * 0.5;
    v_texCoord2 = texture0UV + vec2(0.0,-1.0) / texture_size + pixel_size * 0.5;
    v_texCoord3 = texture0UV + vec2(1.0,0.0) / texture_size  - pixel_size * 0.5;
    v_texCoord4 = texture0UV + vec2(0.0,1.0) / texture_size  - pixel_size * 0.5;
}

The final term, pixel_size * 0.5, is required to take maximum advantage of linear interpolation. In this example the sampling positions are trivial; how to choose them well is a whole discussion of its own and is far outside the scope of this question.

Fragment shader:

varying mediump vec2 v_texCoord0;
varying mediump vec2 v_texCoord1;
varying mediump vec2 v_texCoord2;
varying mediump vec2 v_texCoord3;
varying mediump vec2 v_texCoord4;

uniform lowp sampler2D CC_Texture0;

void main() 
{
    mediump vec4 final_color = vec4(0.0);

    final_color += texture2D(CC_Texture0,v_texCoord0);
    final_color += texture2D(CC_Texture0,v_texCoord1);
    final_color += texture2D(CC_Texture0,v_texCoord2);
    final_color += texture2D(CC_Texture0,v_texCoord3);
    final_color += texture2D(CC_Texture0,v_texCoord4);

    gl_FragColor = final_color / 5.0; // The weights array has to go; use fixed values instead, in this case 1/5 for each sample
}

For this to look good you need to blur the texture multiple times. Even blurring the texture twice should make a noticeable difference.
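
As a rough illustration of points 1, 3 and 4 combined, here is a sketch of one pass of a separable 9-tap Gaussian that needs only 5 texture fetches. It is not taken from the tutorial: blur_step and the v_uv* varyings are made-up names, and the sketch assumes the source texture is sampled with GL_LINEAR filtering. The weights are the binomial coefficients 924, 792, 495, 220, 66 divided by their two-sided sum, 4070. Each pair of neighbouring taps is merged into a single fetch with weight w1 + w2, placed at offset (o1*w1 + o2*w2) / (w1 + w2).

Vertex shader (sketch):

```glsl
// All five UVs are computed here, so the fragment shader
// performs no dependent texture reads.
attribute highp vec4 Position;
attribute mediump vec2 texture0UV;

// Hypothetical uniform: one texel along the blur axis, i.e.
// (1.0 / width, 0.0) for the horizontal pass and
// (0.0, 1.0 / height) for the vertical pass.
uniform mediump vec2 blur_step;

varying mediump vec2 v_uvCenter;
varying mediump vec2 v_uvNearPos;
varying mediump vec2 v_uvNearNeg;
varying mediump vec2 v_uvFarPos;
varying mediump vec2 v_uvFarNeg;

void main()
{
    gl_Position = Position;

    // Offsets (in texels) of the merged taps (1,2) and (3,4).
    v_uvCenter  = texture0UV;
    v_uvNearPos = texture0UV + blur_step * 1.3846153846;
    v_uvNearNeg = texture0UV - blur_step * 1.3846153846;
    v_uvFarPos  = texture0UV + blur_step * 3.2307692308;
    v_uvFarNeg  = texture0UV - blur_step * 3.2307692308;
}
```

Fragment shader (sketch):

```glsl
precision mediump float;

varying mediump vec2 v_uvCenter;
varying mediump vec2 v_uvNearPos;
varying mediump vec2 v_uvNearNeg;
varying mediump vec2 v_uvFarPos;
varying mediump vec2 v_uvFarNeg;

// Declared explicitly, as in the shader above; drop this line if the
// engine already injects CC_Texture0.
uniform lowp sampler2D CC_Texture0;

void main()
{
    // Hardcoded weights: center tap, merged taps (1,2), merged taps (3,4).
    // They sum to 1.0, so overall brightness is preserved.
    mediump vec4 sum = texture2D(CC_Texture0, v_uvCenter) * 0.2270270270;
    sum += texture2D(CC_Texture0, v_uvNearPos) * 0.3162162162;
    sum += texture2D(CC_Texture0, v_uvNearNeg) * 0.3162162162;
    sum += texture2D(CC_Texture0, v_uvFarPos)  * 0.0702702703;
    sum += texture2D(CC_Texture0, v_uvFarNeg)  * 0.0702702703;

    gl_FragColor = sum;
}
```

The idea would be to run this once with a horizontal blur_step and once with a vertical one, and to repeat the pair of passes for a wider blur. On large textures mediump texture coordinates may be too coarse for sub-texel offsets, so highp varyings are worth trying where the device supports them.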

Answer 2

To speed it up, you can:

Make radius a const so that the shader compiler can unroll the loop
Precompute pixelSize * direction
Decrease the radius; I think 50 is too big for a mobile device
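
A minimal sketch of what these three changes could look like when applied to the shader from the question. The names RADIUS and blurStep are made up for illustration, and the value 8 is only an example. blurStep is assumed to hold pixelSize * direction, multiplied once on the CPU before the draw call. As in the question, CC_Texture0 is assumed to be supplied by cocos2d-x. As a side note, GLSL ES 1.00 only guarantees support for loops whose bounds are constant expressions, so a constant radius is also the more portable choice.

```glsl
#ifdef GL_ES
precision mediump float;
#endif

varying vec2 v_texCoord;

// Compile-time constant, so the compiler is free to unroll the loop.
const int RADIUS = 8;

// Hypothetical uniform: pixelSize * direction, computed once on the CPU.
uniform vec2 blurStep;

// One Gaussian weight per tap distance, 0 .. RADIUS - 1.
uniform float weights[RADIUS];

void main()
{
    vec4 sum = texture2D(CC_Texture0, v_texCoord) * weights[0];

    // RADIUS - 1 iterations with two fetches each: 15 fetches in total
    // for RADIUS = 8, compared with 99 for a radius of 50.
    for (int i = 1; i < RADIUS; i++) {
        vec2 offset = float(i) * blurStep;
        sum += texture2D(CC_Texture0, v_texCoord + offset) * weights[i];
        sum += texture2D(CC_Texture0, v_texCoord - offset) * weights[i];
    }

    gl_FragColor = sum;
}
```

The weights passed in should be normalized so that weights[0] plus twice the sum of the remaining entries equals 1.0; otherwise the blurred image will come out darker or brighter than the source.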

solution1
4 ACCEPTED 2015-06-26 18:02:15

solution2
2 2015-06-26 17:38:18
