Android OpenGL shader program to copy an image from the camera to an SSBO for TF-Lite GPU inference

The TensorFlow Lite GPU delegate documentation describes a faster way to run TFLite inference on Android using OpenGL and an SSBO [3]. It provides sample code to create and bind an SSBO for an image that is already on the GPU. How can we copy or convert a frame from the live Android camera into an SSBO using OpenGL shader code? When we simply dump CPU memory into an SSBO, performance is worse than with normal GPU delegate execution. What is the proper, most efficient way to pass the camera image to the SSBO so that TFLite inference gets faster?

In the following code we convert the camera frame to a bitmap, convert that to a texture, and finally copy it to an SSBO. However, this method is slower than the normal GPU delegate pipeline (where the data is copied from CPU to GPU, which is the overhead we want to avoid). The aim is to reduce CPU-to-GPU copying by making the image data available in GPU memory and then passing it to the model. We can run the model [1] in 40-50 ms using the standard GPU delegate inference mechanism, whereas it takes 90-100 ms using the SSBO method described here (unoptimized). These timings are for the interpreter.run() call in TensorFlow Lite. It also looks like this SSBO mechanism only works with OpenGL ES 3.1 or higher.

The ideal use case (as suggested by TensorFlow) is the following [2]:

  1. You get the camera input in the form of a surface texture.
  2. Create an OpenGL shader storage buffer object (SSBO).
  3. Use GpuDelegate.bindGlBufferToTensor() to associate that SSBO with the input tensor.
  4. Write a small shader program to dump the surface texture from step 1 into the SSBO from step 2 efficiently.
  5. Run inference.

We are able to get camera frames as raw bytes, convert them into a texture, and even render them to a GLSurfaceView. But we are unable to achieve the speedup suggested by TensorFlow.

  1. https://github.com/tensorflow/tensorflow/issues/26297
  2. https://github.com/tensorflow/tensorflow/issues/25657#issuecomment-466489248
  3. https://www.tensorflow.org/lite/performance/gpu_advanced#android_2

Android Code:

    public int[] initializeShaderBuffer() {
        android.opengl.EGLContext eglContext = eglGetCurrentContext();
        int[] id = new int[1];
        GLES31.glGenBuffers(id.length, id, 0);
        GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, id[0]);
        // 257 x 257 pixels, 3 float channels, 4 bytes per float
        GLES31.glBufferData(GL_SHADER_STORAGE_BUFFER, 257*257*3*4, null, GLES31.GL_STREAM_COPY);

        GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind
        return id;
    }

    @Override
    public void onSurfaceCreated(GL10 glUnused, EGLConfig config) {
        .....
        .....

        mTextureDataHandle0 = TextureHelper.loadTexture(mActivityContext,
                R.drawable.srcim); // No error
    }

    @Override
    public void onDrawFrame(GL10 glUnused) {
        int inputSsboId = initializeShaderBuffer()[0];

        interpreter = new Interpreter(GLActivity.tfliteModel);

        Tensor inputTensor = interpreter.getInputTensor(0);
        GpuDelegate gpuDelegate = new GpuDelegate();
        gpuDelegate.bindGlBufferToTensor(inputTensor, inputSsboId);
        interpreter.modifyGraphWithDelegate(gpuDelegate);

        final int computeShaderHandle = ShaderHelper.compileShader(
                GLES31.GL_COMPUTE_SHADER, fragmentShader); // No error
        mProgramHandle = ShaderHelper.createAndLinkProgram(vertexShaderHandle,
                computeShaderHandle); // No error

        mTextureUniformHandle0 = GLES31.glGetUniformLocation(mProgramHandle,
                "u_Texture0");

        /**
         * First texture map
         */
        // Set the active texture unit to texture unit 0.
        GLES31.glActiveTexture(GLES31.GL_TEXTURE0);

        // Bind the texture to this unit.
        GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, mTextureDataHandle0);

        // Tell the texture uniform sampler to use this texture in the shader by
        // binding to texture unit 0.
        GLES31.glUniform1i(mTextureUniformHandle0, 0);

        GLES31.glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, inputSsboId, 0, 257*257*3*4);

        GLES31.glUseProgram(mProgramHandle);
        if (compute == 1) // Always set to 1
            GLES31.glDispatchCompute(16, 16, 1);

        GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind
        GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, 0); // unbind

        // Tflite code ...

        byte[][] outputArray = new byte[1][66049]; // size based on model output
        Log.d("GPU_CALL_RUN", "DONE");
        long oms1 = System.currentTimeMillis();
        interpreter.run(null, outputArray);

        long cms1 = System.currentTimeMillis();
        Log.d("TIME_RUN_MODEL", "" + (cms1 - oms1));

        Log.d("OUTVAL", Arrays.deepToString(outputArray));
    }
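The literal 257*257*3*4 appears in both the glBufferData and the glBindBufferRange call, so the two have to stay in sync by hand. A minimal sketch of deriving that byte count from the tensor shape instead is shown below. The class and method names (SsboSize, floatTensorBytes) are hypothetical and not part of any TensorFlow Lite or Android API, and the sketch assumes a float32 input tensor in height x width x channels layout.

```java
// Hypothetical helper, not part of TensorFlow Lite or Android.
final class SsboSize {
    // Assumption: the input tensor holds 32-bit floats.
    static final int BYTES_PER_FLOAT = 4;

    /** Byte size of a float32 tensor with the given height, width and channel count. */
    static int floatTensorBytes(int height, int width, int channels) {
        if (height <= 0 || width <= 0 || channels <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        // multiplyExact throws ArithmeticException on int overflow instead of wrapping.
        int elements = Math.multiplyExact(Math.multiplyExact(height, width), channels);
        return Math.multiplyExact(elements, BYTES_PER_FLOAT);
    }

    public static void main(String[] args) {
        System.out.println(floatTensorBytes(257, 257, 3)); // prints 792588
    }
}
```

The same value could then be passed to both GL calls, which avoids a mismatch if the model input size changes.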

Compute shader:

#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D u_Texture0;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
    ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
    //if (gid.x >= 257 || gid.y >= 257) return;
    vec3 pixel = texelFetch(u_Texture0, gid, 0).xyz;
    int linear_index = 3 * (gid.y * 257 + gid.x);
    output_data.elements[linear_index + 0] = pixel.x;
    output_data.elements[linear_index + 1] = pixel.y;
    output_data.elements[linear_index + 2] = pixel.z;
}
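One detail worth checking in the code above: with a 16x16 local size, glDispatchCompute(16, 16, 1) launches 256x256 invocations, which is one short of 257 along each axis, so the last row and column of the SSBO would never be written. The usual approach is to round the group count up and keep the bounds check in the shader enabled so the extra invocations return early. A minimal sketch of the rounding follows; DispatchMath and groupCount are hypothetical names introduced only for illustration.

```java
// Hypothetical helper for computing work group counts.
final class DispatchMath {
    /** Number of work groups of `localSize` needed to cover `size` items (ceiling division). */
    static int groupCount(int size, int localSize) {
        if (size <= 0 || localSize <= 0) {
            throw new IllegalArgumentException("size and localSize must be positive");
        }
        return (size + localSize - 1) / localSize;
    }

    public static void main(String[] args) {
        int groups = groupCount(257, 16);
        System.out.println(groups);      // prints 17
        System.out.println(groups * 16); // prints 272
    }
}
```

With 17 groups per axis, invocations with gid.x or gid.y in the range 257 to 271 also run, which is why the commented-out check `if (gid.x >= 257 || gid.y >= 257) return;` would need to be restored.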

There is no simple way to dump a SurfaceTexture to an SSBO directly. The simplest path would be SurfaceTexture -> GlTexture -> SSBO. The TFLite GPU team is also trying to introduce another API (bindGlTextureToTensor), but until that is available, here is a shader program I used for the GlTexture -> SSBO conversion:

    #version 310 es

    layout(local_size_x = 16, local_size_y = 16) in;
    layout(binding = 0) uniform sampler2D input_texture;
    layout(std430) buffer;
    layout(binding = 1) buffer Output { float elements[]; } output_data;

    void main() {
      ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
      if (gid.x >= 224 || gid.y >= 224) return;
      vec3 pixel = texelFetch(input_texture, gid, 0).xyz;
      int linear_index = 3 * (gid.y * 224 + gid.x);
      output_data.elements[linear_index + 0] = pixel.x;
      output_data.elements[linear_index + 1] = pixel.y;
      output_data.elements[linear_index + 2] = pixel.z;
    }

Note that this was for MobileNet v1, which has an input tensor of size 224x224x3.
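When debugging, it can help to compare the SSBO contents against a CPU reference that uses the same layout as the shader, i.e. `3 * (y * width + x)` with the R, G, B values stored consecutively. The sketch below is one such reference under stated assumptions: pixels are packed ARGB ints in row-major order (the format returned by android.graphics.Bitmap.getPixels), and channels are normalized to [0, 1], which is what texelFetch returns for a normalized 8-bit texture. Whether a given model expects that value range, and whether the texture rows are flipped relative to the bitmap, are not stated in the post and would need to be checked. HwcPacker and packRgb are hypothetical names.

```java
// Hypothetical CPU reference for the texture -> SSBO layout used by the shader.
final class HwcPacker {
    /** Packs row-major ARGB pixels into a float array laid out as height x width x 3. */
    static float[] packRgb(int[] argbPixels, int width, int height) {
        if (width <= 0 || height <= 0 || argbPixels.length != width * height) {
            throw new IllegalArgumentException("pixel count must equal width * height");
        }
        float[] out = new float[width * height * 3];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int pixel = argbPixels[y * width + x];
                // Same index computation as linear_index in the shader.
                int linearIndex = 3 * (y * width + x);
                out[linearIndex]     = ((pixel >> 16) & 0xFF) / 255.0f; // red
                out[linearIndex + 1] = ((pixel >> 8) & 0xFF) / 255.0f;  // green
                out[linearIndex + 2] = (pixel & 0xFF) / 255.0f;         // blue
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 2x1 image: one opaque red pixel followed by one opaque blue pixel.
        float[] packed = packRgb(new int[] {0xFFFF0000, 0xFF0000FF}, 2, 1);
        System.out.println(java.util.Arrays.toString(packed));
    }
}
```

This path is only meant for verification; it performs exactly the CPU-side work that the GPU shader is supposed to avoid.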
