Android OpenGL shader program to copy image from camera to SSBO for TF-Lite GPU inference
The TensorFlow Lite GPU delegate documentation describes a faster way to run TFLite inference on Android using OpenGL and an SSBO [3]. The documentation provides sample code to create and bind an SSBO with an image that is already on the GPU. How can we copy or convert an image from the live Android camera into an SSBO using OpenGL shader code? When we just dump CPU memory into an SSBO, performance is worse than with normal GPU delegate execution. So what is the proper or most efficient way to pass the camera image to an SSBO so as to make TFLite inference faster?
In the following code we convert the camera frame to a bitmap, convert that to a texture, and finally copy it to an SSBO. However, this method is comparatively slower than the normal GPU delegate execution pipeline (where data is copied from CPU to GPU, with the associated overhead). The aim is to reduce CPU-to-GPU copying of image data by making the image data available in GPU memory and then passing it to the model. We are able to run the model [1] in 40-50 ms using the standard GPU delegate inference mechanism, whereas it takes 90-100 ms using the aforesaid SSBO method (unoptimized). These timings refer to the time taken by the interpreter.run() method in TensorFlow Lite. Also, it looks like this SSBO mechanism only works with OpenGL ES 3.1 or higher.
The ideal use case (as suggested by TensorFlow) is the following [2]:
1. Get camera input in the form of a surface texture.
2. Create an OpenGL shader storage buffer object (SSBO).
3. Use GPUDelegate.bindGlBufferToTensor() to associate that SSBO with the input tensor.
4. Write a small shader program to dump the surface texture of step 1 into the SSBO of step 2 efficiently.
5. Run inference.
We are able to get camera frames as raw bytes, convert them into a texture, and even render them to a GLSurfaceView. But we are unable to achieve the speedup suggested by TensorFlow.
Android code:
public int[] initializeShaderBuffer(){
    android.opengl.EGLContext eglContext = eglGetCurrentContext();
    int[] id = new int[1];
    GLES31.glGenBuffers(id.length, id, 0);
    GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, id[0]);
    GLES31.glBufferData(GL_SHADER_STORAGE_BUFFER, 257*257*3*4, null, GLES31.GL_STREAM_COPY);
    GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);// unbind
    return id;
}
@Override
public void onSurfaceCreated(GL10 glUnused, EGLConfig config) {
    .....
    .....
    mTextureDataHandle0 = TextureHelper.loadTexture(mActivityContext,
            R.drawable.srcim);//No error
}
@Override
public void onDrawFrame(GL10 glUnused) {
    int inputSsboId = initializeShaderBuffer()[0];
    interpreter = new Interpreter(GLActivity.tfliteModel);
    Tensor inputTensor = interpreter.getInputTensor(0);
    GpuDelegate gpuDelegate = new GpuDelegate();
    gpuDelegate.bindGlBufferToTensor(inputTensor, inputSsboId);
    interpreter.modifyGraphWithDelegate(gpuDelegate);
    final int computeShaderHandle = ShaderHelper.compileShader(
            GLES31.GL_COMPUTE_SHADER, fragmentShader);//No error
    mProgramHandle = ShaderHelper.createAndLinkProgram(vertexShaderHandle,
            computeShaderHandle);//No error
    mTextureUniformHandle0 = GLES31.glGetUniformLocation(mProgramHandle,
            "u_Texture0");
    /**
     * First texture map
     */
    // Set the active texture0 unit to texture unit 0.
    GLES31.glActiveTexture(GLES31.GL_TEXTURE0 );
    // Bind the texture to this unit.
    GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, mTextureDataHandle0);
    // Tell the texture uniform sampler to use this texture in the shader by
    // binding to texture unit 0.
    GLES31.glUniform1i(mTextureUniformHandle0, 0);
    GLES31.glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, inputSsboId, 0, 257*257*3*4);
    GLES31.glUseProgram(mProgramHandle);
    if(compute==1)//Always set to 1
        GLES31.glDispatchCompute(16,16,1);
    GLES31.glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind
    GLES31.glBindTexture(GLES31.GL_TEXTURE_2D, 0); // unbind
    //Tflite code ...
    byte [][] outputArray = new byte [1][66049];//size based on model output
    Log.d("GPU_CALL_RUN","DONE");
    long oms1=System.currentTimeMillis();
    interpreter.run(null,outputArray);
    long cms1=System.currentTimeMillis();
    Log.d("TIME_RUN_MODEL",""+(cms1-oms1));
    Log.d("OUTVAL", Arrays.deepToString(outputArray));
}
Compute shader:
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D u_Texture0;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
    ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
    //if (gid.x >= 257 || gid.y >= 257) return;
    vec3 pixel = texelFetch(u_Texture0, gid, 0).xyz;
    int linear_index = 3 * (gid.y * 257 + gid.x);
    output_data.elements[linear_index + 0] = pixel.x;
    output_data.elements[linear_index + 1] = pixel.y;
    output_data.elements[linear_index + 2] = pixel.z;
}
There is no simple way to dump a SurfaceTexture directly to an SSBO. The simplest path would be SurfaceTexture -> GlTexture -> SSBO. The TFLite GPU team is also trying to introduce another API (bindGlTextureToTensor), but until that is available, here is a shader program I used for the GlTexture -> SSBO conversion:
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0) uniform sampler2D input_texture;
layout(std430) buffer;
layout(binding = 1) buffer Output { float elements[]; } output_data;
void main() {
    ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
    if (gid.x >= 224 || gid.y >= 224) return;
    vec3 pixel = texelFetch(input_texture, gid, 0).xyz;
    int linear_index = 3 * (gid.y * 224 + gid.x);
    output_data.elements[linear_index + 0] = pixel.x;
    output_data.elements[linear_index + 1] = pixel.y;
    output_data.elements[linear_index + 2] = pixel.z;
}
Note that this was for MobileNet v1 with an input tensor size of 224x224x3.
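When adapting the shader to another input size, the dispatch size and buffer size have to change with it. The following is a minimal plain-Java sketch, not part of TFLite or Android: the class SsboLayout and its methods are hypothetical names introduced only for illustration. It computes the number of 16x16 work groups needed to cover an image, the byte size of the float SSBO, and the same linear index the shader writes to.

```java
public class SsboLayout {
    /** Work groups needed to cover `size` items with groups of `localSize` (rounded up). */
    public static int groupCount(int size, int localSize) {
        return (size + localSize - 1) / localSize;
    }

    /** Byte size of a tightly packed float buffer of width x height x channels. */
    public static int bufferBytes(int width, int height, int channels) {
        return width * height * channels * Float.BYTES;
    }

    /** Same formula as the shader: 3 floats per pixel, rows stored one after another. */
    public static int linearIndex(int x, int y, int width) {
        return 3 * (y * width + x);
    }

    public static void main(String[] args) {
        int localSize = 16; // matches local_size_x / local_size_y in the shader
        for (int size : new int[] {224, 257}) {
            int groups = groupCount(size, localSize);
            System.out.println(size + "x" + size
                    + ": glDispatchCompute(" + groups + ", " + groups + ", 1), "
                    + bufferBytes(size, size, 3) + " bytes");
        }
    }
}
```

For 224x224, 14 groups per axis cover the image exactly. For the 257x257 input in the question, glDispatchCompute(16,16,1) launches only 256x256 invocations, so the last row and column are never written. Rounding up gives 17 groups per axis, and because that launches 272x272 invocations, the bounds check in the shader then has to be enabled so the extra invocations return early.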