Sum image intensities in GPU

I have an application where I need to take the average intensity of an image, for around 1 million images. It "feels" like a job for a GPU fragment shader, but fragment shaders are for per-pixel local computations, while image averaging is a global operation.

One approach I considered is loading the image into a texture, applying a 2x2 box blur, loading the result back into an N/2 x N/2 texture, and repeating until the output is 1x1. However, this would take log n applications of the shader.

Is there a way to do it in one pass? Or should I just break down and use CUDA/OpenCL?

The summation operation is a specific case of a "reduction," a standard operation in CUDA and OpenCL libraries. A nice writeup on it is available on the CUDA demos page. In CUDA, Thrust and CUDPP are just two examples of libraries that provide reduction. I'm less familiar with OpenCL, but CLPP seems to be a good library that provides reduction. Just copy your color buffer to an OpenGL pixel buffer object and use the appropriate OpenGL interoperability call to make that pixel buffer's memory accessible in CUDA/OpenCL.
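For illustration, a minimal sketch of the Thrust approach might look like this (compiled with nvcc); the function name and the assumption that the intensities already sit in a device buffer of floats are mine, not part of the original answer:

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>

    // Assumes 'pixels' holds one float intensity per pixel, already on the GPU
    // (e.g. copied or mapped from an OpenGL pixel buffer object).
    float average_intensity(const thrust::device_vector<float>& pixels)
    {
        // Parallel sum reduction over all pixels, then divide by the count.
        float sum = thrust::reduce(pixels.begin(), pixels.end(), 0.0f);
        return sum / static_cast<float>(pixels.size());
    }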

If it must be done using the OpenGL API (as the original question requires), the solution is to render to a texture, create a mipmap of the texture, and read back the 1x1 mip level. You have to set the filtering right (bilinear is appropriate, I think), but it should get close to the right answer, modulo precision error.
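A rough sketch of that idea, assuming a GL function loader such as GLEW and a single-channel float texture whose name and dimensions are placeholders:

    #include <GL/glew.h>   // any GL loader; glGenerateMipmap needs GL 3.0+
    #include <algorithm>
    #include <cmath>

    // Assumes 'tex' is a GL_R32F texture of size width x height holding the intensities.
    float average_via_mipmap(GLuint tex, int width, int height)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glGenerateMipmap(GL_TEXTURE_2D);   // each level averages down the previous one

        // The coarsest mip level is a single texel holding (approximately) the mean.
        int topLevel = (int)std::floor(std::log2((double)std::max(width, height)));
        float mean = 0.0f;
        glGetTexImage(GL_TEXTURE_2D, topLevel, GL_RED, GL_FLOAT, &mean);
        return mean;
    }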

My gut tells me to attempt your implementation in OpenCL. You can optimize for your image size and graphics hardware by breaking the images up into bespoke chunks of data that are then summed in parallel. It could be very fast indeed.

Fragment shaders are great for convolutions, but there the result is written to gl_FragColor, so it makes sense. Ultimately you would have to loop over every pixel in the texture and sum the results, which are then read back into the main program. Generating image statistics is perhaps not what the fragment shader was designed for, and it's not clear that a major performance gain is to be had, since it's not guaranteed that a particular buffer is located in GPU memory.

It sounds like you may be applying this algorithm to a real-time motion detection scenario, or some other automated feature detection application. It may be faster to compute some statistics from a sample of pixels rather than the entire image, and then build a machine learning classifier.

Best of luck to you in any case!

It doesn't need CUDA if you'd like to stick to GLSL. As in the CUDA solution mentioned here, it can be done in a fragment shader quite straightforwardly. However, you need about log(resolution) draw calls. Just set up a shader that takes 2x2 pixel samples from the original image and outputs their average. The result is an image with half the resolution in both axes. Repeat that until the image is 1x1 px. Some considerations: use GL_FLOAT luminance textures if available, to get a more precise sum. Use glViewport to quarter the rendering area in each stage. The result then ends up in the top-left pixel of your framebuffer.
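As a concrete illustration, here is a minimal sketch of one such downsampling pass (old-style GLSL 1.20, embedded as a C++ string); the uniform names and the host-side outline are made up for this example, and the source texture is assumed to use GL_NEAREST filtering:

    // Fragment shader for one 2x2 downsampling pass.
    const char* downsampleFrag = R"(#version 120
    uniform sampler2D u_source;   // image produced by the previous pass
    uniform vec2 u_texelSize;     // 1.0 / resolution of u_source
    void main() {
        // Step back half a texel so the four taps cover exactly one 2x2 block.
        vec2 uv = gl_TexCoord[0].st - 0.5 * u_texelSize;
        vec4 sum = texture2D(u_source, uv)
                 + texture2D(u_source, uv + vec2(u_texelSize.x, 0.0))
                 + texture2D(u_source, uv + vec2(0.0, u_texelSize.y))
                 + texture2D(u_source, uv + u_texelSize);
        gl_FragColor = 0.25 * sum;  // average of the 2x2 block
    })";
    // Host outline: render a full-screen quad into a float FBO, halving the
    // glViewport each pass and feeding the output back in as u_source, until
    // the viewport is 1x1; then read that single pixel back.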
