
OpenGL Gaussian Kernel on 3D texture

I would like to perform a Gaussian blur on a 3D texture in OpenGL. Since the kernel is separable, I should be able to do it in 3 passes. My question is: what would be the best way to go about it?

I currently have the 3D texture and fill it using imageStore. Should I create two additional copies of the texture for the blur, or is there a way to do it with a single texture?

I am already using a compute shader to build the mipmap of the 3D texture, but there I read from the texture at level 0 and write to the next level, so there is no conflict; for the blur I would need some kind of copy.
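For reference, a minimal sketch (not the asker's actual code) of what one axis of such a separable pass could look like as a compute shader, ping-ponging between two image bindings; the uSrc/uDst names, the rgba16f format, the 9-tap weights and the work-group size are all assumptions:

```glsl
#version 430
// One separable blur pass along the X (U) axis of a 3D texture.
// Reads from one image binding and writes to another, so a second
// texture is needed as the destination.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba16f) readonly  uniform image3D uSrc;
layout(binding = 1, rgba16f) writeonly uniform image3D uDst;

// 9-tap Gaussian, stored as the center weight plus 4 symmetric weights.
const float w[5] = float[](0.227027, 0.194594, 0.121621, 0.054054, 0.016216);

void main() {
    ivec3 p    = ivec3(gl_GlobalInvocationID);
    ivec3 size = imageSize(uSrc);
    if (any(greaterThanEqual(p, size))) return;

    vec4 sum = imageLoad(uSrc, p) * w[0];
    for (int i = 1; i < 5; ++i) {
        sum += imageLoad(uSrc, clamp(p + ivec3( i, 0, 0), ivec3(0), size - 1)) * w[i];
        sum += imageLoad(uSrc, clamp(p + ivec3(-i, 0, 0), ivec3(0), size - 1)) * w[i];
    }
    imageStore(uDst, p, sum);
}
```

Between the X, Y and Z passes you would swap the source and destination bindings with glBindImageTexture and issue glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT) so that the writes of one pass are visible to the next.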

In short, it can't be done in 3 passes, because this is not a 2D image, even if the kernel is separable. You have to blur each image slice separately, which is 2 passes per slice (with a 256x256x256 texture that is 512 passes just to blur along the U and V coordinates). Then you still have to blur along the T and U (or T and V, it makes no difference) coordinates, which is another 512 passes. You can gain some performance by using a bilinear filter and reading values between texels to save a constant factor of processing cost. The 3D blur will be very costly.
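As an illustration of the "read values between texels" trick mentioned above, here is a hedged sketch of one slice-blur pass along U written as a fragment shader; the uniform names, the vUVW input and the folding of a 9-tap kernel into 5 linearly filtered fetches are assumptions:

```glsl
#version 430
// With GL_LINEAR filtering, two adjacent Gaussian taps can be merged into a
// single fetch placed at a fractional offset between the texels.
uniform sampler3D uVolume;     // must be bound with GL_LINEAR filtering
uniform vec3      uTexelSize;  // 1.0 / vec3(textureSize(uVolume, 0))

in  vec3 vUVW;                 // coordinate of the texel being blurred
out vec4 oColor;

// 9 discrete taps reduced to 5 bilinear fetches (center + 2 per side).
const float weight[3] = float[](0.227027, 0.316216, 0.070270);
const float offset[3] = float[](0.0,      1.384615, 3.230769);

void main() {
    vec4 sum = texture(uVolume, vUVW) * weight[0];
    for (int i = 1; i < 3; ++i) {
        vec3 d = vec3(offset[i] * uTexelSize.x, 0.0, 0.0);  // blur along U only
        sum += texture(uVolume, vUVW + d) * weight[i];
        sum += texture(uVolume, vUVW - d) * weight[i];
    }
    oColor = sum;
}
```

Each slice would be rendered into the matching layer of the destination 3D texture (for example attached with glFramebufferTextureLayer), which is where the per-slice pass count above comes from.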

Performance tip: maybe you don't need to blur the whole texture but only a part of it (the visible part?).

The problem with such a high number of passes is the number of interactions between the GPU and the CPU: draw calls and FBO setup are both slow operations that stall the CPU (a different API with lower CPU overhead would probably be faster).

Try not to separate the kernel:

If you have a small kernel (I guess up to 5^3; only profiling will show the maximum viable kernel size), the fastest way is probably NOT to separate the kernel: you save a lot of draw calls and FBO bindings and leave everything to GPU fill rate and bandwidth.
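A minimal sketch of that non-separated variant, assuming a 3x3x3 kernel built from (1, 2, 1)/4 weights; the binding points, format and work-group size are again placeholders:

```glsl
#version 430
// Full 3D convolution in a single compute pass: 27 loads per texel,
// but only one dispatch and no intermediate copies.
layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(binding = 0, rgba16f) readonly  uniform image3D uSrc;
layout(binding = 1, rgba16f) writeonly uniform image3D uDst;

// 1D weights (1, 2, 1) / 4, multiplied together to form the 3D kernel.
const float w1[3] = float[](0.25, 0.5, 0.25);

void main() {
    ivec3 p    = ivec3(gl_GlobalInvocationID);
    ivec3 size = imageSize(uSrc);
    if (any(greaterThanEqual(p, size))) return;

    vec4 sum = vec4(0.0);
    for (int z = -1; z <= 1; ++z)
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x) {
        float w = w1[x + 1] * w1[y + 1] * w1[z + 1];
        ivec3 q = clamp(p + ivec3(x, y, z), ivec3(0), size - 1);
        sum += imageLoad(uSrc, q) * w;
    }
    imageStore(uDst, p, sum);
}
```

The trade-off is 27 (or 125 for a 5^3 kernel) image loads per texel instead of 3 separable passes, paid for entirely in bandwidth rather than in draw calls and FBO setup.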

Spread work over time: 随时间传播工作:

It does not matter whether your kernel is separated or not. Instead of computing a Gaussian blur every frame, you could compute it only every second or so (maybe with a bigger kernel). Then, as the source of "continuous blurring data", you use the interpolation between the previous blur and the next blur (which is 2 3D texture samples per frame, much cheaper than blurring continuously).
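A short sketch of how the consuming shader could blend the two precomputed blur volumes; uBlurPrev, uBlurNext and the uLerp factor (0 to 1 over the interval between two blur updates) are assumed names, to be pasted into whatever shader samples the blurred texture:

```glsl
uniform sampler3D uBlurPrev;  // blur computed at the previous update
uniform sampler3D uBlurNext;  // blur computed at the latest update
uniform float     uLerp;      // 0..1, advanced on the CPU each frame

// Two texture fetches per sample instead of re-blurring every frame.
vec4 sampleBlurred(vec3 uvw) {
    return mix(texture(uBlurPrev, uvw), texture(uBlurNext, uvw), uLerp);
}
```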
