
OpenGL Gaussian Kernel on 3D texture

I would like to perform a Gaussian blur on a 3D texture in OpenGL. Since the kernel is separable, I should be able to do it in 3 passes. My question is: what would be the best way to go about it?

I currently have the 3D texture and fill it using imageStore. Should I create two additional copies of the texture for the blur, or is there a way to do it with a single texture?

I am already using a compute shader to build the mipmap of the 3D texture, but there I read from the texture at level 0 and write to the next level, so there is no conflict; for the blur I would need some kind of copy.
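For reference, a minimal sketch (not the asker's actual code) of what one axis of such a separable pass could look like as a compute shader, ping-ponging between two image bindings; the uSrc/uDst names, the rgba16f format, the 9-tap weights and the work-group size are all assumptions:

```glsl
#version 430
// One separable blur pass along the X (U) axis of a 3D texture.
// Reads from one image binding and writes to another, so a second
// texture is needed as the destination.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout(binding = 0, rgba16f) readonly  uniform image3D uSrc;
layout(binding = 1, rgba16f) writeonly uniform image3D uDst;

// 9-tap Gaussian, stored as the center weight plus 4 symmetric weights.
const float w[5] = float[](0.227027, 0.194594, 0.121621, 0.054054, 0.016216);

void main() {
    ivec3 p    = ivec3(gl_GlobalInvocationID);
    ivec3 size = imageSize(uSrc);
    if (any(greaterThanEqual(p, size))) return;

    vec4 sum = imageLoad(uSrc, p) * w[0];
    for (int i = 1; i < 5; ++i) {
        sum += imageLoad(uSrc, clamp(p + ivec3( i, 0, 0), ivec3(0), size - 1)) * w[i];
        sum += imageLoad(uSrc, clamp(p + ivec3(-i, 0, 0), ivec3(0), size - 1)) * w[i];
    }
    imageStore(uDst, p, sum);
}
```

Between the X, Y and Z passes you would swap the source and destination bindings with glBindImageTexture and issue glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT) so that the writes of one pass are visible to the next.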

In short, it can't be done in 3 passes, because this is not a 2D image, even if the kernel is separable. You have to blur each image slice separately, which is 2 passes per slice (with a 256x256x256 texture that is 512 passes just to blur along the U and V coordinates). Then you still have to blur along the T and U (or T and V, it makes no difference) coordinates, which is another 512 passes. You can gain some performance by using a bilinear filter and reading values between texels to save a constant factor of processing cost. The 3D blur will be very costly.
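As an illustration of the "read values between texels" trick mentioned above, here is a hedged sketch of one slice-blur pass along U written as a fragment shader; the uniform names, the vUVW input and the folding of a 9-tap kernel into 5 linearly filtered fetches are assumptions:

```glsl
#version 430
// With GL_LINEAR filtering, two adjacent Gaussian taps can be merged into a
// single fetch placed at a fractional offset between the texels.
uniform sampler3D uVolume;     // must be bound with GL_LINEAR filtering
uniform vec3      uTexelSize;  // 1.0 / vec3(textureSize(uVolume, 0))

in  vec3 vUVW;                 // coordinate of the texel being blurred
out vec4 oColor;

// 9 discrete taps reduced to 5 bilinear fetches (center + 2 per side).
const float weight[3] = float[](0.227027, 0.316216, 0.070270);
const float offset[3] = float[](0.0,      1.384615, 3.230769);

void main() {
    vec4 sum = texture(uVolume, vUVW) * weight[0];
    for (int i = 1; i < 3; ++i) {
        vec3 d = vec3(offset[i] * uTexelSize.x, 0.0, 0.0);  // blur along U only
        sum += texture(uVolume, vUVW + d) * weight[i];
        sum += texture(uVolume, vUVW - d) * weight[i];
    }
    oColor = sum;
}
```

Each slice would be rendered into the matching layer of the destination 3D texture (for example attached with glFramebufferTextureLayer), which is where the per-slice pass count above comes from.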

Performance tip: maybe you don't need to blur the whole texture but only a part of it (the visible part?).

The problem with such a high number of passes is the number of interactions between the GPU and the CPU: draw calls and FBO setup are both slow operations that stall the CPU (a different API with lower CPU overhead would probably be faster).

Try not to separate the kernel:

If you have a small kernel (I guess up to 5^3; only profiling will show the maximum viable kernel size), the fastest way is probably NOT to separate the kernel: you save a lot of draw calls and FBO bindings and leave everything to GPU fill rate and bandwidth.
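A minimal sketch of that non-separated variant, assuming a 3x3x3 kernel built from (1, 2, 1)/4 weights; the binding points, format and work-group size are again placeholders:

```glsl
#version 430
// Full 3D convolution in a single compute pass: 27 loads per texel,
// but only one dispatch and no intermediate copies.
layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(binding = 0, rgba16f) readonly  uniform image3D uSrc;
layout(binding = 1, rgba16f) writeonly uniform image3D uDst;

// 1D weights (1, 2, 1) / 4, multiplied together to form the 3D kernel.
const float w1[3] = float[](0.25, 0.5, 0.25);

void main() {
    ivec3 p    = ivec3(gl_GlobalInvocationID);
    ivec3 size = imageSize(uSrc);
    if (any(greaterThanEqual(p, size))) return;

    vec4 sum = vec4(0.0);
    for (int z = -1; z <= 1; ++z)
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x) {
        float w = w1[x + 1] * w1[y + 1] * w1[z + 1];
        ivec3 q = clamp(p + ivec3(x, y, z), ivec3(0), size - 1);
        sum += imageLoad(uSrc, q) * w;
    }
    imageStore(uDst, p, sum);
}
```

The trade-off is 27 (or 125 for a 5^3 kernel) image loads per texel instead of 3 separable passes, paid for entirely in bandwidth rather than in draw calls and FBO setup.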

Spread work over time: 随时间传播工作:

It does not matter whether your kernel is separated or not. Instead of computing a Gaussian blur every frame, you could compute it only every second or so (maybe with a bigger kernel). Then, as the source of "continuous blurring data", you use the interpolation between the previous blur and the next blur (which is 2 3D texture samples per frame, much cheaper than blurring continuously).
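A short sketch of how the consuming shader could blend the two precomputed blur volumes; uBlurPrev, uBlurNext and the uLerp factor (0 to 1 over the interval between two blur updates) are assumed names, to be pasted into whatever shader samples the blurred texture:

```glsl
uniform sampler3D uBlurPrev;  // blur computed at the previous update
uniform sampler3D uBlurNext;  // blur computed at the latest update
uniform float     uLerp;      // 0..1, advanced on the CPU each frame

// Two texture fetches per sample instead of re-blurring every frame.
vec4 sampleBlurred(vec3 uvw) {
    return mix(texture(uBlurPrev, uvw), texture(uBlurNext, uvw), uLerp);
}
```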
