
Texture coordinates and optimizing GLSL shaders

I'm weighing the pros and cons of the different ways to pass texture coordinates to a GLSL shader.

I'm rendering a lot of instanced data. I have one basic model, and for each instance I pass a transformation matrix and a texture/sprite index to my shader. Each instance is then rotated and translated according to its transformation matrix, and its texture coordinates are computed by this snippet:

TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y+(TexIndex/16))/16;

What I don't like about this is that the sprite and texture sizes are hard-coded. I could pass them in as uniforms, but then the sprite size still couldn't vary from instance to instance (not that I have a planned use case for that). It's also a bit more computation on the GPU to determine the sprite's coordinates.
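To make the arithmetic in that snippet concrete, here is a rough CPU-side model of it in Python. It is only a sketch: the function name atlas_tex_coord and the columns/rows parameters are made up for illustration, and they stand in for the uniforms mentioned above in place of the hard-coded 16. Note that TexIndex/16 in the shader is an integer division, which is // in Python.

```python
def atlas_tex_coord(tex_coord, tex_index, columns=16, rows=16):
    """Model of the snippet: map a per-vertex coordinate in [0, 1] to the
    atlas cell selected by tex_index.

    columns and rows are hypothetical stand-ins for uniforms; the original
    snippet hard-codes both to 16.
    """
    column = tex_index % columns    # TexIndex % 16 in the shader
    row = tex_index // columns      # TexIndex / 16 (integer division)
    return ((tex_coord[0] + column) / columns,
            (tex_coord[1] + row) / rows)


if __name__ == "__main__":
    # Top-right corner of the quad, sprite 17 (column 1, row 1) in a 16x16 atlas.
    print(atlas_tex_coord((1.0, 1.0), 17))
```

With the defaults this reproduces the hard-coded line; other values of columns and rows show what a uniform-driven version would compute for a differently sized grid.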

Another option would be to specify an entire rect that delimits the position, width and height of the sprite within the texture map. However, that requires 4 floats (16 bytes) per instance rather than a single texture index byte. Multiply that by, say, 200K instances and we're looking at about 3.2 MB of data (in addition to the other per-instance data). I don't know whether that counts as "a lot" these days.
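A minimal sketch of the rect approach, again as a CPU-side Python model rather than actual shader code. The names sprite_rect, apply_rect and rect_buffer_bytes are hypothetical, and the shader line quoted in the comment (rect.xy + TexCoord * rect.zw) is an assumption about how such a rect would typically be applied, not something taken from the question.

```python
import struct


def sprite_rect(index, columns=16, rows=16):
    """Precompute (u, v, width, height) for one grid cell on the CPU,
    in normalized texture space."""
    width = 1.0 / columns
    height = 1.0 / rows
    return ((index % columns) * width, (index // columns) * height,
            width, height)


def apply_rect(tex_coord, rect):
    """Model of the assumed shader line:
    TexCoord0 = rect.xy + TexCoord * rect.zw  (one multiply-add, no modulus)."""
    u, v, width, height = rect
    return (u + tex_coord[0] * width, v + tex_coord[1] * height)


# Four 32-bit floats per instance.
BYTES_PER_RECT = struct.calcsize("<4f")


def rect_buffer_bytes(instance_count):
    """Extra per-instance buffer size needed for the rects."""
    return instance_count * BYTES_PER_RECT


if __name__ == "__main__":
    print(apply_rect((1.0, 1.0), sprite_rect(17)))
    print(rect_buffer_bytes(200_000), "bytes")
```

For a uniform grid this gives the same coordinates as the index-based snippet, but the rects no longer have to be the same size for every instance, at the price of 16 bytes per instance instead of 1.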

Should I focus on easing the computation in my GLSL shaders or on minimizing the size of my buffers? I hear that transferring data to the GPU is often the bottleneck, but the data will be recopied to the buffer very seldom compared with how often its vertices are rendered (every frame).

Likewise, I'm considering replacing my model transform matrix with a vec3 for translation and a vec2 for rotation (I only need two rotation angles), which would take me from 16 floats down to 5; I could then rebuild the matrix in the vertex shader. Again, this takes away some flexibility, and I'm not sure how much it would save.
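Rebuilding the matrix from those 5 floats could look roughly like the following, written as a Python model of what the vertex shader would have to do. This is a sketch under assumptions the question does not state: the two rotation values are taken to be yaw (about the Y axis) and pitch (about the X axis), composed as translate * yaw * pitch and applied to column vectors. The names build_model_matrix and transform_point are illustrative only.

```python
import math


def build_model_matrix(translation, rotation):
    """Build a 4x4 model matrix (list of rows) from a vec3 translation and a
    vec2 of angles in radians.

    Assumption: rotation = (yaw about Y, pitch about X), M = T * Ry * Rx.
    """
    tx, ty, tz = translation
    yaw, pitch = rotation
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [
        [cy,  sy * sp, sy * cp, tx],
        [0.0, cp,      -sp,     ty],
        [-sy, cy * sp, cy * cp, tz],
        [0.0, 0.0,     0.0,     1.0],
    ]


def transform_point(matrix, point):
    """Apply the matrix to a point treated as the column vector (x, y, z, 1)."""
    x, y, z = point
    column = (x, y, z, 1.0)
    return tuple(sum(m * c for m, c in zip(row, column)) for row in matrix[:3])


if __name__ == "__main__":
    model = build_model_matrix((1.0, 2.0, 3.0), (math.pi / 2, 0.0))
    print(transform_point(model, (1.0, 0.0, 0.0)))
```

The trade-off is visible here: the per-instance data shrinks from 16 floats to 5, but every vertex now pays for four trigonometric calls and a handful of multiplies, so whether it is a net win would have to be measured.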

I tried it the other way, specifying a texture rect rather than a byte index, and it actually yielded a huge speed increase (520 FPS to 3600 FPS, or 1.92 ms/frame to 0.27 ms/frame).
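For anyone comparing the two ways of quoting that result, frame time is just 1000 / FPS. A quick check in Python (the helper name frame_time_ms is made up for this example; the FPS figures are presumably rounded, so the per-frame times only match to about two decimal places):

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


before = frame_time_ms(520)    # index + modulus version
after = frame_time_ms(3600)    # precomputed rect version
speedup = before / after

if __name__ == "__main__":
    print(f"{before:.2f} ms -> {after:.2f} ms ({speedup:.1f}x faster)")
```

Frame time is the more useful number for comparisons, since equal FPS differences do not correspond to equal amounts of work saved.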

It seems that reducing computation matters more, at least on my GPU (Radeon HD 5700 series). Or perhaps it's just the modulus that's expensive; I'm not sure. I'm quite pleased with the results though: I get more flexibility at a lower cost!
