Dot product vs Direct vector components sum performance in shaders

I'm writing Cg shaders for advanced lighting calculations in a Unity-based game. Sometimes I need to sum all the components of a vector. There are two ways to do it:

  1. Just write something like: float sum = v.x + v.y + v.z;
  2. Or do something like: float sum = dot(v,float3(1,1,1));

I am really curious which one is faster and which is the better code style.

If we asked the same question about CPU code, the first, simple way would obviously be much better, because:

a) There is no need to allocate another float3(1,1,1) vector.

b) There is no need to multiply every component of the original vector "v" by 1.

But since this is shader code running on the GPU, I believe there is some good hardware optimization for the dot product function, and the float3(1,1,1) constant may well compile down to no allocation at all.

float4 _someVector;

void surf (Input IN, inout SurfaceOutputStandard o) {
    // Option 1: add the components directly
    float sum = _someVector.x + _someVector.y + _someVector.z + _someVector.w;
    // Option 2: dot product with a vector of ones
    float sum2 = dot(_someVector, float4(1,1,1,1));
}

Implementation of the dot product in Cg: https://developer.download.nvidia.com/cg/dot.html

IMHO the difference is immeasurable in 98% of cases, but the first one should be faster, because multiplication is a "more expensive" operation.

Check this link.

Vec3 Dot has a cost of 3 cycles, while Scalar Add has a cost of 1. Thus, on almost all platforms (AMD and NVIDIA):

float sum = v.x + v.y + v.z; has a cost of 2
float sum = dot(v,float3(1,1,1)); has a cost of 3

The first implementation should be faster.
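
To make the cycle count concrete, here is a rough worked version of the arithmetic. It assumes the dot product is lowered to one multiply followed by two multiply-adds, and that the compiler does not fold away the multiplications by the constant 1. Both are assumptions for illustration; real compilers and hardware may behave differently.

```latex
\begin{aligned}
\operatorname{dot}\bigl(v,(1,1,1)\bigr) &= v_x \cdot 1 + v_y \cdot 1 + v_z \cdot 1 = v_x + v_y + v_z \\
\text{cost of dot (assumed 1 mul + 2 mad)} &= 3 \\
\text{cost of } v_x + v_y + v_z \text{ (2 scalar adds at cost 1 each)} &= 2 \times 1 = 2
\end{aligned}
```

Both forms are mathematically the same value; under this cost model the direct sum just needs one operation fewer.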
