With the -Ofast flag on gcc, does breaking down a math expression affect speed?
I want to know whether, with the -Ofast flag on gcc, the code
x += (a * b) + (c * d) + (e * f);
is faster than, slower than, or the same as this code:
x += a * b;
x += c * d;
x += e * f;
I have a math expression like this inside a nested loop, so any gain in speed might have a significant effect.
Intuitively, I'd expect these to compile to the same code. But let's see what actually happens!
Using godbolt with your first version (the one-liner), we get this code:
mov eax, DWORD PTR [rsp+20]
mov esi, DWORD PTR [rsp+28]
imul esi, DWORD PTR [rsp+32]
imul eax, DWORD PTR [rsp+24]
lea eax, [rax+rsi]
mov esi, DWORD PTR [rsp+36]
imul esi, DWORD PTR [rsp+40]
add esi, eax
add esi, DWORD PTR [rsp+44]
mov DWORD PTR [rsp+44], esi
With the second version, we get this:
mov esi, DWORD PTR [rsp+28]
imul esi, DWORD PTR [rsp+32]
mov eax, DWORD PTR [rsp+20]
imul eax, DWORD PTR [rsp+24]
add eax, DWORD PTR [rsp+44]
lea eax, [rax+rsi]
mov esi, DWORD PTR [rsp+36]
imul esi, DWORD PTR [rsp+40]
add esi, eax
mov DWORD PTR [rsp+44], esi
These are, I believe, the same instructions in a slightly different order: both listings contain four loads, three multiplies, three additions (one of them an lea) and one store. I suspect the performance would be almost identical in these two cases, though there might be a slight difference in how well one ordering pipelines compared with the other.
I'd say your first version is perfectly fine here.
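If you want to repeat the comparison yourself, one way is to put each variant in its own function and inspect what gcc emits. The following is only a sketch: the function names (`one_liner`, `split`, `count_mismatches`) and the `int` operand type are assumptions for illustration (the `imul` on 32-bit registers above suggests integers, but the question doesn't show the declarations), and the assembly you get will differ from the listings above because the operands arrive in registers here instead of being loaded from the stack.

```c
/* Assumed operand type: int. The question does not show the declarations. */

/* Variant 1: the whole expression in a single statement. */
int one_liner(int x, int a, int b, int c, int d, int e, int f)
{
    x += (a * b) + (c * d) + (e * f);
    return x;
}

/* Variant 2: the same sum accumulated one product at a time. */
int split(int x, int a, int b, int c, int d, int e, int f)
{
    x += a * b;
    x += c * d;
    x += e * f;
    return x;
}

/* Sanity check: counts inputs (small values, so no overflow) on which
 * the two variants disagree. Hypothetical helper, not from the question. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int i = -50; i <= 50; ++i) {
        int r1 = one_liner(i, i + 1, i + 2, i + 3, i + 4, i + 5, i + 6);
        int r2 = split(i, i + 1, i + 2, i + 3, i + 4, i + 5, i + 6);
        if (r1 != r2)
            ++mismatches;
    }
    return mismatches;
}
```

Compiling this file with `gcc -Ofast -S` (or pasting it into godbolt) lets you compare the bodies of `one_liner` and `split` side by side. Keep in mind that identical or near-identical assembly only tells you about this compiler version and target; if the difference really matters in your nested loop, timing the actual loop is the more reliable test.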