Is pow(x,p) faster when the exponent is an integer?

In code that uses pow(double x, double p) (in most cases p = 2.0), I observed that my code runs clearly faster when p = 2.0 than when p = 2.000000001. I conclude that, with my compiler (gcc 4.8.5), the implementation of pow detects at runtime when the operation is a square.

Following this observation, I conclude that I don't need a specific implementation when I know that p is 2. But my code must be cross-platform, hence my question:

Is pow optimized when the exponent is an integer in most C++03 compilers?

In my current context, "most compilers" means gcc >= 4.8, Intel with MSVC, and Intel on Unix.

Yes, the standard libraries do attempt a runtime optimization if the exponent is detected to be a natural number. Looking at the current glibc i386 implementation of pow, you can find the following code.

    /* First see whether `y' is a natural number.  In this case we
       can use a more precise algorithm.  */
    fld %st     // y : y : x
    fistpll (%esp)      // y : x
    fildll  (%esp)      // int(y) : y : x
    fucomp  %st(1)      // y : x
    fnstsw
    sahf
    jne 3f

embedded in the implementation. The full code can be found on GitHub.

Note that for other versions of glibc and other architectures the answer may differ.

EDIT

The answer below misinterprets the OP's question, which was specifically about RUNTIME optimisation, whereas I investigated compile-time optimisation.

Original Answer

Adding to my comment: as long as the exponent is a constant int less than or equal to MAXINT, you get the following.

#include <cmath>

double pow(double a)
{
    return std::pow(a, (int)2147483647);
}

generates

pow(double):
        movapd  xmm4, xmm0
        mulsd   xmm4, xmm0
        movapd  xmm5, xmm4
        mulsd   xmm5, xmm4
        mulsd   xmm4, xmm0
        movapd  xmm6, xmm5
        mulsd   xmm4, xmm5
        mulsd   xmm6, xmm5
        movapd  xmm3, xmm6
        mulsd   xmm3, xmm6
        mulsd   xmm3, xmm0
        movapd  xmm0, xmm4
        movapd  xmm2, xmm3
        movapd  xmm1, xmm3
        mulsd   xmm2, xmm6
        mulsd   xmm1, xmm3
        mulsd   xmm2, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm2
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm0, xmm1
        ret

but you have to be careful to use an int literal

#include <cmath>

double pow(double a)
{
    return std::pow(a, (unsigned int) 2147483647);
}

generates

pow(double):
        movsd   xmm1, QWORD PTR .LC0[rip]
        jmp     pow
.LC0:
        .long   4290772992
        .long   1105199103

EDIT

I seem to be wrong. The above was tested with an early version of GCC. In early versions of GCC and Clang the multiplication is inlined; in later versions this does not happen. If you switch compiler versions on godbolt, you can see that the above DOES NOT OCCUR with the newer ones.

For example

#include <cmath>

double pow_v2(double a)
{
    return std::pow(a, 2);
}

double pow_v3(double a)
{
    return std::pow(a, 3);
}

for Clang 10.0 generates

pow_v2(double):                             # @pow_v2(double)
        mulsd   xmm0, xmm0
        ret
.LCPI1_0:
        .quad   4613937818241073152     # double 3
pow_v3(double):                             # @pow_v3(double)
        movsd   xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
        jmp     pow                     # TAILCALL

but for Clang 5.0 it generates

pow_v2(double):                             # @pow_v2(double)
        mulsd   xmm0, xmm0
        ret
pow_v3(double):                             # @pow_v3(double)
        movapd  xmm1, xmm0
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm0
        movapd  xmm0, xmm1
        ret

It seems that for later versions of the compilers, calling the intrinsic pow function is faster than inlining the multiplications, so the compilers changed their strategy.
