
Why is the pow function slower than simple operations?

I heard from a friend that the pow function is slower than simply multiplying the base by itself as many times as the exponent indicates. For example, according to him,

#include <stdio.h>
#include <math.h>

int main() {
    double e = 2.71828;
    double e2 = pow(e, 2.0);
    printf("%le\n", e2);
    return 0;
}

is slower than

#include <stdio.h>

int main() {
    double e = 2.71828;
    double e2 = e * e;
    printf("%le\n", e2);
    return 0;
}

As a novice, I would expect both to run at the same speed, and on that assumption I would prefer the former for its conciseness. So, why is the former block of code slower than the latter?

pow(double, double) needs to handle raising to any power, not just an integer power, and certainly not just 2. As such, it's far more complicated than a simple multiplication of two double values.

Because the pow function must implement a more general algorithm that works in all cases (in particular, it must be able to raise a base to any exponent representable by a double), while e*e is just a simple multiplication that boils down to one or two assembly instructions.

Still, if the compiler is smart enough, it may replace your pow(e, 2.0) with e*e automatically anyway (in fact, in your case it will probably perform the whole computation at compile time).


Just for fun, I ran some tests: compiling the following code

#include <math.h>

double pow2(double value)
{
    return pow(value, 2.);
}

double knownpow2()
{
    double e=2.71828;
    return pow(e, 2.);
}

double valuexvalue(double value)
{
    return value*value;
}

double knownvaluexvalue()
{
    double e=2.71828;
    return e*e;
}

with g++ -O3 -c pow.c (g++ 4.7.3), and disassembling the output with objdump -d -M intel pow.o, I get:

0000000000000000 <_Z4pow2d>:
   0:   f2 0f 59 c0             mulsd  xmm0,xmm0
   4:   c3                      ret    
   5:   66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   c:   00 00 00 00 

0000000000000010 <_Z9knownpow2v>:
  10:   f2 0f 10 05 00 00 00    movsd  xmm0,QWORD PTR [rip+0x0]        # 18 <_Z9knownpow2v+0x8>
  17:   00 
  18:   c3                      ret    
  19:   0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]

0000000000000020 <_Z11valuexvalued>:
  20:   f2 0f 59 c0             mulsd  xmm0,xmm0
  24:   c3                      ret    
  25:   66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
  2c:   00 00 00 00 

0000000000000030 <_Z16knownvaluexvaluev>:
  30:   f2 0f 10 05 00 00 00    movsd  xmm0,QWORD PTR [rip+0x0]        # 38 <_Z16knownvaluexvaluev+0x8>
  37:   00 
  38:   c3                      ret    

So, where the compiler already knew all the values involved, it simply performed the computation at compile time; and for both pow2 and valuexvalue it emitted a single mulsd xmm0,xmm0 (i.e., in both cases the operation boils down to multiplying the value by itself in a single assembly instruction).

Here is one (simple; heed the comment) pow implementation. Being generic, it involves a number of branches, a potential division, and calls to exp, log, and modf.

On the other hand, multiplication is a single instruction (give or take) on most modern CPUs.
