
Why are addition and multiplication faster than comparisons?

I always thought comparisons were the fastest operation a computer could execute. I remember hearing it in a presentation by D. Knuth, where he'd write loops in descending order "because comparison against 0 is fast". I also read here that multiplication should be slower than addition.

I'm surprised to see that, in both Python 2 and 3, testing under both Linux and Mac, comparisons seem to be much slower than arithmetic operations.

Could anyone explain why?

%timeit 2 > 0
10000000 loops, best of 3: 41.5 ns per loop

%timeit 2 * 2
10000000 loops, best of 3: 27 ns per loop

%timeit 2 * 0
10000000 loops, best of 3: 27.7 ns per loop

%timeit True != False
10000000 loops, best of 3: 75 ns per loop

%timeit True and False
10000000 loops, best of 3: 58.8 ns per loop

And under Python 3:

$ ipython3
Python 3.5.2 | packaged by conda-forge | (default, Sep  8 2016, 14:36:38) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: %timeit 2 + 2
10000000 loops, best of 3: 22.9 ns per loop

In [2]: %timeit 2 * 2
10000000 loops, best of 3: 23.7 ns per loop

In [3]: %timeit 2 > 2
10000000 loops, best of 3: 45.5 ns per loop

In [4]: %timeit True and False
10000000 loops, best of 3: 62.8 ns per loop

In [5]: %timeit True != False
10000000 loops, best of 3: 92.9 ns per loop

This is happening due to constant folding in the peephole optimizer within the Python compiler.

Using the dis module, if we break down each of these statements to see how they are translated at the bytecode level, you will observe that for operators like inequality and equality, the operands are first loaded onto the stack and then compared. However, constant expressions involving multiplication, addition, etc. are computed at compile time and loaded as a single constant.

Overall, this leads to fewer execution steps, making the folded expressions faster:

>>> import dis

>>> def m1(): True != False
>>> dis.dis(m1)
  1           0 LOAD_GLOBAL              0 (True)
              3 LOAD_GLOBAL              1 (False)
              6 COMPARE_OP               3 (!=)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

>>> def m2(): 2 * 2
>>> dis.dis(m2)
  1           0 LOAD_CONST               2 (4)
              3 POP_TOP             
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE        

>>> def m3(): 2*5
>>> dis.dis(m3)
  1           0 LOAD_CONST               3 (10)
              3 POP_TOP             
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE        

>>> def m4(): 2 > 0
>>> dis.dis(m4)
  1           0 LOAD_CONST               1 (2)
              3 LOAD_CONST               2 (0)
              6 COMPARE_OP               4 (>)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

>>> def m5(): True and False
>>> dis.dis(m5)
  1           0 LOAD_GLOBAL              0 (True)
              3 JUMP_IF_FALSE_OR_POP     9
              6 LOAD_GLOBAL              1 (False)
        >>    9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

As others have explained, this is because Python's peephole optimiser optimises arithmetic operations but not comparisons.

Having written my own peephole optimiser for a Basic compiler, I can assure you that optimising constant comparisons is just as easy as optimising constant arithmetic operations. So there is no technical reason why Python should do the latter but not the former.

However, each such optimisation has to be separately programmed, and comes with two costs: the time to program it, and the extra optimising code taking up space in the Python executable. So you find yourself having to do some triage: which of these optimisations is common enough to be worth the cost?

It seems that the Python implementers, reasonably enough, decided to optimise the arithmetic operations first. Perhaps they will get round to comparisons in a future release.

A quick disassembly reveals that the comparison involves more operations. According to this answer, some precalculation is done by the "peephole optimiser" (wiki) for multiplication, addition, etc., but not for the comparison operators:

>>> import dis
>>> def a():
...   return 2*3
... 
>>> dis.dis(a)
  2           0 LOAD_CONST               3 (6)
              3 RETURN_VALUE
>>> def b():
...   return 2 < 3
... 
>>> dis.dis(b)
  2           0 LOAD_CONST               1 (2)
              3 LOAD_CONST               2 (3)
              6 COMPARE_OP               0 (<)
              9 RETURN_VALUE

As others have commented, this is due to the peephole optimizer, which pre-computes the result of 2*3 (6), as the dis output shows:

0 LOAD_CONST               3 (6)

But try this: using variables prevents the optimizer from pre-computing the result:

>>> def a(a, b):
...     return a*b
...
>>> dis.dis(a)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
>>> def c(a,b):
...     return a<b
...
>>> dis.dis(c)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 COMPARE_OP               0 (<)
              9 RETURN_VALUE
>>>

If you time these functions, the comparison will be faster.
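That claim can be checked directly with the timeit module; the absolute numbers will differ from machine to machine, but with variables the peephole optimizer cannot fold anything, so both operations actually run at runtime. A minimal sketch:

```python
import timeit

# With variables instead of literals, nothing is folded at
# compile time, so both statements execute their bytecode.
mul_time = timeit.timeit("a * b", setup="a = 2; b = 3", number=1_000_000)
cmp_time = timeit.timeit("a < b", setup="a = 2; b = 3", number=1_000_000)

print(f"a * b: {mul_time:.4f}s for 1M runs")
print(f"a < b: {cmp_time:.4f}s for 1M runs")
```

The relative ordering depends on the interpreter version and hardware, which is why repeating the measurement locally matters more than any single posted number.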

For the Python case the above answers are correct. For machine code, things are a bit more complicated. I assume we are talking about integer operations here; with floats and complex objects, none of the below applies. Also, we assume that the values you are comparing are already loaded into registers. If they are not, fetching them from wherever they are could take 100 times longer than the actual operations.

Modern CPUs have several ways to compare two numbers. Very popular ones are XOR a,b if you just want to see whether two values are equal, or CMP a,b if you want to know the relationship between the values (less, greater, equal, etc.). The CMP operation is just a subtraction with the result thrown away, because we are only interested in the post-op flags.
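The "subtract and keep only the flags" idea can be mimicked in Python. This is purely an illustration of what CMP computes conceptually, not how CPython compares integers, and the flag names here just borrow x86 terminology:

```python
def cmp_flags(a: int, b: int) -> dict:
    """Emulate the CMP idea: subtract, discard the result,
    keep only the status flags derived from it."""
    diff = a - b  # the subtraction whose value is thrown away
    return {
        "ZF": diff == 0,  # zero flag: the operands were equal
        "SF": diff < 0,   # sign flag: a was less than b (ignoring overflow)
    }

print(cmp_flags(2, 2))  # {'ZF': True, 'SF': False}
print(cmp_flags(1, 5))  # {'ZF': False, 'SF': True}
```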

Both of these operations are of depth 1, so they can be executed in a single CPU cycle. This is as fast as you can go. Multiplication is a form of repeated addition, so the depth of the operation is usually equal to the size of your register. There are some optimizations that can reduce the depth, but generally multiplication is one of the slower operations a CPU can perform.

However, multiplying by 0, 1, or any power of 2 can be reduced to a shift operation, which is also a depth-one operation, so it takes the same time as comparing two numbers. Think of the decimal system: you can multiply any number by 10, 100, or 1000 by appending zeros at the end. Any optimizing compiler will recognize this type of multiplication and use the most efficient operation for it. Modern CPUs are also pretty advanced, so they can perform the same optimization in hardware by counting how many bits are set in either operand; if it's just one bit, the operation is reduced to a shift.
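The power-of-two equivalence is easy to verify in Python, where `<<` is the left-shift operator:

```python
# Multiplying by a power of two is the same as shifting left
# by the exponent: x * 2**k == x << k.
for x in (3, 7, 100):
    for k in (1, 2, 5):
        assert x * (2 ** k) == x << k

print(5 * 8, 5 << 3)  # both 40: 8 is 2**3, so *8 is a shift by 3
```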

So in your case, multiplying by 2 is as fast as comparing two numbers. As people above pointed out, any optimizing compiler will see that you are multiplying two constants, so it will simply replace the expression with the resulting constant.
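That compile-time replacement can be seen directly in the function's constant pool, without even disassembling. A quick check (on CPython 3; the exact contents of `co_consts` vary between versions, but the folded value should be there):

```python
def mul_consts():
    return 2 * 3

# The peephole optimizer folded 2 * 3 at compile time, so the
# result 6 sits in the code object's constant pool and the
# bytecode just loads it.
print(mul_consts.__code__.co_consts)  # contains 6
```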

Wow, the answer by @mu 無 blew my mind! However, it is important not to generalize when deriving your conclusions: you are checking the times for constants, not variables. For variables, multiplication seems to be slower than comparison.

Here is the more interesting case, in which the numbers to be compared are stored in actual variables...

import timeit
def go():
    number=1000000000
    print
    print 'a>b, internal:',timeit.timeit(setup="a=1;b=1", stmt="a>b", number=number)
    print 'a*b, internal:',timeit.timeit(setup="a=1;b=1", stmt="a*b", number=number)
    print 'a>b, shell   :',
    %%timeit -n 1000000000 "a=1;b=1" "a>b"
    print 'a*b, shell   :',
    %%timeit -n 1000000000 "a=1;b=1" "a*b"
go()

The result gives:

a>b, internal: 51.9467676445
a*b, internal: 63.870462403
a>b, shell   :1000000000 loops, best of 3: 19.8 ns per loop
a*b, shell   :1000000000 loops, best of 3: 19.9 ns per loop

And order is restored in the universe ;)

For completeness, let's see some more cases... What if we have one variable and one constant?

import timeit
def go():
    print 'a>2, shell   :',
    %%timeit -n 10000000 "a=42" "a>2"
    print 'a*2, shell   :',
    %%timeit -n 10000000 "a=42" "a*2"
go()

a>2, shell   :10000000 loops, best of 3: 18.3 ns per loop
a*2, shell   :10000000 loops, best of 3: 19.3 ns per loop

What happens with bools?

import timeit
def go():
    print 
    number=1000000000
    print 'a==b    : ', timeit.timeit(setup="a=True;b=False",stmt="a==b",number=number) 
    print 'a and b : ', timeit.timeit(setup="a=True;b=False",stmt="a and b",number=number) 
    print 'boolean ==, shell   :',
    %%timeit -n 1000000000 "a=True;b=False" "a == b"
    print 'boolean and, shell   :',
    %%timeit -n 1000000000 "a=False;b=False" "a and b"
go()

a==b    :  70.8013108982
a and b :  38.0614485665
boolean ==, shell   :1000000000 loops, best of 3: 17.7 ns per loop
boolean and, shell   :1000000000 loops, best of 3: 16.4 ns per loop

:D Now this is interesting: it seems boolean `and` is faster than `==`. However, all this would be fine, and Donald Knuth would not lose sleep over it; the best way to compare booleans would be to use `and`...

In practice, we should check numpy, which may be even more significant...

import timeit
def go():
    number=1000000 # change if you are in a hurry/ want to be more certain....
    print '====   int   ===='
    print 'a>b  : ', timeit.timeit(setup="a=1;b=2",stmt="a>b",number=number*100) 
    print 'a*b  : ', timeit.timeit(setup="a=1;b=2",stmt="a*b",number=number*100) 
    setup = "import numpy as np;a=np.arange(0,100);b=np.arange(100,0,-1);"
    print 'np: a>b  : ', timeit.timeit(setup=setup,stmt="a>b",number=number) 
    print 'np: a*b  : ', timeit.timeit(setup=setup,stmt="a*b",number=number) 
    print '====   float ===='
    print 'float a>b  : ', timeit.timeit(setup="a=1.1;b=2.3",stmt="a>b",number=number*100) 
    print 'float a*b  : ', timeit.timeit(setup="a=1.1;b=2.3",stmt="a*b",number=number*100) 
    setup = "import numpy as np;a=np.arange(0,100,dtype=float);b=np.arange(100,0,-1,dtype=float);"
    print 'np float a>b  : ', timeit.timeit(setup=setup,stmt="a>b",number=number) 
    print 'np float a*b  : ', timeit.timeit(setup=setup,stmt="a*b",number=number) 
    print '====   bool ===='
    print 'a==b    : ', timeit.timeit(setup="a=True;b=False",stmt="a==b",number=number*1000) 
    print 'a and b : ', timeit.timeit(setup="a=True;b=False",stmt="a and b",number=number*1000) 
    setup = "import numpy as np;a=np.arange(0,100)>50;b=np.arange(100,0,-1)>50;"
    print 'np a == b  : ', timeit.timeit(setup=setup,stmt="a == b",number=number) 
    print 'np a and b : ', timeit.timeit(setup=setup,stmt="np.logical_and(a,b)",number=number) 
    print 'np a == True  : ', timeit.timeit(setup=setup,stmt="a == True",number=number) 
    print 'np a and True : ', timeit.timeit(setup=setup,stmt="np.logical_and(a,True)",number=number) 
go()

====   int   ====
a>b  :  4.5121130192
a*b  :  5.62955748632
np: a>b  :  0.763992986986
np: a*b  :  0.723006032235
====   float ====
float a>b  :  6.39567713272
float a*b  :  5.62149055215
np float a>b  :  0.697037433398
np float a*b  :  0.847941712765
====   bool ====
a==b  :  6.91458288689
a and b  :  3.6289697892
np a == b  :  0.789666454087
np a and b :  0.724517620007
np a == True  :  1.55066706189
np a and True :  1.44293071804

Again, same behavior... So I guess one can benefit from using `and` instead of `==` in general,

at least in Python 2 (Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]), where I tried all these...
