
Is “x < y < z” faster than “x < y and y < z”?

From this page, we know that:

Chained comparisons are faster than using the and operator. Write x < y < z instead of x < y and y < z.

However, I got a different result when testing the following code snippets:

$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y < z"
1000000 loops, best of 3: 0.322 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y and y < z"
1000000 loops, best of 3: 0.22 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y < z"
1000000 loops, best of 3: 0.279 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y and y < z"
1000000 loops, best of 3: 0.215 usec per loop

It seems that x < y and y < z is faster than x < y < z. Why?

After searching some posts on this site (like this one), I know that "evaluated only once" is the key for x < y < z, but I'm still confused. To study this further, I disassembled these two functions using dis.dis:

import dis

def chained_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y < z

def and_compare():
        x = 1.2
        y = 1.3
        z = 1.1
        x < y and y < z

dis.dis(chained_compare)
dis.dis(and_compare)

And the output is:

## chained_compare ##

  4           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

  5           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

  6          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

  7          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 DUP_TOP
             25 ROT_THREE
             26 COMPARE_OP               0 (<)
             29 JUMP_IF_FALSE_OR_POP    41
             32 LOAD_FAST                2 (z)
             35 COMPARE_OP               0 (<)
             38 JUMP_FORWARD             2 (to 43)
        >>   41 ROT_TWO
             42 POP_TOP
        >>   43 POP_TOP
             44 LOAD_CONST               0 (None)
             47 RETURN_VALUE

## and_compare ##

 10           0 LOAD_CONST               1 (1.2)
              3 STORE_FAST               0 (x)

 11           6 LOAD_CONST               2 (1.3)
              9 STORE_FAST               1 (y)

 12          12 LOAD_CONST               3 (1.1)
             15 STORE_FAST               2 (z)

 13          18 LOAD_FAST                0 (x)
             21 LOAD_FAST                1 (y)
             24 COMPARE_OP               0 (<)
             27 JUMP_IF_FALSE_OR_POP    39
             30 LOAD_FAST                1 (y)
             33 LOAD_FAST                2 (z)
             36 COMPARE_OP               0 (<)
        >>   39 POP_TOP
             40 LOAD_CONST               0 (None)
             43 RETURN_VALUE

It seems that x < y and y < z has fewer disassembled instructions than x < y < z. Should I consider x < y and y < z faster than x < y < z?

Tested with Python 2.7.6 on an Intel(R) Xeon(R) CPU E5640 @ 2.67GHz.

The difference is that in x < y < z, y is only evaluated once. This does not make a large difference if y is a variable, but it does when y is a function call that takes some time to compute.

from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop
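The same point can be checked without IPython or sleep by counting how often the operand runs. This is only a minimal sketch: the counter, the count_calls helper and the sample values are made up for illustration.

```python
calls = 0

def y():
    """Stand-in for an expensive operand; counts how often it runs."""
    global calls
    calls += 1
    return 1.3

def count_calls(expr):
    """Reset the counter, evaluate expr (a zero-argument callable),
    and return (result, number of y() calls)."""
    global calls
    calls = 0
    result = expr()
    return result, calls

chained = count_calls(lambda: 1.2 < y() < 1.8)
with_and = count_calls(lambda: 1.2 < y() and y() < 1.8)
short_circuit = count_calls(lambda: 2.0 < y() and y() < 1.8)

print(chained)        # (True, 1)
print(with_and)       # (True, 2)
print(short_circuit)  # (False, 1)
```

The and form only pays for the second call when the first comparison is true, which is why the gap shows up in the timings above.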

Optimal bytecode for both of the functions you defined would be

          0 LOAD_CONST               0 (None)
          3 RETURN_VALUE

because the result of the comparison is not used. Let's make the situation more interesting by returning the result of the comparison. Let's also have the result not be knowable at compile time.

def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z

Again, the two versions of the comparison are semantically identical, so the optimal bytecode is the same for both constructs. As best I can work it out, it would look like this. I've annotated each line with the stack contents before and after each opcode, in Forth notation (top of stack at right, -- divides before and after, trailing ? indicates something that might or might not be there). Note that RETURN_VALUE discards everything that happens to be left on the stack underneath the value returned.

          0 LOAD_FAST                0 (y)    ;          -- y
          3 DUP_TOP                           ; y        -- y y
          4 LOAD_CONST               0 (1.1)  ; y y      -- y y 1.1
          7 COMPARE_OP               4 (>)    ; y y 1.1  -- y pred
         10 JUMP_IF_FALSE_OR_POP     19       ; y pred   -- y
         13 LOAD_CONST               1 (1.3)  ; y        -- y 1.3
         16 COMPARE_OP               0 (<)    ; y 1.3    -- pred
     >>  19 RETURN_VALUE                      ; y? pred  --
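One way to convince yourself that this hand-written sequence computes the same thing as x < y < z is to run it on a toy stack machine. The sketch below is not CPython's interpreter; the run function and the PROGRAM encoding are invented for illustration, and jump targets are list indices rather than byte offsets.

```python
import operator

# The listing above as (opcode, argument) pairs.
PROGRAM = [
    ("LOAD_FAST", "y"),
    ("DUP_TOP", None),
    ("LOAD_CONST", 1.1),
    ("COMPARE_OP", operator.gt),
    ("JUMP_IF_FALSE_OR_POP", 7),   # index of RETURN_VALUE
    ("LOAD_CONST", 1.3),
    ("COMPARE_OP", operator.lt),
    ("RETURN_VALUE", None),
]

def run(program, local_vars):
    """Execute the toy program and return the value passed to RETURN_VALUE."""
    stack = []
    pc = 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "DUP_TOP":
            stack.append(stack[-1])
        elif op == "COMPARE_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(arg(left, right))
        elif op == "JUMP_IF_FALSE_OR_POP":
            if not stack[-1]:
                pc = arg          # keep the false value as the result
            else:
                stack.pop()       # discard the true value and continue
        elif op == "RETURN_VALUE":
            return stack.pop()    # anything left underneath is dropped
        else:
            raise ValueError("unknown opcode: " + op)

for y in (1.0, 1.1, 1.2, 1.3, 1.5):
    print(y, run(PROGRAM, {"y": y}), 1.1 < y < 1.3)
```

For each sample value the two printed booleans should agree.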

If an implementation of the language, CPython, PyPy, whatever, does not generate this bytecode (or its own equivalent sequence of operations) for both variations, that demonstrates the poor quality of that bytecode compiler. Getting from the bytecode sequences you posted to the above is a solved problem (I think all you need for this case is constant folding, dead code elimination, and better modeling of the contents of the stack; common subexpression elimination would also be cheap and valuable), and there's really no excuse for not doing it in a modern language implementation.
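To see what your own interpreter emits for the two spellings, something like the following sketch may help. It assumes Python 3.4 or later (for dis.get_instructions); the opcode names and counts will differ from the Python 2.7 listings in the question and between Python versions, so no expected output is shown. The function names are made up.

```python
import dis

def chained(y):
    x = 1.1
    z = 1.3
    return x < y < z

def with_and(y):
    x = 1.1
    z = 1.3
    return x < y and y < z

def opnames(func):
    """Return the opcode names the running interpreter compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

for func in (chained, with_and):
    names = opnames(func)
    print(func.__name__, len(names), "instructions")
    print("   ", " ".join(names))
```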

Now, it happens that all current implementations of the language have poor-quality bytecode compilers. But you should ignore that while coding! Pretend the bytecode compiler is good, and write the most readable code. It will probably be plenty fast enough anyway. If it isn't, look for algorithmic improvements first, and give Cython a try second. That will provide far more improvement for the same effort than any expression-level tweaks you might apply.

Since the difference in the output seems to be due to a lack of optimization, I think you should ignore it in most cases; it could be that the difference will go away. The difference arises because y should only be evaluated once, which is achieved by duplicating it on the stack, and that requires an extra POP_TOP. A solution that uses LOAD_FAST might be possible, though.

The important difference, though, is that in x<y and y<z, y is evaluated twice if x<y evaluates to true. This has implications if the evaluation of y takes considerable time or has side effects.

In most scenarios you should use x<y<z despite the fact that it's somewhat slower.

First of all, your comparison is pretty much meaningless because the two different constructs were not introduced to provide a performance improvement, so you shouldn't decide whether to use one in place of the other based on that.

The x < y < z construct:

  1. Is clearer and more direct in its meaning.
  2. Its semantics are what you'd expect from the "mathematical meaning" of the comparison: evaluate x, y and z once and check whether the whole condition holds. Using and changes the semantics by evaluating y multiple times, which can change the result.
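A small illustration of how evaluating y more than once can change the result, assuming an operand whose value differs between calls. The make_reading helper and its values are hypothetical.

```python
import itertools

def make_reading():
    """Return a callable that gives 1.3 on the first call and 5.0 afterwards."""
    values = itertools.chain([1.3], itertools.repeat(5.0))
    return lambda: next(values)

y = make_reading()
chained_result = 1.2 < y() < 1.8      # y() runs once: 1.2 < 1.3 < 1.8

y = make_reading()
and_result = 1.2 < y() and y() < 1.8  # second y() returns 5.0

print(chained_result, and_result)  # True False
```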

So choose one in place of the other depending on the semantics you want and, if they are equivalent, on whether one is more readable than the other.

That said, more disassembled code does not imply slower code. However, executing more bytecode operations means that each operation is simpler, and yet each one requires an iteration of the interpreter's main loop. This means that if the operations you are performing are extremely fast (e.g. local variable lookup, as you are doing there), then the overhead of executing more bytecode operations can matter.

But note that this result does not hold in the more general situation, only in the "worst case" that you happened to profile. As others have noted, if you change y to something that takes even a bit more time, you'll see that the results change, because the chained notation evaluates it only once.

Summarizing:

  • Consider semantics before performance.
  • Take readability into account.
  • Don't trust micro benchmarks. Always profile with different kinds of parameters to see how the timing of a function or expression behaves in relation to those parameters, and consider how you plan to use it.
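As a rough sketch of what "different kinds of parameters" might look like here, the script below times both spellings with a cheap operand and with a deliberately slower one. It assumes Python 3.5 or later (for the globals argument of timeit.timeit); the cheap, costly and bench names are made up, and the absolute numbers will vary by machine and interpreter.

```python
import timeit

def cheap():
    return 1.3

def costly():
    # Hypothetical stand-in for real work; still returns 1.3.
    return sum(range(2000)) * 0 + 1.3

def bench(operand, number=5000):
    """Time both spellings with the given operand bound to the name y."""
    namespace = {"y": operand}
    chained = timeit.timeit("1.2 < y() < 1.8", globals=namespace, number=number)
    with_and = timeit.timeit("1.2 < y() and y() < 1.8", globals=namespace, number=number)
    return chained, with_and

for operand in (cheap, costly):
    chained, with_and = bench(operand)
    print(f"{operand.__name__:>6}: chained {chained:.4f}s, and {with_and:.4f}s")
```

With the slower operand, the and spelling would be expected to take roughly twice as long, since it calls y() twice.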
