
Performance difference between __add__ and + operator

I'm reading Learning Python, 5th edition, and I need some more explanation of this paragraph:

The __add__ method of strings, for example, is what really performs concatenation; Python maps the first of the following to the second internally, though you shouldn't usually use the second form yourself (it's less intuitive, and might even run slower):

>>> S+'NI!'
'spamNI!'
>>> S.__add__('NI!')
'spamNI!'

So my question is: why would it run slower?

>>> def test(a, b):
...     return a + b
... 
>>> def test2(a, b):
...     return a.__add__(b)
... 
>>> import dis
>>> dis.dis(test)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD          
              7 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (__add__)
              6 LOAD_FAST                1 (b)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE        

The + version compiles to a single BINARY_ADD instruction, while the __add__ version needs two: LOAD_ATTR and CALL_FUNCTION. Since BINARY_ADD does (almost) the same thing, but in C, we can expect it to be (slightly) faster. The difference will be hardly noticeable, though.

Side note: this is similar to how assembly works. Often, when a single instruction does the same thing as a sequence of instructions, it performs better. For example, on x64 the LEA instruction can be replaced with a sequence of other instructions, but they won't perform as well.

But there's a catch (which explains why I started talking about x64 assembly). Sometimes a single instruction actually performs worse; see the infamous LOOP instruction. There may be many reasons for such counterintuitive behaviour: slightly different assumptions, an unoptimized implementation, historical reasons, a bug, and so on.

Conclusion: in Python, + should theoretically be faster than __add__, but always measure.
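One way to follow the "always measure" advice is a small timeit harness like the sketch below. The function names add_operator and add_dunder and the repetition count are illustrative choices, not part of the original answer, and the timings depend on the interpreter version and machine.

```python
from timeit import timeit

def add_operator(a, b):
    # Uses the + operator, which compiles to a single binary-add instruction.
    return a + b

def add_dunder(a, b):
    # Looks up __add__ as an attribute and then calls it.
    return a.__add__(b)

# Names available to the timed statements (assumed setup, mirroring S = "spam").
env = {"add_operator": add_operator, "add_dunder": add_dunder, "S": "spam"}

# number is kept small so the sketch runs quickly; raise it for steadier results.
t_op = timeit("add_operator(S, 'NI!')", globals=env, number=100_000)
t_dunder = timeit("add_dunder(S, 'NI!')", globals=env, number=100_000)

print(f"a + b        : {t_op:.4f} s")
print(f"a.__add__(b) : {t_dunder:.4f} s")
```

Running it several times and comparing the minimum of each series usually gives a more trustworthy picture than a single run.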

It was probably explained that the + operator actually calls __add__ under the hood. So when you do S + 'NI!', what happens is that __add__ is called (if S has one). So semantically, both versions do exactly the same thing.
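A minimal sketch of that dispatch, using a hypothetical class Loud (not from the original answer) that counts how often its __add__ runs. The last lines add one caveat for built-in types: the operator also handles a NotImplemented result by trying the reflected method on the other operand, which a direct __add__ call does not do.

```python
class Loud:
    """Hypothetical wrapper that records every call to __add__."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __add__(self, other):
        self.calls += 1
        return self.value + other

s = Loud("spam")
result_op = s + "NI!"             # the operator ends up in Loud.__add__
result_dunder = s.__add__("NI!")  # the explicit call does the same work
print(result_op, result_dunder, s.calls)  # spamNI! spamNI! 2

# Caveat: for mixed operand types the operator does a bit more than the direct call.
direct = (1).__add__(1.5)   # int does not know how to add a float
via_operator = 1 + 1.5      # the operator falls back to float.__radd__
print(direct is NotImplemented, via_operator)
```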

The difference is in what the code compiles to, though. As you probably know, Python is compiled into bytecode, which is then executed. The bytecode operations determine what steps the interpreter has to execute. You can take a look at the bytecode with the dis module:

>>> import dis
>>> dis.dis("S+'NI!'")
  1           0 LOAD_NAME                0 (S)
              2 LOAD_CONST               0 ('NI!')
              4 BINARY_ADD
              6 RETURN_VALUE
>>> dis.dis("S.__add__('NI!')")
  1           0 LOAD_NAME                0 (S)
              2 LOAD_METHOD              1 (__add__)
              4 LOAD_CONST               0 ('NI!')
              6 CALL_METHOD              1
              8 RETURN_VALUE

As you can see, the difference is basically that the + operator just does a BINARY_ADD, while the __add__ call loads the actual method and executes it.
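Opcode names change between CPython releases (for example, newer versions emit BINARY_OP instead of BINARY_ADD, and some use LOAD_ATTR where others use LOAD_METHOD), so the exact listing may differ on your interpreter. The sketch below compares the two expressions programmatically; the helper name opnames is an illustrative choice.

```python
import dis

def opnames(source):
    """Return the opcode names of the bytecode compiled from a source string."""
    return [ins.opname for ins in dis.get_instructions(source)]

plus_ops = opnames("S + 'NI!'")
call_ops = opnames("S.__add__('NI!')")

print("operator:", plus_ops)
print("method  :", call_ops)

# The operator form has a dedicated add instruction and no attribute lookup;
# the method form has to load the attribute before it can call it.
ADD_OPS = {"BINARY_ADD", "BINARY_OP"}
LOOKUP_OPS = {"LOAD_ATTR", "LOAD_METHOD"}
print("operator uses an add opcode:", bool(ADD_OPS & set(plus_ops)))
print("method uses a lookup opcode:", bool(LOOKUP_OPS & set(call_ops)))
```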

When the interpreter sees the BINARY_ADD, it automatically looks up the __add__ implementation and calls it, but it can do so more efficiently than when you have to look up the method within Python bytecode.
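One observable consequence of that internal lookup: the operator finds __add__ on the object's type and skips the instance's own attributes, whereas an explicit S.__add__(...) goes through normal attribute lookup. A small sketch, with a hypothetical class Plain used only for illustration:

```python
class Plain:
    """Hypothetical class used only to show where + looks for __add__."""

    def __add__(self, other):
        return "type-level"

p = Plain()
# Shadow __add__ on the instance; ordinary attribute access will now find this one.
p.__add__ = lambda other: "instance-level"

via_operator = p + 1          # the operator consults type(p), ignoring the instance
via_attribute = p.__add__(1)  # regular lookup finds the instance attribute first
print(via_operator, via_attribute)  # type-level instance-level
```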

So basically, by calling __add__ explicitly, you are preventing the interpreter from taking the faster route to the implementation.

That being said, the difference is negligible. If you time the two calls, you can see a difference, but it is really not that much (this is 10M calls):

>>> from timeit import timeit
>>> timeit("S+'NI!'", setup='S = "spam"', number=10**7)
0.45791053899995404
>>> timeit("S.__add__('NI!')", setup='S = "spam"', number=10**7)
1.0082074819999889

Note that the results don't always have to look like this. When timing a custom type (with a very simple __add__ implementation), the call to __add__ could turn out to be faster:

>>> timeit("S+'NI!'", setup='from __main__ import SType;S = SType()', number=10**7)
0.7971681049998551
>>> timeit("S.__add__('NI!')", setup='from __main__ import SType;S = SType()', number=10**7)
0.6606798959999196

The difference here is even smaller, but + is slower.
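The definition of SType is not shown above. A minimal sketch, assuming it is just a class with a trivial __add__ (the body here is a guess, not the original), would let you repeat the measurement yourself:

```python
from timeit import timeit

class SType:
    """Hypothetical stand-in for the custom type timed above (original not shown)."""

    def __add__(self, other):
        # Assumed "very simple" implementation: prepend a fixed string.
        return "spam" + other

S = SType()

# Passing globals avoids the 'from __main__ import ...' setup string.
# number is reduced from 10**7 so the sketch finishes quickly.
t_op = timeit("S + 'NI!'", globals={"S": S}, number=100_000)
t_dunder = timeit("S.__add__('NI!')", globals={"S": S}, number=100_000)

print(f"S + 'NI!'        : {t_op:.4f} s")
print(f"S.__add__('NI!') : {t_dunder:.4f} s")
```

Which of the two comes out ahead may vary with the Python version, so the ordering reported above should not be assumed to reproduce.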

The bottom line is that you shouldn't worry about these differences. Choose what is more readable, and almost all of the time that will be +. If you need to worry about performance, analyze your application as a whole, and don't trust such micro-benchmarks. They aren't helpful when looking at your application, and in 99.99% of cases the choice between these two forms will not matter. It's much more likely that there is another bottleneck in your application that slows it down more.
