
Python (2.7): Why is there a performance difference between the following 2 code snippets that implement the intersection of two dictionaries?

The following 2 code snippets (A & B) both return the intersection of 2 dictionaries.

Both snippets should run in O(n) time and produce the same results. However, snippet B, which is the more Pythonic one, seems to run faster. These code snippets come from the Python Cookbook.

Code Snippet A:

def simpleway():
    result = []
    for k in to500.keys():
        if evens.has_key(k):
            result.append(k)
    return result

Code Snippet B:

def pythonicsimpleway():
    return [k for k in to500 if k in evens]

Some setup logic and the function used to time both functions:

import time

to500 = {}
for i in range(500): to500[i] = 1
evens = {}
for i in range(0,1000,2): evens[i] = 1

def timeo(fun, n=1000):
    def void(): pass
    # Time n calls to an empty function to estimate loop and call overhead.
    start = time.clock()
    for i in range(n): void()
    stend = time.clock()
    overhead = stend - start
    # Time n calls to the function under test.
    start = time.clock()
    for i in range(n): fun()
    stend = time.clock()
    thetime = stend - start
    return fun.__name__, thetime - overhead

With Python 2.7.5 on a 2.3 GHz Ivy Bridge quad-core processor (OS X 10.8.4), I get:

>>> timeo(simpleway)
('simpleway', 0.08928500000000028)
>>> timeo(pythonicsimpleway)
('pythonicsimpleway', 0.04579400000000078)

They don't quite do the same thing; the first one does a lot more work:

  • It looks up the .has_key() and .append() methods each time in the loop, and then calls them. This requires a stack push and pop for each call.
  • It appends each new element to a list one by one. The Python list has to be grown dynamically to make room for these elements as you do so.

The list comprehension also adds its elements one at a time, but it does so with the dedicated LIST_APPEND opcode, which appends to the list directly in C without an attribute lookup or a Python-level method call.

The two functions do produce the same result; one is just needlessly slower.
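
As a quick sanity check of that claim, the sketch below rebuilds the two dictionaries from the question and compares both approaches. It is an illustration rather than part of the original answer: dict.has_key() exists only in Python 2, so the explicit loop here uses the in operator instead, which lets the snippet run on Python 3 as well. The names loop_version and comprehension_version are made up for this example.

```python
# Rebuild the question's test data: keys 0..499 and the even numbers below 1000.
to500 = dict((i, 1) for i in range(500))
evens = dict((i, 1) for i in range(0, 1000, 2))

def loop_version():
    # Same shape as simpleway(), with `in` standing in for has_key().
    result = []
    for k in to500.keys():
        if k in evens:
            result.append(k)
    return result

def comprehension_version():
    # Same as pythonicsimpleway() in the question.
    return [k for k in to500 if k in evens]

# Dictionary order is not guaranteed on Python 2.7, so compare sorted copies.
same = sorted(loop_version()) == sorted(comprehension_version())
print(same)
```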

If you want to go into the nitty-gritty details, take a look at the bytecode disassembly using the dis module:

>>> import dis
>>> dis.dis(simpleway)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (result)

  3           6 SETUP_LOOP              51 (to 60)
              9 LOAD_GLOBAL              0 (to500)
             12 LOAD_ATTR                1 (keys)
             15 CALL_FUNCTION            0
             18 GET_ITER            
        >>   19 FOR_ITER                37 (to 59)
             22 STORE_FAST               1 (k)

  4          25 LOAD_GLOBAL              2 (evens)
             28 LOAD_ATTR                3 (has_key)
             31 LOAD_FAST                1 (k)
             34 CALL_FUNCTION            1
             37 POP_JUMP_IF_FALSE       19

  5          40 LOAD_FAST                0 (result)
             43 LOAD_ATTR                4 (append)
             46 LOAD_FAST                1 (k)
             49 CALL_FUNCTION            1
             52 POP_TOP             
             53 JUMP_ABSOLUTE           19
             56 JUMP_ABSOLUTE           19
        >>   59 POP_BLOCK           

  6     >>   60 LOAD_FAST                0 (result)
             63 RETURN_VALUE        
>>> dis.dis(pythonicsimpleway)
  2           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (to500)
              6 GET_ITER            
        >>    7 FOR_ITER                24 (to 34)
             10 STORE_FAST               0 (k)
             13 LOAD_FAST                0 (k)
             16 LOAD_GLOBAL              1 (evens)
             19 COMPARE_OP               6 (in)
             22 POP_JUMP_IF_FALSE        7
             25 LOAD_FAST                0 (k)
             28 LIST_APPEND              2
             31 JUMP_ABSOLUTE            7
        >>   34 RETURN_VALUE        

The explicit for loop executes noticeably more bytecode instructions per iteration. Not counting the FOR_ITER and JUMP_ABSOLUTE that both loops share, the simpleway loop executes 11 instructions per iteration (when .has_key() returns True), vs. 7 for the list comprehension; the extra instructions are mostly LOAD_ATTR and CALL_FUNCTION.
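
To repeat that comparison on your own interpreter, the instructions can be tallied programmatically. The sketch below assumes Python 3.4 or later, where dis.get_instructions() is available (it does not exist in 2.7), and the helper name opcode_counts is made up for this example. It counts every instruction in a function's top-level code object, not instructions executed per iteration. On Python 3 versions before 3.12 the body of a list comprehension is compiled into a separate nested code object, so the totals will not line up with the 2.7 listing above.

```python
import dis
from collections import Counter

to500 = dict((i, 1) for i in range(500))
evens = dict((i, 1) for i in range(0, 1000, 2))

def loop_version():
    # Explicit loop, using `in` because has_key() is Python 2 only.
    result = []
    for k in to500:
        if k in evens:
            result.append(k)
    return result

def comprehension_version():
    return [k for k in to500 if k in evens]

def opcode_counts(func):
    # Tally instruction names in func's top-level code object only;
    # nested code objects (such as a comprehension body) are not visited.
    return Counter(ins.opname for ins in dis.get_instructions(func))

for f in (loop_version, comprehension_version):
    counts = opcode_counts(f)
    print(f.__name__, sum(counts.values()), "instructions")
    for name, n in sorted(counts.items()):
        print("   ", name, n)
```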

If you want to make the first version faster, replace .has_key() with an in test, loop directly over the dictionary, and cache the .append() attribute in a local variable:

def simpleway_optimized():
    result = []
    append = result.append
    for k in to500:
        if k in evens:
            append(k)
    return result

Then use the timeit module to test timings properly (repeated runs, most accurate timer for your platform):

>>> from timeit import timeit
>>> timeit('f()', 'from __main__ import evens, to500, simpleway as f', number=10000)
1.1673870086669922
>>> timeit('f()', 'from __main__ import evens, to500, pythonicsimpleway as f', number=10000)
0.5441269874572754
>>> timeit('f()', 'from __main__ import evens, to500, simpleway_optimized as f', number=10000)
0.6551430225372314
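
The timings above come from one timeit call per function. A common refinement is to take several samples and keep the fastest, which is less sensitive to other activity on the machine. The sketch below is one way to do that: it passes the functions to timeit.repeat() as callables and uses in rather than has_key() so that it also runs on Python 3. The helper name best_time and the repeat and number values are arbitrary choices for this illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

to500 = dict((i, 1) for i in range(500))
evens = dict((i, 1) for i in range(0, 1000, 2))

def loop_version():
    # Attribute lookup of result.append happens on every iteration.
    result = []
    for k in to500.keys():
        if k in evens:
            result.append(k)
    return result

def loop_cached_append():
    # Same loop with the bound append method cached in a local variable.
    result = []
    append = result.append
    for k in to500:
        if k in evens:
            append(k)
    return result

def comprehension_version():
    return [k for k in to500 if k in evens]

def best_time(func, repeat=5, number=1000):
    # Each sample times `number` calls; report the fastest of `repeat` samples.
    return min(timeit.repeat(func, repeat=repeat, number=number))

for f in (loop_version, loop_cached_append, comprehension_version):
    print("%-22s %.4f s" % (f.__name__, best_time(f)))
```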

Here simpleway_optimized approaches the list comprehension in speed, but the latter still wins: its LIST_APPEND opcode adds each element without the Python-level function call that append(k) still requires.
