Python (2.7): Why is there a performance difference between the following 2 code snippets that implement the intersection of two dictionaries
The following 2 code snippets (A & B) both return the intersection of 2 dictionaries.
Both snippets should run in O(n) and output the same results. However, code snippet B, which is Pythonic, seems to run faster. These code snippets come from the Python Cookbook.
Code Snippet A:
def simpleway():
    result = []
    for k in to500.keys():
        if evens.has_key(k):
            result.append(k)
    return result
Code Snippet B:
def pythonicsimpleway():
    return [k for k in to500 if k in evens]
Some setup logic and the function used to time both functions:
import time

# Keys 0..499
to500 = {}
for i in range(500): to500[i] = 1
# Even keys 0..998
evens = {}
for i in range(0,1000,2): evens[i] = 1
def timeo(fun, n=1000):
    def void(): pass
    # Time n calls of an empty function to estimate the call overhead
    start = time.clock()
    for i in range(n): void()
    stend = time.clock()
    overhead = stend - start
    # Time n calls of the function under test
    start = time.clock()
    for i in range(n): fun()
    stend = time.clock()
    thetime = stend - start
    return fun.__name__, thetime - overhead
With Python 2.7.5 on a 2.3 GHz Ivy Bridge quad-core processor (OS X 10.8.4), I get:
>>> timeo(simpleway)
('simpleway', 0.08928500000000028)
>>> timeo(pythonicsimpleway)
('pythonicsimpleway', 0.04579400000000078)
They don't quite do the same thing; the first one does a lot more work: it looks up the .has_key() and .append() methods each time in the loop, and then calls them. This requires a stack push and pop for each call. The list comprehension avoids both: the in test compiles to a single COMPARE_OP, and each element is added with the dedicated LIST_APPEND opcode, which appends to the list in C without an attribute lookup or a Python-level function call.
The two functions do produce the same result; one is just needlessly slower.
If you want to go into the nitty-gritty details, take a look at the bytecode disassembly using the dis module:
>>> import dis
>>> dis.dis(simpleway)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (result)

  3           6 SETUP_LOOP              51 (to 60)
              9 LOAD_GLOBAL              0 (to500)
             12 LOAD_ATTR                1 (keys)
             15 CALL_FUNCTION            0
             18 GET_ITER
        >>   19 FOR_ITER                37 (to 59)
             22 STORE_FAST               1 (k)

  4          25 LOAD_GLOBAL              2 (evens)
             28 LOAD_ATTR                3 (has_key)
             31 LOAD_FAST                1 (k)
             34 CALL_FUNCTION            1
             37 POP_JUMP_IF_FALSE       19

  5          40 LOAD_FAST                0 (result)
             43 LOAD_ATTR                4 (append)
             46 LOAD_FAST                1 (k)
             49 CALL_FUNCTION            1
             52 POP_TOP
             53 JUMP_ABSOLUTE           19
             56 JUMP_ABSOLUTE           19
        >>   59 POP_BLOCK

  6     >>   60 LOAD_FAST                0 (result)
             63 RETURN_VALUE
>>> dis.dis(pythonicsimpleway)
  2           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (to500)
              6 GET_ITER
        >>    7 FOR_ITER                24 (to 34)
             10 STORE_FAST               0 (k)
             13 LOAD_FAST                0 (k)
             16 LOAD_GLOBAL              1 (evens)
             19 COMPARE_OP               6 (in)
             22 POP_JUMP_IF_FALSE        7
             25 LOAD_FAST                0 (k)
             28 LIST_APPEND              2
             31 JUMP_ABSOLUTE            7
        >>   34 RETURN_VALUE
The number of bytecode instructions per iteration is much larger for the explicit for loop. Leaving aside the FOR_ITER and JUMP_ABSOLUTE that both loops share, the simpleway loop has to execute 11 instructions per iteration (if .has_key() is True), vs. 7 for the list comprehension, where the extra instructions mostly cover LOAD_ATTR and CALL_FUNCTION.
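The tally can also be scripted instead of counted by hand. The following is a minimal sketch, assuming Python 3.4 or later, where dis.get_instructions is available (Python 2.7 only offers dis.dis). The helper name opcode_counts is made up for illustration, and the in test stands in for .has_key() because that method was removed in Python 3. It counts the instructions in each whole function rather than per iteration, so it is only a rough proxy, and the opcode names and counts differ from the 2.7 listing and vary between CPython versions.

```python
import dis
import types
from collections import Counter

to500 = dict.fromkeys(range(500), 1)
evens = dict.fromkeys(range(0, 1000, 2), 1)


def simpleway():
    result = []
    for k in to500.keys():
        if k in evens:  # Python 3 stand-in for evens.has_key(k)
            result.append(k)
    return result


def pythonicsimpleway():
    return [k for k in to500 if k in evens]


def opcode_counts(code):
    """Tally opcode names in a code object and in any code objects nested in it."""
    counts = Counter(ins.opname for ins in dis.get_instructions(code))
    for const in code.co_consts:
        # Before Python 3.12 a list comprehension compiles to a nested code object.
        if isinstance(const, types.CodeType):
            counts += opcode_counts(const)
    return counts


for func in (simpleway, pythonicsimpleway):
    counts = opcode_counts(func.__code__)
    print(func.__name__, sum(counts.values()), "instructions in total")
    print("  most common:", counts.most_common(5))
```

Comparing the two Counter objects shows which opcodes appear only in the explicit loop; the exact numbers are not expected to match the 2.7 figures quoted in this answer.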
If you want to make the first version faster, replace .has_key() with an in test, loop directly over the dictionary, and cache the .append() attribute in a local variable:
def simpleway_optimized():
    result = []
    append = result.append
    for k in to500:
        if k in evens:
            append(k)
    return result
Then use the timeit module to test timings properly (repeated runs, most accurate timer for your platform):
>>> from timeit import timeit
>>> timeit('f()', 'from __main__ import evens, to500, simpleway as f', number=10000)
1.1673870086669922
>>> timeit('f()', 'from __main__ import evens, to500, pythonicsimpleway as f', number=10000)
0.5441269874572754
>>> timeit('f()', 'from __main__ import evens, to500, simpleway_optimized as f', number=10000)
0.6551430225372314
Here simpleway_optimized is approaching the list comprehension in speed, but the latter can still win because it adds each element with the LIST_APPEND opcode instead of calling a bound method.
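For a script rather than an interactive session, the same comparison can be sketched as below. This is a rough harness, not the benchmark that produced the numbers in this answer: the helper name best_time and the number and repeat values are arbitrary choices for illustration, the in test replaces .has_key() so the file also runs on Python 3, and the absolute timings depend on the machine and interpreter.

```python
import timeit

to500 = dict.fromkeys(range(500), 1)
evens = dict.fromkeys(range(0, 1000, 2), 1)


def simpleway():
    result = []
    for k in to500.keys():
        if k in evens:  # stand-in for evens.has_key(k), removed in Python 3
            result.append(k)
    return result


def pythonicsimpleway():
    return [k for k in to500 if k in evens]


def simpleway_optimized():
    result = []
    append = result.append  # look the bound method up once
    for k in to500:
        if k in evens:
            append(k)
    return result


def best_time(func, number=1000, repeat=5):
    """Return the fastest of `repeat` runs, each calling func `number` times."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


if __name__ == "__main__":
    functions = (simpleway, pythonicsimpleway, simpleway_optimized)
    # All variants must agree before their timings are worth comparing.
    expected = sorted(simpleway())
    for func in functions:
        assert sorted(func()) == expected
    for func in functions:
        print("%-20s %.4f s" % (func.__name__, best_time(func)))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the ordering of the three functions matters more than the absolute values.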