Python's [<generator expression>] at least 3x faster than list(<generator expression>)?

It appears that using [] around a generator expression (test1) performs substantially better than putting it inside list() (test2). The slowdown isn't there when I simply pass a list into list() for a shallow copy (test3). Why is this?

Evidence:

from timeit import Timer

t1 = Timer("test1()", "from __main__ import test1")
t2 = Timer("test2()", "from __main__ import test2")
t3 = Timer("test3()", "from __main__ import test3")

x = [34534534, 23423523, 77645645, 345346]

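def test1():
    [e for e in x]          # list comprehension

print(t1.timeit())          # print() form also runs under Python 3
#0.552290201187


def test2():
    list(e for e in x)      # generator expression passed to list()

print(t2.timeit())
#2.38739395142

def test3():
    list(x)                 # shallow copy of an existing list

print(t3.timeit())
#0.515818119049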

Machine: 64 bit AMD, Ubuntu 8.04, Python 2.7 (r27:82500)

Well, my first step was to set the two tests up independently, to ensure that this is not a result of, e.g., the order in which the functions are defined.

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop

Sure enough, I can replicate this. OK, the next step is to look at the bytecode to see what's actually going on:

>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE

Notice that the first version builds the list directly (inside the <listcomp> code object), whereas the second creates a genexpr object and passes it to the global list, which then has to resume the generator for every element. This is probably where the overhead lies.

Note also that the difference is roughly a microsecond per loop, i.e. utterly trivial.
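To check the bytecode claim above on a current CPython 3, one option is to collect the opcode names from each function's code object plus any nested code objects (the <listcomp>/<genexpr> bodies). This is a sketch: the helper name opnames is made up, and exact opcodes differ between versions (CPython 3.12 inlines list comprehensions, per PEP 709), so it only looks for LIST_APPEND and YIELD_VALUE, which to my knowledge appear in these two forms across 3.x releases.

```python
import dis
import types

x = [34534534, 23423523, 77645645, 345346]


def opnames(code):
    """Return the set of opcode names in `code` and in any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)  # e.g. the <listcomp> or <genexpr> body
    return names


def with_listcomp():
    return [e for e in x]


def with_genexpr():
    return list(e for e in x)


lc_ops = opnames(with_listcomp.__code__)
ge_ops = opnames(with_genexpr.__code__)

# The comprehension appends straight into the list it is building...
print("LIST_APPEND" in lc_ops, "YIELD_VALUE" in lc_ops)  # True False
# ...while the generator expression yields each item back to list().
print("LIST_APPEND" in ge_ops, "YIELD_VALUE" in ge_ops)  # False True
```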


Other interesting data

This still holds for non-trivial lists:

>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop

and for less trivial map functions:

>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop

>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop

and (though less strongly) if we filter the list:

>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop
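To rerun all of these comparisons from one Python 3 script, here is a minimal sketch built on timeit.repeat. The CASES table, the best_usec and run helpers, and the loop counts are assumptions of this sketch (the loop counts are smaller than the command-line runs above). Absolute numbers will depend on your machine and interpreter version, and may differ noticeably on 3.12+, where comprehensions are inlined.

```python
import timeit

# (setup, list comprehension, equivalent list(genexpr), loops per timing run)
CASES = [
    ("x = [34534534, 23423523, 77645645, 345346]", "[e for e in x]", "list(e for e in x)", 100000),
    ("x = range(100000)", "[e for e in x]", "list(e for e in x)", 10),
    ("x = range(100000)", "[2*e for e in x]", "list(2*e for e in x)", 10),
    ("x = range(100000)", "[e for e in x if e%2]", "list(e for e in x if e%2)", 10),
]


def best_usec(stmt, setup, number, repeat=3):
    """Best of `repeat` runs in microseconds per loop, like the 'best of 3' figures above."""
    return min(timeit.repeat(stmt, setup, repeat=repeat, number=number)) / number * 1e6


def run():
    for setup, listcomp, genexpr, number in CASES:
        a = best_usec(listcomp, setup, number)
        b = best_usec(genexpr, setup, number)
        print(f"{setup}: {listcomp} {a:.2f} usec, {genexpr} {b:.2f} usec ({b / a:.2f}x)")


if __name__ == "__main__":
    run()
```

Each pair computes the same list, so any gap in the printed ratio comes from how the list is built, not from what is in it.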

list(e for e in x) isn't a list comprehension; it's a genexpr object, (e for e in x), being created and passed to the list factory function. Presumably the object creation and method calls create overhead.
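Roughly speaking, list() has to drive the generator through the iterator protocol, resuming its frame once per element until it raises StopIteration. The function below is a hypothetical pure-Python stand-in for that work (the real list() does it in C), written only to make the per-element next() calls visible.

```python
def list_from_generator(gen):
    """Hypothetical stand-in for list(gen): pull items out one next() at a time."""
    result = []
    while True:
        try:
            item = next(gen)  # resumes the generator's frame until its next yield
        except StopIteration:  # the generator has finished
            return result
        result.append(item)


x = [34534534, 23423523, 77645645, 345346]
gen = (e for e in x)  # creating the generator object is itself extra work
print(type(gen).__name__)  # generator
print(list_from_generator(gen) == [e for e in x])  # True
```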

In Python, the name list must be looked up first in the module's globals and then in the builtins. While you cannot change what a list comprehension means, a call to list must be compiled as a standard lookup plus function call, because list could be rebound to something else.
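A small demonstration of that point, using a made-up function name: once the name list is rebound (here as a local variable), list(...) calls whatever the name now refers to, while the [...] comprehension syntax is unaffected because it never looks the name up.

```python
def shadowed():
    list = tuple  # rebind the name 'list' locally (never do this in real code)
    a = [e for e in range(3)]  # comprehension syntax: still builds a real list
    b = list(e for e in range(3))  # ordinary name lookup: now calls tuple()
    return a, b


print(shadowed())  # ([0, 1, 2], (0, 1, 2))
```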

Looking at the VM bytecode generated for a comprehension, you can see that the loop is inlined into the function, while the call to list is an ordinary function call:

>>> import dis
>>> def foo():
...     [x for x in xrange(4)]
... 
>>> dis.dis(foo)
  2           0 BUILD_LIST               0
              3 DUP_TOP             
              4 STORE_FAST               0 (_[1])
              7 LOAD_GLOBAL              0 (xrange)
             10 LOAD_CONST               1 (4)
             13 CALL_FUNCTION            1
             16 GET_ITER            
        >>   17 FOR_ITER                13 (to 33)
             20 STORE_FAST               1 (x)
             23 LOAD_FAST                0 (_[1])
             26 LOAD_FAST                1 (x)
             29 LIST_APPEND         
             30 JUMP_ABSOLUTE           17
        >>   33 DELETE_FAST              0 (_[1])
             36 POP_TOP             
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE        

>>> def bar():
...     list(x for x in xrange(4))
... 
>>> dis.dis(bar)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x7fd1230cf468, file "<stdin>", line 2>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (xrange)
             12 LOAD_CONST               2 (4)
             15 CALL_FUNCTION            1
             18 GET_ITER            
             19 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             25 POP_TOP             
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE  
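Read back as ordinary Python 3, foo's Python 2 bytecode amounts to roughly the loop below (the function name foo_unrolled is hypothetical). This source actually overstates the comprehension's cost: LIST_APPEND appends to the hidden list (_[1]) directly, without looking up and calling an .append method, and in this Python 2 listing no extra function or generator object is created at all.

```python
def foo_unrolled():
    # Roughly what foo()'s Python 2 bytecode does, written as ordinary Python.
    result = []  # BUILD_LIST 0, then STORE_FAST into the hidden _[1]
    for x in range(4):  # xrange(4) in the original; GET_ITER + FOR_ITER
        result.append(x)  # LOAD_FAST _[1], LOAD_FAST x, LIST_APPEND
    return result  # foo() itself discards the list (POP_TOP) and returns None


print(foo_unrolled())  # [0, 1, 2, 3]
```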

Your test2 is roughly equivalent to:

def test2():
    def local():
        for i in x:
            yield i
    return list(local())

The call overhead (creating the generator, then resuming it once per element) explains the increased processing time.
