
Why is itertools.chain faster than a flattening list comprehension?

In the context of a discussion in the comments of this question, it was mentioned that while concatenating a sequence of strings simply takes ''.join([str1, str2, ...]), concatenating a sequence of lists would be something like list(itertools.chain(lst1, lst2, ...)), although you can also use a list comprehension like [x for y in [lst1, lst2, ...] for x in y]. What surprised me is that the first method is consistently faster than the second:

import random
import itertools

random.seed(100)
lsts = [[1] * random.randint(100, 1000) for i in range(1000)]

%timeit [x for y in lsts for x in y]
# 39.3 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(itertools.chain.from_iterable(lsts))
# 30.6 ms ± 866 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(x for y in lsts for x in y)  # Proposed in comments
# 62.5 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Loop-based methods proposed in the comments
%%timeit
a = []
for lst in lsts: a += lst
# 26.4 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a = []
for lst in lsts: a.extend(lst)
# 26.7 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

It is not a difference of orders of magnitude, but it is not negligible either. I was wondering how that might be the case, since list comprehensions are frequently among the fastest methods to solve a given problem. At first I thought that maybe the itertools.chain object would have a len that the list constructor could use to preallocate the necessary memory, but that is not the case (you cannot call len on itertools.chain objects). Is some custom itertools.chain-to-list conversion taking place somehow, or is itertools.chain taking advantage of some other mechanism?
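The preallocation point can be checked directly. A small sketch (standard library only; my own variable names) showing that chain objects offer neither a length nor, as far as I can tell, a length hint that list() could use:

```python
import itertools
import operator

ch = itertools.chain([1, 2], [3, 4])

# chain objects do not support len(), so list() cannot use it to
# preallocate the output buffer.
try:
    len(ch)
    has_len = True
except TypeError:
    has_len = False

# chain does not appear to expose a __length_hint__ either, so
# operator.length_hint falls back to the supplied default.
hint = operator.length_hint(ch, -1)
print(has_len, hint)
```

So list() has to grow its buffer as items arrive, regardless of which flattening method feeds it.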

Tested in Python 3.6.3 on Windows 10 x64, if that is relevant.

EDIT:

It seems the fastest method after all is to call .extend on an empty list with each sublist, as proposed by @zwer, probably because it works on "chunks" of data instead of on a per-element basis.
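A minimal version of that approach, wrapped in a function (the name flatten is my own):

```python
def flatten(lists):
    """Concatenate a sequence of lists by bulk-extending one result list."""
    out = []
    for lst in lists:
        out.extend(lst)  # one C-level call copies the whole sublist
    return out

print(flatten([[1, 2], [3], [4, 5, 6]]))  # [1, 2, 3, 4, 5, 6]
```

Each .extend call copies an entire sublist in one C-level operation, instead of crossing the interpreter loop once per element.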

Here is the implementation of itertools.chain.from_iterable. It's not hard to read even if you don't know C, and you can tell that everything is happening at the C level (before the values are used to build a list in your code).

The bytecode for the list comprehension looks like this:

import dis

def f(lsts):
    return [x for y in lsts for x in y]

dis.dis(f.__code__.co_consts[1])
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                18 (to 24)
              6 STORE_FAST               1 (y)
              8 LOAD_FAST                1 (y)
             10 GET_ITER
        >>   12 FOR_ITER                 8 (to 22)
             14 STORE_FAST               2 (x)
             16 LOAD_FAST                2 (x)
             18 LIST_APPEND              3
             20 JUMP_ABSOLUTE           12
        >>   22 JUMP_ABSOLUTE            4
        >>   24 RETURN_VALUE

These are all the Python interpreter operations involved in building the list via the comprehension. Having all of those operations happen at the C level (inside chain) rather than having the interpreter step through each bytecode instruction (in the comprehension) is what gives you that performance boost.
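As a sanity check, the gap can be reproduced with timeit (sizes here are smaller than in the question, and the exact ratio will vary by machine and Python version):

```python
import itertools
import timeit

lsts = [[1] * 300 for _ in range(100)]

comp = timeit.timeit(lambda: [x for y in lsts for x in y], number=200)
chn = timeit.timeit(lambda: list(itertools.chain.from_iterable(lsts)), number=200)
print(f"comprehension: {comp:.4f}s  chain: {chn:.4f}s")

# Both produce identical results; only where the inner loop runs differs.
assert [x for y in lsts for x in y] == list(itertools.chain.from_iterable(lsts))
```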

Still, that boost is so small I wouldn't worry about it. This is Python: readability over speed.


Edit:

For a generator expression wrapped in list:

def g(lists):
    # NB: the body references the global lsts, not the parameter,
    # which is why the disassembly below shows LOAD_GLOBAL (lsts)
    return list(x for y in lsts for x in y)

# the generator expression
dis.dis(g.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                20 (to 24)
              4 STORE_FAST               1 (y)
              6 LOAD_FAST                1 (y)
              8 GET_ITER
        >>   10 FOR_ITER                10 (to 22)
             12 STORE_FAST               2 (x)
             14 LOAD_FAST                2 (x)
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE           10
        >>   22 JUMP_ABSOLUTE            2
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

So the interpreter has a similar number of steps to go through when running the generator expression being unpacked by list, but, as you would expect, the Python-level overhead of having list unpack a generator (a YIELD_VALUE and a frame switch per element, as opposed to the C-level LIST_APPEND instruction) is what slows it down.
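That per-element YIELD_VALUE can be confirmed programmatically; a sketch that locates the generator expression's code object among the function's constants (exact offsets vary across Python versions, but the opcode is present in every version's generator code):

```python
import dis

def g(lsts):
    return list(x for y in lsts for x in y)

# Find the generator expression's code object among g's constants and
# check that it yields (rather than list-appends) each element.
gen_code = next(c for c in g.__code__.co_consts if hasattr(c, "co_code"))
opnames = {ins.opname for ins in dis.get_instructions(gen_code)}
print("YIELD_VALUE" in opnames)  # True
```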

dis.dis(f)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x000000000FB58B70, file "<ipython-input-33-1d46ced34d66>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_FAST                0 (lsts)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

dis.dis(g)
  2           0 LOAD_GLOBAL              0 (list)
              2 LOAD_CONST               1 (<code object <genexpr> at 0x000000000FF6F420, file "<ipython-input-40-0334a7cdeb8f>", line 2>)
              4 LOAD_CONST               2 ('g.<locals>.<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_GLOBAL              1 (lsts)
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 RETURN_VALUE
