List comprehension vs. generator expression: weird timeit results?

Question

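I was answering this question and chose a generator expression, which I thought would be faster because a generator doesn't need to build the whole list first: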

>>> lis=[['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True

Levon used a list comprehension in his solution:

>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in [j for i in lis for j in i]
True

But when I ran timeit on both, the list comprehension (LC) was faster than the generator:

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.36 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 1.51 usec per loop

Then I increased the size of the list and timed it again:

lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]

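This time, when searching for 'd', the generator is faster than the LC. But when I search for a middle element (11) or the last element (18), the LC again beats the generator expression, and I can't understand why.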

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.96 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 7.4 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in [y for x in lis for y in x]"
100000 loops, best of 3: 5.61 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in (y for x in lis for y in x)"
100000 loops, best of 3: 9.76 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in (y for x in lis for y in x)"
100000 loops, best of 3: 8.94 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in [y for x in lis for y in x]"
100000 loops, best of 3: 7.13 usec per loop

Answer 1

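Expanding on Paulo's answer: generator expressions are often slower than list comprehensions because of the overhead of function calls. In this case, the short-circuiting behavior of in offsets that slowness if the item is found fairly early, but otherwise the pattern holds.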
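The short-circuiting can be made visible by counting how many items are actually pulled from the nested list before in produces its answer. The following is only a minimal sketch; tracked and items_pulled are hypothetical helper names introduced here, not part of the original answer.

```python
lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6],
       [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]

def tracked(nested, log):
    """Yield every item of the nested list, recording each one in log."""
    for sub in nested:
        for item in sub:
            log.append(item)
            yield item

def items_pulled(target, use_list):
    """Return (found, number of items pulled from the source)."""
    log = []
    if use_list:
        # The whole list is built before `in` starts searching it.
        found = target in [y for y in tracked(lis, log)]
    else:
        # `in` stops pulling from the generator at the first match.
        found = target in (y for y in tracked(lis, log))
    return found, len(log)

print(items_pulled('d', use_list=False))  # (True, 4)
print(items_pulled('d', use_list=True))   # (True, 24)
```

For a late target such as 18, both variants pull all 24 items, so the generator has no short-circuit advantage left to offset its per-item cost.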

I ran a simple script through the profiler for a more detailed analysis. Here's the script:

lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],
     [7,8,9],[10,11,12],[13,14,15],[16,17,18]]

def ge_d():
    return 'd' in (y for x in lis for y in x)
def lc_d():
    return 'd' in [y for x in lis for y in x]

def ge_11():
    return 11 in (y for x in lis for y in x)
def lc_11():
    return 11 in [y for x in lis for y in x]

def ge_18():
    return 18 in (y for x in lis for y in x)
def lc_18():
    return 18 in [y for x in lis for y in x]

for i in range(100000):
    ge_d()
    lc_d()
    ge_11()
    lc_11()
    ge_18()
    lc_18()

Here are the relevant results, reordered to make the patterns clearer.

         5400002 function calls in 2.830 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.158    0.000    0.251    0.000 fop.py:3(ge_d)
   500000    0.092    0.000    0.092    0.000 fop.py:4(<genexpr>)
   100000    0.285    0.000    0.285    0.000 fop.py:5(lc_d)

   100000    0.356    0.000    0.634    0.000 fop.py:8(ge_11)
  1800000    0.278    0.000    0.278    0.000 fop.py:9(<genexpr>)
   100000    0.333    0.000    0.333    0.000 fop.py:10(lc_11)

   100000    0.435    0.000    0.806    0.000 fop.py:13(ge_18)
  2500000    0.371    0.000    0.371    0.000 fop.py:14(<genexpr>)
   100000    0.344    0.000    0.344    0.000 fop.py:15(lc_18)
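A profile like the one above can be collected with the standard cProfile and pstats modules. This is a hedged sketch rather than the exact command used for the answer: profile_report is a hypothetical helper, only the "11" pair of functions is included, and the repeat count is an arbitrary assumption.

```python
import cProfile
import io
import pstats

lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6],
       [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]

def ge_11():
    return 11 in (y for x in lis for y in x)

def lc_11():
    return 11 in [y for x in lis for y in x]

def profile_report(repeats=1000):
    """Profile both functions and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        ge_11()
        lc_11()
    profiler.disable()
    stream = io.StringIO()
    # 'stdname' gives the "Ordered by: standard name" layout.
    pstats.Stats(profiler, stream=stream).sort_stats('stdname').print_stats()
    return stream.getvalue()

if __name__ == '__main__':
    print(profile_report())
```

The exact call counts and timings depend on the Python version and machine, so they should not be expected to match the figures quoted in this answer.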

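Creating a generator expression is equivalent to creating a generator function and calling it. That accounts for one call to <genexpr>. Then, in the first case, next is called 4 times until d is reached, for a total of 5 calls (times 100000 iterations = ncalls = 500000). In the second case, it is called 17 times, for a total of 18 calls; and in the third, 24 times, for a total of 25 calls.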
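The equivalence and the next counts can be sketched as follows. flatten and calls_until_found are hypothetical names used only for illustration, and the loop is a rough model of what in does with an iterator, not CPython's actual implementation.

```python
import types

lis = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], [4, 5, 6],
       [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]

def flatten(nested):
    """Generator function doing the same work as (y for x in nested for y in x)."""
    for x in nested:
        for y in x:
            yield y

def calls_until_found(target, nested):
    """Roughly mimic `target in flatten(nested)`, counting next() calls."""
    gen = flatten(nested)
    calls = 0
    while True:
        calls += 1
        try:
            item = next(gen)
        except StopIteration:
            return False, calls  # includes the call that hit the end
        if item == target:
            return True, calls

# Both forms produce a generator object that yields the same items.
genexp = (y for x in lis for y in x)
print(isinstance(genexp, types.GeneratorType))  # True
print(calls_until_found('d', lis))              # (True, 4)
print(calls_until_found(11, lis))               # (True, 17)
print(calls_until_found(18, lis))               # (True, 24)
```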

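The genex outperforms the list comprehension in the first case, but the extra calls to next account for most of the difference between the speed of the list comprehension and the speed of the generator expression in the second and third cases.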

>>> .634 - .278 - .333
0.023
>>> .806 - .371 - .344
0.091

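I'm not sure what accounts for the remaining time; it seems that generator expressions would be a hair slower even without the additional function calls. I suppose this confirms inspectorG4dget's assertion that "creating a generator comprehension has more native overhead than does a list comprehension." But in any case, this shows pretty clearly that generator expressions are slower mostly because of the calls to next.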

I'll add that when short-circuiting doesn't help, list comprehensions are still faster, even for very large lists. For example:

>>> import itertools
>>> counter = itertools.count()
>>> lol = [[next(counter), next(counter), next(counter)]
           for _ in range(1000000)]
>>> 2999999 in (i for sublist in lol for i in sublist)
True
>>> 3000000 in (i for sublist in lol for i in sublist)
False
>>> %timeit 2999999 in [i for sublist in lol for i in sublist]
1 loops, best of 3: 312 ms per loop
>>> %timeit 2999999 in (i for sublist in lol for i in sublist)
1 loops, best of 3: 351 ms per loop
>>> %timeit any([2999999 in sublist for sublist in lol])
10 loops, best of 3: 161 ms per loop
>>> %timeit any(2999999 in sublist for sublist in lol)
10 loops, best of 3: 163 ms per loop
>>> %timeit for i in [2999999 in sublist for sublist in lol]: pass
1 loops, best of 3: 171 ms per loop
>>> %timeit for i in (2999999 in sublist for sublist in lol): pass
1 loops, best of 3: 183 ms per loop

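As you can see, when short-circuiting is irrelevant, list comprehensions are consistently faster, even for a million-item-long list of lists. Obviously, for actual uses of in at these scales, generators will be faster because of short-circuiting. But for other kinds of iterative tasks that are truly linear in the number of items, list comprehensions are pretty much always faster. This is especially true if you need to perform multiple tests on a list; you can iterate over an already-built list comprehension very quickly: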

>>> incache = [2999999 in sublist for sublist in lol]
>>> get_list = lambda: incache
>>> get_gen = lambda: (2999999 in sublist for sublist in lol)
>>> %timeit for i in get_list(): pass
100 loops, best of 3: 18.6 ms per loop
>>> %timeit for i in get_gen(): pass
1 loops, best of 3: 187 ms per loop

In this case, the list comprehension is an order of magnitude faster!
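Part of the reason is that a list can be iterated any number of times once it is built, whereas a generator is exhausted after a single pass and has to be recreated (and recomputed) for every new pass. A small sketch with made-up data, much smaller than the lol used above:

```python
lol = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

cached = [5 in sublist for sublist in lol]   # computed once, reusable
lazy = (5 in sublist for sublist in lol)     # computed on demand, one-shot

list_pass_1 = list(cached)
list_pass_2 = list(cached)
gen_pass_1 = list(lazy)
gen_pass_2 = list(lazy)   # the generator is already exhausted

print(list_pass_1 == list_pass_2)  # True
print(gen_pass_1)                  # [False, True, False]
print(gen_pass_2)                  # []
```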

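Of course, this only holds until you run out of memory, which brings me to my final point. There are two main reasons to use a generator: to take advantage of short-circuiting, and to save memory. For very large sequences/iterables, generators are the obvious way to go, because they save memory. But if short-circuiting is not an option, you pretty much never choose generators over lists for speed. You choose them to save memory, and it's always a trade-off.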
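The memory side of the trade-off can be roughly illustrated with sys.getsizeof. This is a sketch under stated assumptions: list_size and gen_size are hypothetical helpers, and getsizeof reports only the container itself (not the integers it references), so it understates the list's true footprint.

```python
import sys

def list_size(n):
    """Bytes used by the list object holding n items (items not included)."""
    return sys.getsizeof([i for i in range(n)])

def gen_size(n):
    """Bytes used by the generator object; nothing is computed yet."""
    return sys.getsizeof(i for i in range(n))

print(list_size(1000), list_size(100000))  # grows with n
print(gen_size(1000), gen_size(100000))    # stays the same
```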

Answer 2

It depends entirely on the data.

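Generators have a fixed setup time that must be amortized over the number of items consumed. List comprehensions are faster initially, but slow down substantially as larger data sets use more memory.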

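Recall that as CPython lists grow, they are resized in a growth pattern of 4, 8, 16, 25, 35, 46, 58, 72, 88, .... For larger list comprehensions, Python may be allocating up to 4x more memory than the size of your data. Once you hit the VM, things get really sloowww! But, as stated, list comprehensions are faster than generators for small data sets.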
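The over-allocation can be observed by watching when sys.getsizeof changes while appending. This sketch is CPython-specific, and resize_points is a hypothetical helper; the exact lengths at which a resize happens differ between CPython versions, so they may not match the sequence quoted above.

```python
import sys

def resize_points(n_appends=100):
    """Return the list lengths at which an append made the allocation grow."""
    items = []
    last_size = sys.getsizeof(items)
    points = []
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list had to reallocate to fit this element.
            points.append(len(items))
            last_size = size
    return points

print(resize_points())
```

Because the list grows in steps rather than on every append, far fewer reallocations occur than appends, at the cost of some unused capacity.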

Consider case 1, a 2x26 list of lists:

import string

LoL=[[c1,c2] for c1,c2 in zip(string.ascii_lowercase,string.ascii_uppercase)]

def lc_d(item='d'):
    return item in [i for sub in LoL for i in sub]

def ge_d(item='d'):
    return item in (y for x in LoL for y in x)    

def any_lc_d(item='d'):
    return any(item in x for x in LoL)    

def any_gc_d(item='d'):
    return any([item in x for x in LoL])     

def lc_z(item='z'):
    return item in [i for sub in LoL for i in sub]

def ge_z(item='z'):
    return item in (y for x in LoL for y in x)    

def any_lc_z(item='z'):
    return any(item in x for x in LoL)    

def any_gc_z(item='z'):
    return any([item in x for x in LoL])               

cmpthese.cmpthese([lc_d,ge_d,any_gc_d,any_gc_z,any_lc_d,any_lc_z, lc_z, ge_z])

This results in these timings:

         rate/sec   ge_z   lc_z   lc_d any_lc_z any_gc_z any_gc_d   ge_d any_lc_d
ge_z      124,652     -- -10.1% -16.6%   -44.3%   -46.5%   -48.5% -76.9%   -80.7%
lc_z      138,678  11.3%     --  -7.2%   -38.0%   -40.4%   -42.7% -74.3%   -78.6%
lc_d      149,407  19.9%   7.7%     --   -33.3%   -35.8%   -38.2% -72.3%   -76.9%
any_lc_z  223,845  79.6%  61.4%  49.8%       --    -3.9%    -7.5% -58.5%   -65.4%
any_gc_z  232,847  86.8%  67.9%  55.8%     4.0%       --    -3.7% -56.9%   -64.0%
any_gc_d  241,890  94.1%  74.4%  61.9%     8.1%     3.9%       -- -55.2%   -62.6%
ge_d      539,654 332.9% 289.1% 261.2%   141.1%   131.8%   123.1%     --   -16.6%
any_lc_d  647,089 419.1% 366.6% 333.1%   189.1%   177.9%   167.5%  19.9%       --
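cmpthese is not part of the standard library. A rough rate comparison in the same spirit can be sketched with timeit alone; rates is a hypothetical helper, only two of the functions above are repeated here, and the number and repeat defaults are arbitrary assumptions.

```python
import string
import timeit

LoL = [[c1, c2] for c1, c2 in zip(string.ascii_lowercase, string.ascii_uppercase)]

def lc_d(item='d'):
    return item in [i for sub in LoL for i in sub]

def ge_d(item='d'):
    return item in (y for x in LoL for y in x)

def rates(funcs, number=10000, repeat=3):
    """Return (name, calls per second) pairs, slowest first."""
    results = []
    for func in funcs:
        # Best of several runs, to reduce noise from other processes.
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results.append((func.__name__, number / best))
    return sorted(results, key=lambda pair: pair[1])

for name, rate in rates([lc_d, ge_d]):
    print('%-6s %12.0f calls/sec' % (name, rate))
```

The absolute rates depend on the machine and Python version; only the relative ordering is meaningful.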

Now consider case 2, which shows a wide disparity between an LC and a generator. In this case, we are looking for one element in a 100 x 97 x 97 list-of-lists kind of structure:

LoL=[[str(a),str(b),str(c)] 
       for a in range(100) for b in range(97) for c in range(97)]

def lc_10(item='10'):
    return item in [i for sub in LoL for i in sub]

def ge_10(item='10'):
    return item in (y for x in LoL for y in x)    

def any_lc_10(item='10'):
    return any([item in x for x in LoL])    

def any_gc_10(item='10'):
    return any(item in x for x in LoL)     

def lc_99(item='99'):
    return item in [i for sub in LoL for i in sub]

def ge_99(item='99'):
    return item in (y for x in LoL for y in x)    

def any_lc_99(item='99'):
    return any(item in x for x in LoL)    

def any_gc_99(item='99'):
    return any([item in x for x in LoL])      

cmpthese.cmpthese([lc_10,ge_10,any_lc_10,any_gc_10,lc_99,ge_99,any_lc_99,any_gc_99],c=10,micro=True)

This results in these timings:

          rate/sec  usec/pass       ge_99      lc_99      lc_10  any_lc_99  any_gc_99  any_lc_10   ge_10 any_gc_10
ge_99            3 354545.903          --     -20.6%     -30.6%     -60.8%     -61.7%     -63.5% -100.0%   -100.0%
lc_99            4 281678.295       25.9%         --     -12.6%     -50.6%     -51.8%     -54.1% -100.0%   -100.0%
lc_10            4 246073.484       44.1%      14.5%         --     -43.5%     -44.8%     -47.4% -100.0%   -100.0%
any_lc_99        7 139067.292      154.9%     102.5%      76.9%         --      -2.4%      -7.0% -100.0%   -100.0%
any_gc_99        7 135748.100      161.2%     107.5%      81.3%       2.4%         --      -4.7% -100.0%   -100.0%
any_lc_10        8 129331.803      174.1%     117.8%      90.3%       7.5%       5.0%         -- -100.0%   -100.0%
ge_10      175,494      5.698  6221964.0% 4943182.0% 4318339.3% 2440446.0% 2382196.2% 2269594.1%      --    -38.5%
any_gc_10  285,327      3.505 10116044.9% 8036936.7% 7021036.1% 3967862.6% 3873157.1% 3690083.0%   62.6%        --

As you can see, it depends, and it is a trade-off...

Answer 3

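Contrary to popular belief, list comprehensions are pretty fine for moderate ranges. The iterator protocol implies calls to iterator.__next__(), and function calls in Python are, truth be told, uncomfortably expensive.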
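To see how often the iterator protocol calls __next__, one can wrap an iterable in a small counting iterator. CountingIterator is a hypothetical class written for this illustration only.

```python
class CountingIterator:
    """Wrap an iterable and count how often __next__ is invoked."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.next_calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.next_calls += 1
        return next(self._it)  # propagates StopIteration at the end

counter = CountingIterator([10, 20, 30])
total = 0
for value in counter:
    total += value

print(total)               # 60
print(counter.next_calls)  # 4: three items plus the call that raises StopIteration
```

Every item costs one Python-level call here, which is the per-item overhead this answer refers to.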

Of course, at some point the generators' memory/CPU trade-off will start to pay off, but for small sets list comprehensions are very efficient.

Answer 1
39 Accepted 2012-08-15 05:21:44

Answer 2
13 2012-08-15 04:54:37

Answer 3
10 2012-08-15 04:33:05