Why is a function containing a list comprehension faster than the bare list comprehension statement?

Question

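I recently ran into this behaviour and I'm a little confused about why it happens. My initial assumption is that some optimization is applied when a function is called rather than when a bare statement is run.

Let's start with a simple example: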

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

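Suppose we have a list of strings, each similar to "somestring" above, and a list of topics such as "sometopics".

We want to check whether any of the "sometopics" occur in "somestring" and, importantly, return those "sometopics" in a new list.

With a list comprehension, we can do the following for a single string:

result = [element for element in sometopics if(element in somestring)]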

However, on my machine, the following function definition runs 20-30% faster than the statement above.

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

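Why does this happen?

Are functions always faster than the equivalent statement or list of statements?

Edit:

See the minimal reproducible notebook example below: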

import pandas as pd, numpy as np

columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000


def extractsubstrings(inputstring,alistofpossibletopics):
    # obviously very slow for loop
    topicslist=[]
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist

%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)] 
    return res

%%time
res = [ele for ele in somebigtopics if(ele in somestring)] 

%%time
x=extractsubstrings(somestring,somebigtopics)

%%time
funcres=listcompinlists(somestring,somebigtopics)

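On my machine (Ubuntu 18.04, Python 3.6), the list comprehension in the case above executes in 22-24 ms, while the function executes in 18-21 ms. That is not a huge difference, but if you have, say, 10 million rows to process, it could save several hours.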

TLDR Performance comparison:

extractsubstrings=Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists=Wall time: 18.6 ms
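
As a rough check of the "several hours" estimate, the arithmetic can be sketched as follows. It assumes the per-row difference between the two wall times above stays constant over all 10 million rows, which is an assumption rather than something measured.

```python
# Rough extrapolation from the wall times reported above.
# Assumption: the per-row difference stays constant across all rows.
statement_ms = 24.5   # list comprehension statement
function_ms = 18.6    # listcompinlists
rows = 10_000_000     # "10 million rows" from the question

saved_seconds = (statement_ms - function_ms) / 1000 * rows
saved_hours = saved_seconds / 3600
print(f"Estimated saving: {saved_hours:.1f} hours")
```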

Answer 1

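I cannot reproduce what you claim. Can you provide any measurements to back up your assertion?

I created this measurement to compare execution times: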

import time

N = 1000000

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result
   
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
   
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')

Output

Time using list comprehension: 0.9571423530578613
Time using function: 1.1152479648590088

So in my case, the list comprehension is faster on average.
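
Note that this measurement uses the five-element list, whereas the notebook in the question uses a list of 500,000 elements, so the fixed cost of the function call weighs very differently in the two cases. A minimal sketch that times both variants at both sizes with `timeit` might look like this. The helper `bench` and the namespace dict `ns` are illustrative names, not part of the original post, and the repeat counts are arbitrary.

```python
import timeit

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    return [element for element in somelistoftopics if element in mystring]

def bench(topics, number, repeat=3):
    """Hypothetical helper: best time per run (seconds) for both variants."""
    # Explicit namespace so the timed statements see these names as globals.
    ns = {
        "somestring": somestring,
        "topics": topics,
        "comparelistoftopicstokw": comparelistoftopicstokw,
    }
    stmt = "result = [element for element in topics if element in somestring]"
    call = "result = comparelistoftopicstokw(somestring, topics)"
    best_stmt = min(timeit.repeat(stmt, globals=ns, number=number, repeat=repeat)) / number
    best_call = min(timeit.repeat(call, globals=ns, number=number, repeat=repeat)) / number
    return {"statement": best_stmt, "function": best_call}

small = bench(sometopics, number=20000)
big = bench(sometopics * 100000, number=3)
print("5 topics:      ", small)
print("500,000 topics:", big)
```

Taking the minimum over several repeats is the usual way to reduce the influence of other processes on the result; the absolute numbers will still differ between machines and Python versions.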

Answer 2

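I can't answer your question, but I ran a small test that questions its premise.

As we can infer from the output, the results are quite random: in some cases one is faster on average than the other, and in other cases it is the opposite.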

import time
import statistics

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]


def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

for i in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {i + 1}:")
    time1average = []
    for i in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)
        
    print(statistics.mean(time1average))
    
    time2average = []
    for i in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)
    
    print(statistics.mean(time2average))
    print("")

Output:

Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06

Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06

Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06

Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06

Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06

Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06

Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06

Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06

Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06

Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
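
Because each call takes only a few microseconds, wrapping every single call in `time.time()` also measures a good deal of timer overhead and jitter. One way to reduce that noise, sketched here as an assumption about what a fairer measurement could look like, is to time whole batches with `time.perf_counter` and look at the spread across rounds. The helper names `time_statement` and `time_function` are made up for illustration.

```python
import statistics
import time

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    return [element for element in somelistoftopics if element in mystring]

def time_statement(batch):
    """Hypothetical helper: average seconds per inline comprehension."""
    start = time.perf_counter()
    for _ in range(batch):
        result = [element for element in sometopics if element in somestring]
    return (time.perf_counter() - start) / batch

def time_function(batch):
    """Hypothetical helper: average seconds per function call."""
    start = time.perf_counter()
    for _ in range(batch):
        result = comparelistoftopicstokw(somestring, sometopics)
    return (time.perf_counter() - start) / batch

# One timer read per batch instead of one per call.
statement_times = [time_statement(20000) for _ in range(10)]
function_times = [time_function(20000) for _ in range(10)]

for name, t in (("statement", statement_times), ("function", function_times)):
    print(f"{name}: min={min(t):.3e} mean={statistics.mean(t):.3e} "
          f"stdev={statistics.stdev(t):.3e}")
```

If the standard deviation across rounds is of the same order as the gap between the two means, the measurement cannot really distinguish the two variants.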

Answer 1: 2020-11-19 15:01:20
Answer 2: 2020-11-19 15:04:42