Why is a function containing a list comprehension *faster* than the bare list comprehension statement?
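I recently ran into this behaviour and am a little confused about why it happens. My initial assumption is that some optimisation takes place when a function is called rather than a bare statement being run.
Example: let's start with a simple case: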
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]
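Suppose we have a list of strings similar to "somestring" above, and also a list of topics such as sometopics.
We want to check whether any of the "sometopics" occur in "somestring" and, importantly, return the ones that do in a new list.
With a list comprehension, we can do this for a single string as follows: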
result = [element for element in sometopics if(element in somestring)]
However, on my machine the following function definition runs 20-30% faster than the statement above.
def comparelistoftopicstokw(mystring,somelistoftopics):
    result = [element for element in somelistoftopics if(element in mystring)]
    return result
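Why does this happen?
Are functions always faster than an equivalent statement or list of statements?
Edit:
Please see the minimal reproducible notebook example below: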
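One way to investigate this, without assuming the answer, is to compare the bytecode CPython generates for the two forms. The sketch below is only an illustration: the helper `load_ops_for` is a hypothetical name introduced here, and the exact opcodes differ between Python versions. It reports which load instructions reference the string being searched, once for the module-level statement and once for the function.

```python
import dis
import re

MODULE_SRC = """
somestring = "climate change is a big problem"
sometopics = ["climate", "change", "problem"]
result = [element for element in sometopics if element in somestring]
"""

FUNCTION_SRC = """
def comparelistoftopicstokw(mystring, somelistoftopics):
    result = [element for element in somelistoftopics if element in mystring]
    return result
"""

def load_ops_for(code, name):
    """Collect the LOAD_* opcode names that reference `name` (hypothetical helper).

    Nested code objects (function bodies, comprehension bodies) are searched too.
    """
    ops = set()
    for instr in dis.get_instructions(code):
        if instr.opname.startswith("LOAD_") and name in re.findall(r"\w+", instr.argrepr):
            ops.add(instr.opname)
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # a nested code object
            ops |= load_ops_for(const, name)
    return ops

module_ops = load_ops_for(compile(MODULE_SRC, "<module>", "exec"), "somestring")
function_ops = load_ops_for(compile(FUNCTION_SRC, "<function>", "exec"), "mystring")
print("module level   :", sorted(module_ops))
print("inside function:", sorted(function_ops))
```

At module level the string is looked up by name (a global or name lookup), while inside the function it is a local or closure variable. Whether that accounts for a measurable difference on a given machine and Python version is something to benchmark rather than assume.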
import pandas as pd, numpy as np
columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000
def extractsubstrings(inputstring,alistofpossibletopics):
    #obvious very slow for loop
    topicslist=[]
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist  # return the matches so the caller receives them
%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)]
    return res
%%time
res = [ele for ele in somebigtopics if(ele in somestring)]
%%time
x=extractsubstrings(somestring,somebigtopics)
%%time
funcres=listcompinlists(somestring,somebigtopics)
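On my machine (Ubuntu 18.04, Python 3.6) the list comprehension above executes in 22-24 ms, while the function executes in 18-21 ms. That is not a huge difference, but if you have, say, 10 million rows to process, it could save several hours.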
TLDR Performance comparison:
extractsubstrings=Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists=Wall time: 18.6 ms
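The `%%time` figures above come from a single run of each cell. As a rough sketch of how the same comparison could be made outside a notebook, the snippet below uses the standard-library `timeit` module and averages over several runs. The `namespace` dictionary and the `RUNS` constant are names chosen here for illustration. Note that `timeit` compiles the statement into a function of its own, although the names it uses are still looked up as globals.

```python
import timeit

somestring = "pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics = ["chicken", "pad thai", "recipe", "lamb", "beef"]
somebigtopics = sometopics * 100000  # 500,000 entries, as in the notebook

def listcompinlists(mystring, bigtopic):
    return [ele for ele in bigtopic if ele in mystring]

# Names made visible to the timed statements (illustrative helper dict).
namespace = {
    "somestring": somestring,
    "somebigtopics": somebigtopics,
    "listcompinlists": listcompinlists,
}
RUNS = 10

statement_total = timeit.timeit(
    "[ele for ele in somebigtopics if ele in somestring]",
    globals=namespace, number=RUNS)
function_total = timeit.timeit(
    "listcompinlists(somestring, somebigtopics)",
    globals=namespace, number=RUNS)

print(f"list comprehension statement: {statement_total / RUNS * 1000:.1f} ms per run")
print(f"listcompinlists:              {function_total / RUNS * 1000:.1f} ms per run")
```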
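I cannot reproduce what you claim. Can you provide any measurements to back up your assertion?
I wrote this measurement to compare the execution times: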
import time
N = 1000000
def comparelistoftopicstokw(mystring,somelistoftopics):
    result = [element for element in somelistoftopics if(element in mystring)]
    return result
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]
start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')
Time using list comprehension: 0.9571423530578613
Time using function: 1.1152479648590088
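So in my case the list comprehension is faster on average.
I can't answer your question, but I ran a small test that calls its premise into question.
As we can infer from the output, the results are fairly random: in some cases one variant is faster on average than the other, and in others it is the reverse.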
import time
import statistics
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]
def comparelistoftopicstokw(mystring,somelistoftopics):
    result = [element for element in somelistoftopics if(element in mystring)]
    return result
for i in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {i + 1}:")
    # Time the bare list comprehension, one call at a time
    time1average = []
    for _ in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)
    print(statistics.mean(time1average))
    # Time the function wrapping the same comprehension
    time2average = []
    for _ in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)
    print(statistics.mean(time2average))
    print("")
Output:
Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06
Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06
Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06
Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06
Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06
Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06
Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06
Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06
Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06
Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
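The per-call times above are a few microseconds each, so the cost of calling `time.time()` twice per iteration may be a noticeable part of every sample. A possible alternative, sketched here under that assumption, is to time batches with `timeit.repeat` and look at the minimum, median and spread across repeats. `NUMBER`, `REPEAT`, `namespace` and `per_call_times` are illustrative names, not part of the original test.

```python
import statistics
import timeit

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    return [element for element in somelistoftopics if element in mystring]

namespace = {
    "somestring": somestring,
    "sometopics": sometopics,
    "comparelistoftopicstokw": comparelistoftopicstokw,
}
NUMBER = 20000  # calls per batch
REPEAT = 7      # number of batches

def per_call_times(stmt):
    """Return the average per-call time in seconds for each batch."""
    totals = timeit.repeat(stmt, globals=namespace, repeat=REPEAT, number=NUMBER)
    return [total / NUMBER for total in totals]

variants = {
    "statement": "[element for element in sometopics if element in somestring]",
    "function": "comparelistoftopicstokw(somestring, sometopics)",
}

summary = {}
for label, stmt in variants.items():
    times = per_call_times(stmt)
    summary[label] = (min(times), statistics.median(times), statistics.stdev(times))
    low, mid, spread = summary[label]
    print(f"{label:9s} min={low:.3e}s median={mid:.3e}s stdev={spread:.3e}s")
```

If the gap between the two minimum values is smaller than the spread, the measurement does not support a claim that either variant is faster.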