
Why is a function with a list comprehension statement *faster* than the list comprehension statement?

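I recently came across this behavior and I'm a bit confused about why it happens. My initial assumption is that some optimization takes place when a function is called rather than when a statement is run.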

Let's start with a simple example:

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

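Suppose we have a list of strings similar to "somestring" above, and we also have a list of topics such as sometopics.

We want to check whether any of the "sometopics" occur in "somestring" and, importantly, return those "sometopics" in a new list.

With a list comprehension statement, we can do the following for a single string: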

result = [element for element in sometopics if(element in somestring)]

However, on my machine the following function definition runs 20-30% faster than the statement above.

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

Why does this happen?

Are functions always faster than the equivalent statement or list of statements?
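One way to probe the "some optimization" hypothesis is to look at how CPython compiles each form. The sketch below is an illustration under assumptions, not a confirmed explanation of the timings: it uses the standard `dis` module to list which opcodes load the searched string in the bare statement and inside the function. The helpers `iter_instructions` and `lookups_of` are hypothetical names introduced here, and opcode names differ between CPython versions.

```python
import dis
import types


def iter_instructions(code):
    """Yield instructions from a code object and from any code objects nested in it."""
    yield from dis.get_instructions(code)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_instructions(const)


def lookups_of(code, name):
    """Return the sorted set of LOAD_* opcode names that reference `name`."""
    found = set()
    for ins in iter_instructions(code):
        if not ins.opname.startswith("LOAD_"):
            continue
        # Newer CPython versions may fuse two loads into one instruction,
        # in which case argval is a tuple of names.
        if ins.argval == name or (isinstance(ins.argval, tuple) and name in ins.argval):
            found.add(ins.opname)
    return sorted(found)


# The bare statement, compiled the way a script or notebook cell would be.
statement = compile(
    "result = [element for element in sometopics if element in somestring]",
    "<statement>",
    "exec",
)


def comparelistoftopicstokw(mystring, somelistoftopics):
    result = [element for element in somelistoftopics if element in mystring]
    return result


print("somestring in statement:", lookups_of(statement, "somestring"))
print("mystring in function:   ", lookups_of(comparelistoftopicstokw.__code__, "mystring"))
```

If the statement loads `somestring` with a global-style lookup for every element while the function reads `mystring` from a local or closure slot, that per-element difference would be a plausible contributor on a long list. It does not by itself prove where the measured 20-30% comes from.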

EDIT ****

Please see the minimal reproducible notebook example below:

import pandas as pd, numpy as np

columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000


def extractsubstrings(inputstring,alistofpossibletopics):
    # obvious, very slow for loop
    topicslist=[]
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist  # hand the matches back to the caller

%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)] 
    return res

%%time
res = [ele for ele in somebigtopics if(ele in somestring)] 

%%time
x=extractsubstrings(somestring,somebigtopics)

%%time
funcres=listcompinlists(somestring,somebigtopics)

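On my machine (Ubuntu 18.04, Python 3.6), the list comprehension above executes in 22-24 ms, while the function executes in 18-21 ms. That is not a huge difference, but if you have, say, 10 million rows to process, it could save several hours.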

TLDR Performance comparison:

extractsubstrings: Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists: Wall time: 18.6 ms
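To put the "several hours" estimate in concrete terms, here is a small worked calculation from the wall times reported above. It assumes (this is not stated in the post) that each of the 10 million rows triggers one call against the full 500,000-entry topic list and that the per-call times stay constant.

```python
# Wall times reported in the performance comparison, in milliseconds.
statement_ms = 24.5
function_ms = 18.6

# "10 million rows", assuming one such call per row.
rows = 10_000_000

saved_ms_per_call = statement_ms - function_ms
relative_saving = saved_ms_per_call / statement_ms
saved_hours = saved_ms_per_call / 1000 * rows / 3600

print(f"{saved_ms_per_call:.1f} ms saved per call ({relative_saving:.1%})")
print(f"about {saved_hours:.1f} hours saved over {rows:,} rows")
```

Under those assumptions the gap works out to roughly 24% per call and about 16 hours in total, which is consistent with the 20-30% figure quoted earlier.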

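I can't reproduce what you claim. Can you provide any measurements to back up your assertion?

I wrote this benchmark to compare the execution times: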

import time

N = 1000000

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result
   
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
   
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')

Output

Time using list comprehension: 0.9571423530578613
Time using function: 1.1152479648590088

So in my case, the list comprehension is faster on average.
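A comparable measurement can also be taken with the standard `timeit` module, which uses a high-resolution clock and makes it easy to repeat a run and keep the fastest result. This is a minimal sketch rather than the answerer's method. `best_time` is a hypothetical helper, and the `number` and `repeat` values are arbitrary choices.

```python
import timeit

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]


def comparelistoftopicstokw(mystring, somelistoftopics):
    result = [element for element in somelistoftopics if element in mystring]
    return result


def best_time(stmt, number=100_000, repeat=5):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat))


statement_time = best_time("[element for element in sometopics if element in somestring]")
function_time = best_time("comparelistoftopicstokw(somestring, sometopics)")

print(f"statement: {statement_time:.4f} s")
print(f"function:  {function_time:.4f} s")
```

Taking the minimum over several repeats reduces the influence of other processes on the machine, so the two numbers are easier to compare than a single pass.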

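I can't answer your question, but I ran a small test that calls its premise into question.

As we can infer from the output, the results are fairly random: in some rounds one approach is faster on average, and in others it is the other way round.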

import time
import statistics

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]


def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

for i in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {i + 1}:")
    time1average = []
    for _ in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)
        
    print(statistics.mean(time1average))
    
    time2average = []
    for _ in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)
    
    print(statistics.mean(time2average))
    print("")

Output:

Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06

Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06

Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06

Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06

Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06

Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06

Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06

Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06

Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06

Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
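Both answers time a five-element topic list, whereas the notebook in the question runs against `sometopics*100000`, a list of 500,000 entries. The sketch below repeats the comparison at the notebook's scale and reports the median and spread over several rounds, so that a gap can be judged against run-to-run noise. It is only an illustration. `ROUNDS` is a hypothetical setting, and results will vary by machine and Python version.

```python
import statistics
import time

somestring = "pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics = ["chicken", "pad thai", "recipe", "lamb", "beef"]
somebigtopics = sometopics * 100000  # 500,000 entries, as in the notebook


def listcompinlists(mystring, bigtopic):
    res = [ele for ele in bigtopic if ele in mystring]
    return res


ROUNDS = 7  # hypothetical choice; more rounds give a steadier median

statement_times, function_times = [], []
for _ in range(ROUNDS):
    # Bare list comprehension statement.
    start = time.perf_counter()
    res = [ele for ele in somebigtopics if ele in somestring]
    statement_times.append(time.perf_counter() - start)

    # Same comprehension wrapped in a function.
    start = time.perf_counter()
    funcres = listcompinlists(somestring, somebigtopics)
    function_times.append(time.perf_counter() - start)

for label, samples in (("statement", statement_times), ("function", function_times)):
    print(f"{label}: median {statistics.median(samples) * 1e3:.1f} ms, "
          f"stdev {statistics.stdev(samples) * 1e3:.1f} ms")
```

If the difference between the two medians is smaller than the standard deviations, the measurement does not support a claim in either direction.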

