Why is a function with a list comprehension statement *faster* than the list comprehension statement?

I've come across this behaviour recently, and I am a little confused as to why it happens. My initial assumption is that some sort of optimisation is going on when calling a function rather than when running a statement.

Let's start with a simple example:

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

Assume we have a list of strings similar to somestring above, and we also have a list of topics like sometopics.

We would like to check whether any of the entries in sometopics occur in somestring and, importantly, return those that do in a new list.

With a list comprehension statement, we can do it like this for one string:

result = [element for element in sometopics if(element in somestring)]

On my machine, however, the function defined below runs about 20-30% faster than the statement above.

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

Why does this happen?

Is it always the case that a function will be faster than an equivalent statement or list of statements?

EDIT

Below is a minimal reproducible notebook example:

import pandas as pd, numpy as np

columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000


def extractsubstrings(inputstring,alistofpossibletopics):
    # obviously very slow for loop
    topicslist=[]
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist

%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)] 
    return res

%%time
res = [ele for ele in somebigtopics if(ele in somestring)] 

%%time
x=extractsubstrings(somestring,somebigtopics)

%%time
funcres=listcompinlists(somestring,somebigtopics)

On my machine (Ubuntu 18.04, Python 3.6), the list comprehension executes in 22-24 ms for the above case, while the function executes in 18-21 ms. It's not a huge difference, but if you have 10 million rows to process, for example, that's a saving of a fair few hours.

TL;DR performance comparison:

extractsubstrings: Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists: Wall time: 18.6 ms
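
One way to look into the "why" is to compare how CPython reads the string in each form. The sketch below is only an illustration, not something the thread establishes: it compiles the statement and the function from the question and lists the LOAD_* opcodes that read somestring and mystring. The helper name load_opcodes is hypothetical. In CPython, module-level names are found through dictionary lookups, while function arguments are read from indexed slots, so this is one possible contributor to a small difference. The opcode names printed vary between Python versions, and the sketch says nothing about how large the effect is.

```python
import dis
import types

# Same comprehension as in the question, once as a module-level statement
# and once wrapped in a function (names copied from the question).
STATEMENT_SRC = "result = [element for element in sometopics if element in somestring]"
FUNCTION_SRC = (
    "def comparelistoftopicstokw(mystring, somelistoftopics):\n"
    "    return [element for element in somelistoftopics if element in mystring]\n"
)

def load_opcodes(source, name):
    """Collect the LOAD_* opcode names that read `name` anywhere in `source`."""
    found = set()
    pending = [compile(source, "<sketch>", "exec")]
    while pending:
        code = pending.pop()
        for ins in dis.get_instructions(code):
            # Newer CPython versions may fuse two loads into one instruction,
            # in which case argval is a tuple of names.
            names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
            if ins.opname.startswith("LOAD") and name in names:
                found.add(ins.opname)
        # Descend into nested code objects (function bodies, comprehensions).
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return found

statement_ops = load_opcodes(STATEMENT_SRC, "somestring")
function_ops = load_opcodes(FUNCTION_SRC, "mystring")
print("statement reads somestring via:", sorted(statement_ops))
print("function reads mystring via:", sorted(function_ops))
```

If the statement form shows a name or global load while the function form shows a local or closure load, the two versions are not doing identical work per iteration, even though the source looks the same.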

I cannot reproduce what you are claiming. Can you provide any measurements that prove your assertion?

I created this measurement to compare execution times:

import time

N = 1000000

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result
   
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
   
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')

Output:

Time using list comprehension: 0.9571423530578613
Time using function: 1.1152479648590088

So in my case the list comprehension is faster on average.

I'm not able to give an answer to your question, but I have made a small test that questions its foundations.

As we can infer from the output, the result is quite random: there are situations where one is faster on average than the other, and situations where the opposite is true.

import time
import statistics

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]


def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

for i in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {i + 1}:")
    time1average = []
    for i in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)
        
    print(statistics.mean(time1average))
    
    time2average = []
    for i in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)
    
    print(statistics.mean(time2average))
    print("")

Output:

Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06

Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06

Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06

Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06

Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06

Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06

Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06

Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06

Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06

Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
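
The per-call timings above wrap a single call of a few microseconds in two time.time() calls, so timer overhead and resolution are of the same order as the thing being measured. A less noisy alternative, sketched here with the standard timeit module, times many calls per run and repeats the run several times. NUMBER, REPEAT and the namespace dictionary are choices made for this sketch, not values from the thread. Note that timeit executes the statement inside its own generated function, so this approximates the module-level statement rather than reproducing it exactly.

```python
import statistics
import timeit

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    return [element for element in somelistoftopics if element in mystring]

NUMBER = 20_000  # calls per timing run (arbitrary choice for this sketch)
REPEAT = 5       # independent timing runs

# Names the timed snippets are allowed to see; they are looked up as globals.
namespace = {
    "somestring": somestring,
    "sometopics": sometopics,
    "comparelistoftopicstokw": comparelistoftopicstokw,
}

stmt_runs = timeit.repeat(
    "[element for element in sometopics if element in somestring]",
    globals=namespace, number=NUMBER, repeat=REPEAT,
)
func_runs = timeit.repeat(
    "comparelistoftopicstokw(somestring, sometopics)",
    globals=namespace, number=NUMBER, repeat=REPEAT,
)

for label, runs in (("statement", stmt_runs), ("function", func_runs)):
    per_call = [total / NUMBER for total in runs]
    print(f"{label}: min {min(per_call):.3e} s, "
          f"median {statistics.median(per_call):.3e} s per call")
```

The minimum over the repeats is usually the most stable figure to compare, since the slower runs mostly reflect interference from other processes rather than the code itself.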
