Why is a function containing a list comprehension *faster* than the bare list comprehension statement?

I came across this behaviour recently and am a little confused as to why it happens. My initial assumption is that some sort of optimisation takes place when calling a function that does not take place when running a bare statement.

Let's start with a simple example:

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

Assume we have a list of strings similar to somestring above, and a list of topics like sometopics.

We would like to check whether any of the entries in sometopics occur in somestring and, importantly, return those that do in a new list.

With a list comprehension we can do this for one string as follows:

result = [element for element in sometopics if(element in somestring)]

On my machine, however, the function defined below runs about 20-30% faster than the statement above.

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

Why does this happen?

Is it always the case that a function will be faster than an equivalent statement or list of statements?
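
One way to investigate the question is to look at the bytecode CPython generates for each variant. The sketch below is only an illustration: the helper `load_ops` is a hypothetical name, not part of the post, and the idea that name lookup (global versus local or closure variable) contributes to any timing difference is an assumption the post does not confirm. The exact opcode names also differ between Python versions.

```python
import dis

somestring = "climate change is a big problem. However emissions are still rising"
sometopics = ["climate", "change", "problem", "big", "rising"]

def comparelistoftopicstokw(mystring, somelistoftopics):
    result = [element for element in somelistoftopics if (element in mystring)]
    return result

def load_ops(code, name):
    """Collect the opcode names of all instructions that refer to `name`,
    including those in nested code objects (such as a comprehension body).
    Hypothetical helper, written for this illustration only."""
    ops = set()
    for ins in dis.get_instructions(code):
        argval = ins.argval
        # Some newer opcodes carry a tuple of names as their argument.
        if argval == name or (isinstance(argval, tuple) and name in argval):
            ops.add(ins.opname)
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object
            ops |= load_ops(const, name)
    return ops

# The bare statement, compiled the way a module or notebook cell would be.
statement = compile(
    "result = [element for element in sometopics if (element in somestring)]",
    "<statement>", "exec")

statement_ops = load_ops(statement, "somestring")
function_ops = load_ops(comparelistoftopicstokw.__code__, "mystring")

print("statement:", sorted(statement_ops))
print("function: ", sorted(function_ops))
```

In the bare statement the string is reached through a name or global lookup, while inside the function it is an argument, so it is reached as a local or closure variable. Whether that is enough to explain a measured difference depends on the Python version and the workload.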

EDIT:

See below for a minimal reproducible notebook example:

import pandas as pd, numpy as np

columns_df = pd.DataFrame({"Keyword":['fish soup','katsu','soup']}) # Compute a Pandas dataframe to write into 500kcolumns
somestring="pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics=["chicken","pad thai","recipe","lamb","beef"]
print(len(sometopics))
somebigtopics=sometopics*100000


def extractsubstrings(inputstring,alistofpossibletopics):
    # obviously very slow for loop
    topicslist=[]
    print(inputstring)
    for topic in alistofpossibletopics:
        if str(topic) in inputstring:
            topicslist.append(str(topic))
    return topicslist  # the original omitted this return, so the caller received None

%%time
def listcompinlists(mystring,bigtopic):
    res = [ele for ele in bigtopic if(ele in mystring)] 
    return res

%%time
res = [ele for ele in somebigtopics if(ele in somestring)] 

%%time
x=extractsubstrings(somestring,somebigtopics)

%%time
funcres=listcompinlists(somestring,somebigtopics)

On my machine (Ubuntu 18.04, Python 3.6), the bare list comprehension executes in 22-24 ms for the case above, while the function executes in 18-21 ms. It's not a huge difference, but if you have 10 million rows to process, for example, that's a saving of a fair few hours.

TL;DR performance comparison:

extractsubstrings: Wall time: 122 ms
list comprehension statement: Wall time: 24.5 ms
listcompinlists: Wall time: 18.6 ms
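
A single `%%time` run is fairly noisy, so a difference of a few milliseconds can be hard to interpret. As a rough sketch, the same comparison can be repeated with the standard library's `timeit` module, taking the best of several batches. The helper `best_time`, the `namespace` dictionary and the reduced list size are assumptions made for this illustration, not part of the original notebook.

```python
import timeit

somestring = "pad thai is a good recipe. It is cooked with chicken or lamb or beef"
sometopics = ["chicken", "pad thai", "recipe", "lamb", "beef"]
somebigtopics = sometopics * 1000  # smaller than the post's 100000 to keep the run short

def listcompinlists(mystring, bigtopic):
    res = [ele for ele in bigtopic if (ele in mystring)]
    return res

# Names the timed statements are allowed to see (they are looked up as globals).
namespace = {
    "somestring": somestring,
    "somebigtopics": somebigtopics,
    "listcompinlists": listcompinlists,
}

def best_time(stmt, number=20, repeat=5):
    """Return the best per-call time in seconds over `repeat` batches.
    Hypothetical helper, written for this illustration only."""
    batches = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(batches) / number

statement_time = best_time("res = [ele for ele in somebigtopics if (ele in somestring)]")
function_time = best_time("funcres = listcompinlists(somestring, somebigtopics)")

print(f"statement: {statement_time * 1e3:.3f} ms per call")
print(f"function:  {function_time * 1e3:.3f} ms per call")
```

Taking the minimum over several batches reduces the influence of background activity on the machine; the absolute numbers will still differ between machines and Python versions.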

I cannot reproduce what you are claiming. Can you provide any measurements that prove your assertion?

I created this measurement to compare execution times:

import time

N = 1000000

def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result
   
somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]

start = time.time()
for _ in range(N):
    result = [element for element in sometopics if(element in somestring)]
end = time.time()
print(f'Time using list comprehension: {end - start}')
   
start = time.time()
for _ in range(N):
    result = comparelistoftopicstokw(somestring, sometopics)
end = time.time()
print(f'Time using function: {end - start}')

Output

Time using list comprehension: 0.9571423530578613
Time using function: 1.1152479648590088

So in my case the list comprehension is faster on average.

I'm not able to answer your question, but I ran a small test that calls its premise into question.

As the output shows, the results vary from run to run: in some rounds one approach is faster on average, and in others it is the other way around.

import time
import statistics

somestring="climate change is a big problem. However emissions are still rising"
sometopics=["climate","change","problem","big","rising"]


def comparelistoftopicstokw(mystring,somelistoftopics):
   result = [element for element in somelistoftopics if(element in mystring)]
   return result

for round_number in range(10):
    print(f"Average time to execute 1 iteration (100000 iterations). Round {round_number + 1}:")
    # Time the bare list comprehension
    time1average = []
    for _ in range(100000):
        start1 = time.time()
        result = [element for element in sometopics if(element in somestring)]
        time1average.append(time.time() - start1)

    print(statistics.mean(time1average))

    # Time the function call
    time2average = []
    for _ in range(100000):
        start2 = time.time()
        comparelistoftopicstokw(somestring,sometopics)
        time2average.append(time.time() - start2)

    print(statistics.mean(time2average))
    print("")

Output:

Average time to execute 1 iteration (100000 iterations). Round 1:
3.879823684692383e-06
5.041525363922119e-06

Average time to execute 1 iteration (100000 iterations). Round 2:
4.478754997253418e-06
5.097501277923584e-06

Average time to execute 1 iteration (100000 iterations). Round 3:
3.9185094833374025e-06
4.177823066711426e-06

Average time to execute 1 iteration (100000 iterations). Round 4:
4.212841987609863e-06
4.6886253356933596e-06

Average time to execute 1 iteration (100000 iterations). Round 5:
3.580739498138428e-06
3.840360641479492e-06

Average time to execute 1 iteration (100000 iterations). Round 6:
3.070487976074219e-06
4.423313140869141e-06

Average time to execute 1 iteration (100000 iterations). Round 7:
3.0085206031799318e-06
3.401658535003662e-06

Average time to execute 1 iteration (100000 iterations). Round 8:
2.937157154083252e-06
4.46035623550415e-06

Average time to execute 1 iteration (100000 iterations). Round 9:
3.5696911811828613e-06
3.5602593421936035e-06

Average time to execute 1 iteration (100000 iterations). Round 10:
2.7422666549682615e-06
3.158261775970459e-06
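
To make the ten rounds above easier to compare, the following sketch tallies them. The numbers are copied verbatim from the output above (first value: list comprehension, second value: function); the variable names are chosen for this illustration only.

```python
import statistics

# (list comprehension, function) mean seconds per iteration, one pair per round
rounds = [
    (3.879823684692383e-06, 5.041525363922119e-06),
    (4.478754997253418e-06, 5.097501277923584e-06),
    (3.9185094833374025e-06, 4.177823066711426e-06),
    (4.212841987609863e-06, 4.6886253356933596e-06),
    (3.580739498138428e-06, 3.840360641479492e-06),
    (3.070487976074219e-06, 4.423313140869141e-06),
    (3.0085206031799318e-06, 3.401658535003662e-06),
    (2.937157154083252e-06, 4.46035623550415e-06),
    (3.5696911811828613e-06, 3.5602593421936035e-06),
    (2.7422666549682615e-06, 3.158261775970459e-06),
]

# In how many rounds was the bare list comprehension the faster of the two?
comprehension_wins = sum(1 for comp, func in rounds if comp < func)

# Function time relative to list comprehension time, per round
ratios = [func / comp for comp, func in rounds]

print(f"list comprehension faster in {comprehension_wins} of {len(rounds)} rounds")
print(f"mean function/comprehension ratio: {statistics.mean(ratios):.2f}")
print(f"stdev of the ratio: {statistics.stdev(ratios):.2f}")
```

On these posted numbers the list comprehension comes out ahead in 9 of the 10 rounds, with round 9 the exception by a very small margin. This is a single run on one machine, so it says little about other setups.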
