
Python: is a list comprehension efficient in this case?

This is the input "dirty" list in Python:

input_list = ['  \n  ','  data1\n ','   data2\n','  \n','data3\n'.....]

Each list element contains either only whitespace and newline characters, or data surrounded by whitespace and newline characters.

I cleaned it up using the code below:

cleaned_up_list = [data.strip() for data in input_list if data.strip()]

which gives:

cleaned_up_list = ['data1','data2','data3','data4'..]

Does Python internally call strip() twice in the list comprehension above? Or, if I cared about efficiency, would I have to use a for loop and call strip() just once?

cleaned_up_list = []
for data in input_list:
    clean_data = data.strip()
    if clean_data:
        cleaned_up_list.append(clean_data)

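With your list comprehension, strip is called twice for every element that passes the filter (once in the if clause and once in the result expression). Use a generator expression if you want to call strip only once per element and still keep the comprehension: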

input_list[:] = [x for x in (s.strip() for s in input_list) if x]

Input:

input_list = ['  \n  ','  data1\n ','   data2\n','  \n','data3\n']

Output:

 ['data1', 'data2', 'data3']
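To check the "called twice" claim directly, the sketch below wraps strip in a hypothetical counting helper (counting_strip is illustrative, not part of the original answer) and counts the calls for each form. It also includes an assignment-expression version, which assumes Python 3.8 or later.

```python
# Hypothetical helper: wraps str.strip and counts how many times it runs.
calls = 0

def counting_strip(s):
    global calls
    calls += 1
    return s.strip()

input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']

# Original comprehension: strips once in the filter, then again in the
# result expression for every element that survives the filter.
calls = 0
double = [counting_strip(d) for d in input_list if counting_strip(d)]
double_calls = calls

# Generator-expression version: exactly one strip per element.
calls = 0
single = [x for x in (counting_strip(s) for s in input_list) if x]
single_calls = calls

# Python 3.8+ assignment expression: also one strip per element.
calls = 0
walrus = [y for d in input_list if (y := counting_strip(d))]
walrus_calls = calls

print(double_calls, single_calls, walrus_calls)  # 8 5 5
```

With five elements, three of which contain data, the original form makes 5 + 3 = 8 calls, while the other two make 5.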

Assigning to input_list[:] modifies the original list in place, which may or may not be what you want; if you actually want to create a new list, just use cleaned_up_list = ... .
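A small sketch of the difference: slice assignment replaces the contents of the existing list object, so any other reference to it (the alias names here are illustrative) sees the cleaned data, whereas plain assignment to a new name leaves the original list untouched.

```python
input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']
alias = input_list  # a second reference to the same list object

# Slice assignment: the right-hand side is built first, then it replaces
# the contents of the existing list, so alias sees the change too.
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
print(alias)  # ['data1', 'data2', 'data3']

raw = ['  \n  ', '  data1\n ']
raw_alias = raw

# Plain assignment to a new name: a new list is created and the
# original list (and every reference to it) is left unchanged.
cleaned_up_list = [x for x in (s.strip() for s in raw) if x]
print(raw_alias)  # ['  \n  ', '  data1\n ']
```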

I have always found itertools.imap (Python 2) or map (Python 3), used instead of the generator expression, to be the most efficient option for larger inputs:

from itertools import imap
input_list[:] = [x for x in imap(str.strip, input_list) if x]
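The snippet above is Python 2. A Python 3 equivalent, sketched under the assumption that every element is a str, drops the import because the built-in map is already lazy there (itertools.imap no longer exists in Python 3):

```python
input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']

# map(str.strip, ...) calls str.strip(s) for each element, which is the
# same as s.strip() as long as every element is a str.
input_list[:] = [x for x in map(str.strip, input_list) if x]
print(input_list)  # ['data1', 'data2', 'data3']
```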

Some timings of the different approaches (Python 2 in IPython, where filter returns a list and ifilter/imap come from itertools):

In [17]: input_list = [choice(input_list) for _ in range(1000000)]   

In [19]: timeit filter(None, imap(str.strip, input_list))
10 loops, best of 3: 115 ms per loop

In [20]: timeit list(ifilter(None,imap(str.strip,input_list)))
10 loops, best of 3: 110 ms per loop

In [21]: timeit [x for x in imap(str.strip,input_list) if x]
10 loops, best of 3: 125 ms per loop

In [22]: timeit [x for x in (s.strip() for s in input_list) if x]  
10 loops, best of 3: 145 ms per loop

In [23]: timeit [data.strip() for data in input_list if data.strip()]
10 loops, best of 3: 160 ms per loop

In [24]: %%timeit                                                
   ....:     cleaned_up_list = []
   ....:     for data in input_list:
   ....:          clean_data = data.strip()
   ....:          if clean_data:
   ....:              cleaned_up_list.append(clean_data)
   ....: 

10 loops, best of 3: 150 ms per loop


In [25]: %%timeit                                                    
   ....:     cleaned_up_list = []
   ....:     append = cleaned_up_list.append
   ....:     for data in input_list:
   ....:          clean_data = data.strip()
   ....:          if clean_data:
   ....:              append(clean_data)
   ....: 

10 loops, best of 3: 123 ms per loop

In these timings, the fastest approach is itertools.ifilter combined with itertools.imap, closely followed by filter with imap.
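In Python 3, itertools.ifilter and itertools.imap were removed because the built-in filter and map are already lazy. A sketch of the equivalent of the fastest variant, assuming all elements are str, therefore needs list() to materialise the result:

```python
input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']

# filter(None, iterable) keeps only truthy items, so the empty strings
# produced by stripping whitespace-only elements are dropped.
# Both filter and map return iterators in Python 3, hence list().
cleaned_up_list = list(filter(None, map(str.strip, input_list)))
print(cleaned_up_list)  # ['data1', 'data2', 'data3']
```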

Binding cleaned_up_list.append to a local name once, instead of looking up the method on every iteration, is more efficient (123 ms vs 150 ms per loop above); if you are stuck with a loop and want the most efficient version, it is a viable alternative.
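To reproduce the loop comparison on Python 3 outside IPython, here is a rough sketch using the standard timeit module. The function names are hypothetical, and N is set smaller than the 1,000,000 used above so it runs quickly; absolute numbers will depend on your machine and Python version.

```python
import random
import timeit

sample = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']
N = 100000  # assumption: reduced size; raise to 1000000 to match the timings above
input_list = [random.choice(sample) for _ in range(N)]

def plain_loop(items):
    cleaned_up_list = []
    for data in items:
        clean_data = data.strip()
        if clean_data:
            # the append method is looked up on every iteration
            cleaned_up_list.append(clean_data)
    return cleaned_up_list

def bound_append_loop(items):
    cleaned_up_list = []
    append = cleaned_up_list.append  # look the method up once
    for data in items:
        clean_data = data.strip()
        if clean_data:
            append(clean_data)
    return cleaned_up_list

for func in (plain_loop, bound_append_loop):
    # best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(lambda: func(input_list), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1000:.1f} ms per loop")
```

Both functions return the same list; only the method lookup inside the loop differs.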

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM