
Removing items from a list based on a function

What is the most efficient way to remove items from a list based on a function in Python (using any common library)?

For example, if I have the following function:

def func(a):
    return a % 2 == 1

And the following list:

arr = [1,4,5,8,20,24]

Then I would want the result:

new_arr = [1,5]

I know I could simply iterate over the list like so:

new_arr = [i for i in arr if func(i)]

I'm just wondering whether this is an efficient approach (for large datasets), or whether there might be a better way. I was thinking of maybe using np to map, changing the function to return a if True and -1 if False, and then using np remove to remove all the -1s?

Edit: I decided to test it myself with the suggestions you all gave (I probably just should have tested runtime myself rather than being lazy and asking other people).

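filter was by far the fastest on its own, which is expected because it returns a lazy iterator and does no filtering until it is consumed. If you need a list for random access, list(filter(func, arr)) was comparable to the [i for i in arr if func(i)] method.

The NumPy approach, arr[np.vectorize(func)(arr)], was slightly faster than the other list methods.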
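For anyone who wants to repeat the comparison, here is a minimal timing sketch. The input size, the candidates dictionary, and the repeat counts are assumptions chosen for illustration, not the setup used for the results above, and the numbers will vary by machine and Python version.

```python
import timeit

import numpy as np


def func(a):
    return a % 2 == 1


# Hypothetical test data: 100,000 consecutive integers.
data = list(range(100_000))
np_data = np.array(data)

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "list comprehension": lambda: [i for i in data if func(i)],
    "list(filter(...))": lambda: list(filter(func, data)),
    # Builds the iterator only; no element is actually tested here.
    "filter, not consumed": lambda: filter(func, data),
    "np.vectorize mask": lambda: np_data[np.vectorize(func)(np_data)],
    "direct NumPy mask": lambda: np_data[np_data % 2 == 1],
}

# Take the best of three repeats, each running the callable five times.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:22s} {seconds:.4f} s")
```

The "filter, not consumed" row only measures creating the iterator, so it is not a like-for-like comparison with the rows that produce a full result.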

This is the use case for the builtin filter function:

filtered_arr = filter(func, arr)

Note that this returns an iterator, not a list. If you want a list, you can create one with list(filtered_arr) or a list comprehension as you noted. But if you just want to iterate over the filtered items and don't need random access / indexing, it's more memory efficient to use the iterator.
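A short sketch of what "returns an iterator" means in practice, using the func and arr from the question. The variable names first, rest, and new_arr are only illustrative.

```python
def func(a):
    return a % 2 == 1


arr = [1, 4, 5, 8, 20, 24]

# filter() does no work yet; it only builds a lazy iterator.
filtered_arr = filter(func, arr)

# Consuming the iterator yields the matching items one at a time.
first = next(filtered_arr)

# list() drains whatever is left; the item already consumed is gone.
rest = list(filtered_arr)

# To get every match as a list, build it from a fresh iterator.
new_arr = list(filter(func, arr))

print(first, rest, new_arr)  # 1 [5] [1, 5]
```

Because the iterator can only be consumed once, create a new one (or keep the list) if you need to go over the results more than once.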

This is a good general approach for filtering lists that are not especially large and contain elements with arbitrary (and possibly mixed) types. If you are working with a large amount of numerical data, you should use one of the NumPy solutions mentioned in other answers.

Since the numpy tag is present, let's use it. In this example we use a boolean mask that selects the elements that give a remainder of 1 when divided by 2.

>>> import numpy as np
>>>
>>> arr = np.array([1,4,5,8,20,24])
>>> arr
array([ 1,  4,  5,  8, 20, 24])
>>> arr[arr % 2 == 1]
array([1, 5])
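If the data starts out as a plain Python list, as in the question, the same idea can be sketched as follows. The names values, mask, and new_arr are assumptions for illustration; np.asarray and ndarray.tolist are standard NumPy calls.

```python
import numpy as np

arr = [1, 4, 5, 8, 20, 24]

# Convert the list to an array once; this copy has its own cost for large lists.
values = np.asarray(arr)

# Elementwise comparison produces one boolean per element.
mask = values % 2 == 1

# Boolean indexing keeps only the positions where the mask is True.
new_arr = values[mask].tolist()

print(mask.tolist())  # [True, False, True, False, False, False]
print(new_arr)        # [1, 5]
```

Converting back with tolist() is only needed if the rest of the code expects a list; otherwise keeping the result as an array avoids another copy.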

With numpy you could use numpy.vectorize to map your function across the array elements, then use the resulting boolean array to keep the elements that evaluate to True. So, starting with your function

def func(a):
    return a % 2 == 1

We could test keeping only the odd values from the range of [0, 19]

>>> import numpy as np
>>> arr = np.arange(20)
>>> arr
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.vectorize(func)(arr)
array([False,  True, False,  True, False,  True, False,  True, False, True, False,  True, False,  True, False,  True, False,  True, False,  True])
>>> arr[np.vectorize(func)(arr)]
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])
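Note that numpy.vectorize is documented as a convenience wrapper rather than a performance tool; it still calls func once per element. For this particular func, a sketch of an alternative is to pass the whole array in directly, since % and == already operate elementwise on arrays. This only works when the function body consists of operations NumPy can broadcast, which is an assumption that does not hold for arbitrary Python functions.

```python
import numpy as np


def func(a):
    return a % 2 == 1


arr = np.arange(20)

# One Python-level call per element.
mask_vectorized = np.vectorize(func)(arr)

# A single call; the arithmetic runs elementwise inside NumPy.
mask_direct = func(arr)

# Both routes produce the same boolean mask.
same_mask = np.array_equal(mask_vectorized, mask_direct)

odd = arr[mask_direct]
print(same_mask)     # True
print(odd.tolist())  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```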

You could also rewrite func to make it take a list as an argument:

def func(lst):
    lst = [x for x in lst if x % 2 == 1]
    return lst

and then do:

new_arr = func(arr)

This would save you some lines compared with making func take a single number as its argument and writing a comprehension such as [i for i in arr if func(i)] every time you want to apply it to the elements of a list.

