
Fastest way to remove tuples containing NaN from a list

I have the same problem as the one described in "Remove a tuple containing nan in list of tuples -- Python":

I have two lists of length 3000, say:

import numpy as np

a = list(range(3000))
b = list(range(3000))

Some of the elements are different kinds of NaN, some are strings, and most are ints and floats, say:

a[0] = np.nan
b[1] = 'hello'
a[2] = 2.0
b[3] = float('nan')

Then I need to zip them together and remove every tuple that contains a NaN, which I do like this:

merge = list(zip(a, b))  # materialize: in Python 3, zip() is a one-shot iterator
c = [x for x in merge if not any(isinstance(i, float) and np.isnan(i) for i in x)]

But the performance is not good: it takes too long, since I need to run this check many times.

When I run it 1000 times it takes around 2.2 seconds.

Then I tried to do this:

c = [x for x in merge if all(i == i for i in x)]

When I run it 1000 times it takes around 1.1 seconds.
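For context, the i == i test relies on NaN being the only float value that does not compare equal to itself, so it catches a NaN however it was created, while ints, ordinary floats, and strings pass. A minimal sketch on made-up sample data (the names values, rows, and clean are only illustrative):

```python
nan_a = float('nan')

# NaN is the only float value that is not equal to itself.
values = [1, 2.0, 'hello', nan_a, float('nan')]
self_equal = [v == v for v in values]
# self_equal is [True, True, True, False, False]

# Keep only the rows in which every element equals itself.
rows = [(0, 1), (nan_a, 2), (3, 'hello'), (4, float('nan'))]
clean = [row for row in rows if all(v == v for v in row)]
# clean is [(0, 1), (3, 'hello')]
```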

Is there a faster way to remove the tuples that contain NaN? Note that the tuples can contain several kinds of NaN.

You can put the NaNs in a set (nans in the session below) and check its intersection with each tuple. You can do this with either a list comprehension or itertools.filterfalse:

In [17]: a = range(3000)

In [18]: merge = list(zip(a, a))

In [19]: %timeit [x for x in merge if not nans.intersection(x)]
1000 loops, best of 3: 566 us per loop

In [20]: %timeit [x for x in merge if all(i == i for i in x)]
1000 loops, best of 3: 1.13 ms per loop

In [21]: %timeit list(filterfalse(nans.intersection, merge))
1000 loops, best of 3: 402 us per loop

The last approach, using filterfalse, is approximately 3 times faster than the all(i == i ...) comprehension (402 us versus 1.13 ms per loop).
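The session above does not show how nans was built or where filterfalse was imported from. The sketch below is one way to fill that in, assuming nans holds the exact NaN objects that occur in the data. It also shows a caveat: set lookup can only match a NaN by identity (NaN never compares equal, even to itself), so a NaN object that is not in the set, such as a fresh float('nan'), is not detected. The sample data and the has_nan helper are illustrative, not part of the original answer.

```python
from itertools import filterfalse

nan_a = float('nan')
nan_b = float('nan')  # a second, distinct NaN object

# Assumption: the set contains the very NaN objects used in the data.
nans = {nan_a}

merge = [(0, 1), (nan_a, 2), (3, 'hello'), (4, nan_b)]

# Set-based filter: drops (nan_a, 2), but nan_b is a different object
# and NaN != NaN, so (4, nan_b) is kept.
by_set = list(filterfalse(nans.intersection, merge))

# Value-based filter: x != x is true only for NaN, whatever its identity.
def has_nan(row):
    return any(x != x for x in row)

by_value = list(filterfalse(has_nan, merge))
# by_value is [(0, 1), (3, 'hello')]
```

If every NaN in the data is the same object (for example, np.nan is a single module-level float), the set approach catches all of them. If NaNs can come from float('nan') calls or from arithmetic, a value-based check is the safer choice, at some cost in speed.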


 