Fastest way to remove the lists/tuples that contain NaN

I have the same problem as the one described in "Remove a tuple containing nan in list of tuples -- Python":

I have two lists of length 3000, say:

import numpy as np
a = list(range(3000))
b = list(range(3000))

Some of the elements are DIFFERENT kinds of NaN, some are strings, and most are ints and floats, for example:

a[0] = np.nan
b[1] = 'hello'
a[2] = 2.0
b[3] = float('nan')

Then I need to zip them together and remove every tuple that contains a NaN. I do this:

merge = list(zip(a, b))  # a list, so it can be filtered more than once
c = [x for x in merge if not any(isinstance(i, float) and np.isnan(i) for i in x)]

But the performance is not good: it takes too much time, because I need to run this check many times.

When I run it 1000 times, it takes about 2.2 seconds.
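For reference, here is a self-contained sketch of that first approach that runs without NumPy. It swaps in `math.isnan` for `np.isnan` (a substitution, not the asker's code; `np.nan` is a plain Python float, so `math.isnan` accepts it), and `has_nan` plus the sample data are hypothetical. The `isinstance` guard is what keeps strings out, since `math.isnan('hello')` raises `TypeError`.

```python
import math

def has_nan(t):
    # In this data only floats can be NaN; the isinstance guard skips ints
    # and strings, since math.isnan('hello') would raise TypeError.
    return any(isinstance(i, float) and math.isnan(i) for i in t)

# Hypothetical sample: two different NaN objects, a string, ints and floats.
merge = [(math.nan, 0), (1, 'hello'), (2.0, 2), (3, float('nan'))]

c = [x for x in merge if not has_nan(x)]
print(c)  # [(1, 'hello'), (2.0, 2)]
```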

Then I tried this:

c = [x for x in merge if all(i == i for i in x)]

When I run it 1000 times, it takes about 1.1 seconds.
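The second version works because, among the types in this data, NaN is the only value that compares unequal to itself: `i == i` is False exactly for the NaN elements, however they were created, while strings and ordinary numbers compare equal to themselves. A minimal sketch on small hypothetical lists:

```python
import math

# Hypothetical sample data: two distinct NaN objects, a string, ints, floats.
a = [math.nan, 1, 2.0, 3]
b = [0, 'hello', 2, float('nan')]

merge = list(zip(a, b))

# NaN != NaN, so all(i == i ...) is False for any tuple holding a NaN.
c = [x for x in merge if all(i == i for i in x)]
print(c)  # [(1, 'hello'), (2.0, 2)]
```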

Is there a faster way to remove the tuples that contain NaN? Note that the tuples contain several different kinds of NaN.

You can put the NaNs in a set and check each tuple's intersection with it, either with a list comprehension or with itertools.filterfalse. The session below assumes `from itertools import filterfalse` and a set named `nans` holding the NaN objects in your data (its definition is not shown):

In [17]: a = range(3000)

In [18]: merge = list(zip(a, a))

In [19]: %timeit [x for x in merge if not nans.intersection(x)]
1000 loops, best of 3: 566 us per loop

In [20]: %timeit [x for x in merge if all(i == i for i in x)]
1000 loops, best of 3: 1.13 ms per loop

In [21]: %timeit list(filterfalse(nans.intersection, merge))
1000 loops, best of 3: 402 us per loop

The last approach, using filterfalse, is about 2.8 times faster than the all(i == i ...) version (402 µs vs. 1.13 ms per loop).
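One caveat to check before adopting the set approach: in CPython, set membership tests object identity before equality, and NaN never equals itself, so `nans.intersection` only finds NaN objects that are the very same objects stored in the set. A single shared constant such as `np.nan` (or `math.nan`, used below so the sketch needs no NumPy) is found, but a freshly created `float('nan')`, like the one assigned to `b[3]` in the question, is not. The contents of `nans` and the sample lists are assumptions, since the answer does not show how `nans` was built.

```python
import math
from itertools import filterfalse

# Assumption: nans holds the shared NaN constant; math.nan is one object,
# just as np.nan is.
nans = {math.nan}

shared = [(math.nan, 1), (2, 3)]
fresh = [(float('nan'), 1), (2, 3)]  # float('nan') builds a new object

# Identity matches the shared constant; equality never matches a NaN.
print(list(filterfalse(nans.intersection, shared)))  # [(2, 3)]
print(list(filterfalse(nans.intersection, fresh)))   # [(nan, 1), (2, 3)]
```

If the data can contain NaNs created in different ways, the slower `i == i` check from the question still catches all of them.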
