I would like to know if there is a clean way to handle nan in numpy.
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print my_array1
#[ 5. 4. 2. 2. 4. nan nan 6.]
print set(my_array1)
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])
I would have thought it should return at most one nan value. Why does it return multiple nan values? I would like to know how many unique non-nan values I have in a numpy array.
Thanks
You can use np.unique to find unique values, in combination with isnan to filter out the NaN values:
In [22]:
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2., 4., 5., 6.])
As for why you get multiple NaN values, it's because NaN values cannot be compared normally:
In [23]:
np.nan == np.nan
Out[23]:
False
So you have to filter with isnan first for set to give the correct result:
In [24]:
set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}
You can call len on any of the above to get a size:
In [26]:
len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
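To make the set behaviour concrete, here is a minimal sketch (Python 3 syntax; the variable names are only illustrative). It relies on two things: iterating over a NumPy array yields a separate float object per element, and, in CPython, a set treats two entries as the same only if they are the same object or compare equal. Two NaN objects from the array satisfy neither condition, so both are kept.

```python
import numpy as np

arr = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# Each element pulled out of the array is its own float object, so the two
# NaN entries are neither identical nor equal and the set keeps both.
from_array = set(arr)
nan_in_array_set = sum(1 for x in from_array if x != x)  # only NaN satisfies x != x

# Reusing the very same np.nan object is deduplicated, because the
# identity check succeeds before equality is ever tested.
same_object_set = set([np.nan, np.nan])

print(nan_in_array_set)      # 2
print(len(same_object_set))  # 1
```

This is why filtering with isnan before building the set is the reliable approach.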
I'd suggest using pandas. I think pd.unique is a direct replacement for np.unique, but it keeps the values in order of first appearance instead of sorting them, and it returns NaN only once.
import numpy as np
import pandas as pd
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1)
# array([ 2., 4., 5., 6., nan, nan])
pd.unique(my_array1)
# array([ 5., 4., 2., nan, 6.])
I'm using numpy 1.17.4 and pandas 0.25.3. Hope this helps!
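If only the count of unique non-NaN values is needed, pandas can give it directly. The sketch below assumes wrapping the array in a pd.Series is acceptable; Series.nunique skips NaN by default (dropna=True).

```python
import numpy as np
import pandas as pd

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# nunique() ignores NaN by default (dropna=True)
n_unique = pd.Series(my_array1).nunique()

# With dropna=False, NaN is counted as one additional value
n_unique_with_nan = pd.Series(my_array1).nunique(dropna=False)

print(n_unique)           # 4
print(n_unique_with_nan)  # 5
```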
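As previous answers have already stated, NaN values cannot be compared with ==, so they cannot be counted with an equality test. numpy.ma.count_masked is your friend: mask the invalid entries with ma.masked_invalid (which masks NaN and inf), then count the masked ones. For example, like this: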
>>> import numpy as np
>>> import numpy.ma as ma
>>> a = np.array([ 0., 1., np.nan, np.nan, 4.])
>>> a
array([ 0., 1., nan, nan, 4.])
>>> a_masked = ma.masked_invalid(a)
>>> a_masked
masked_array(data=[0.0, 1.0, --, --, 4.0],
             mask=[False, False, True, True, False],
       fill_value=1e+20)
>>> ma.count_masked(a_masked)
2
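The masked array counts the NaN entries, while the original question asks for the unique non-NaN values. A minimal sketch covering both: np.count_nonzero with np.isnan gives the NaN count without a masked array, and compressed() returns the unmasked entries as a plain array that can be passed to np.unique. The variable names are illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.array([0., 1., np.nan, np.nan, 4.])

# Count NaN entries directly from the boolean mask
nan_count = np.count_nonzero(np.isnan(a))

# compressed() drops the masked entries and returns a regular ndarray
a_masked = ma.masked_invalid(a)
unique_valid = np.unique(a_masked.compressed())

print(nan_count)          # 2
print(len(unique_valid))  # 3
```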
You could use isnan() before building your set: compute the isnan() mask, then use it to remove all NaN entries from the array.
my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print my_array1
#[ 5. 4. 2. 2. 4. nan nan 6.]
print set(my_array1)
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])
# NumPy arrays do not support `del arr[i]`, so keep the non-NaN entries with a boolean mask
my_array1 = my_array1[~np.isnan(my_array1)]
print len(set(my_array1))
#4
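NumPy arrays have a fixed size, so `del` on a single element raises an error. If removal by index is preferred, np.delete can do it in one call. This is a sketch in Python 3 syntax; np.delete returns a new array rather than modifying the original.

```python
import numpy as np

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# Positions of the NaN entries
nan_positions = np.flatnonzero(np.isnan(my_array1))

# np.delete returns a new array; my_array1 itself is left unchanged
cleaned = np.delete(my_array1, nan_positions)

print(nan_positions.tolist())  # [5, 6]
print(len(set(cleaned)))       # 4
```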
As of NumPy version 1.21.0, np.unique returns a single NaN:
>>> a = np.array([8, 1, np.nan, 3, np.inf, np.nan, -np.inf, -2, np.nan, 3])
>>> np.unique(a)
array([-inf, -2., 1., 3., 8., inf, nan])
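Since the result still contains NaN, counting the unique non-NaN values needs one more filtering step. A small sketch that should behave the same on older and newer NumPy versions, because it does not depend on how many NaN entries np.unique returns:

```python
import numpy as np

a = np.array([8, 1, np.nan, 3, np.inf, np.nan, -np.inf, -2, np.nan, 3])

u = np.unique(a)

# Drop NaN from the result, whether np.unique returned one NaN or several
u_non_nan = u[~np.isnan(u)]

print(len(u_non_nan))  # 6
```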