How do I remove NaN values from a NumPy array?

[1, 2, NaN, 4, NaN, 8]   ⟶   [1, 2, 4, 8]

If you're using NumPy for your arrays, you can use:

x = x[numpy.logical_not(numpy.isnan(x))]

Equivalently:

x = x[~numpy.isnan(x)]

[Thanks to chbrown for the added shorthand]

Explanation

The inner function, numpy.isnan, returns a boolean array that is True wherever x is not-a-number. Since we want the opposite, we apply the logical-not operator, ~, to get an array that is True wherever x is a valid number.

Finally, we use this boolean array to index into the original array x, retrieving just the non-NaN values.
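A minimal sketch of the three steps described above, using the example values from the question. The names mask, keep, and result are illustrative only and do not come from the original answer.

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])

# Step 1: True wherever x is NaN
mask = np.isnan(x)

# Step 2: invert the mask, so True marks the valid numbers
keep = ~mask

# Step 3: boolean indexing returns a new array holding only the kept values
result = x[keep]
print(result)  # [1. 2. 4. 8.]
```

Note that the integers become floats, because NaN can only be stored in a floating-point array.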

filter(lambda v: v==v, x)

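works for both lists and NumPy arrays, since v != v is true only for NaN. In Python 3, filter returns a lazy iterator, so wrap the call in list(...) to get the values.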
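A short sketch of why the self-comparison trick works, assuming plain Python floats and a float NumPy array. The variable names are illustrative.

```python
import numpy as np

nan = float("nan")
print(nan == nan)  # False: NaN never compares equal to itself

values = [1.0, 2.0, nan, 4.0, nan, 8.0]

# filter() is lazy in Python 3, so materialize it with list()
clean_list = list(filter(lambda v: v == v, values))

# The same comparison works element-wise on a NumPy array
arr = np.array(values)
clean_arr = arr[arr == arr]
```

The array version is just another way of building the boolean mask; numpy.isnan states the intent more clearly.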

Try this:

import math
print([value for value in x if not math.isnan(value)])

For more, read up on list comprehensions.

For me, @jmetz's answer did not work, but using pandas isnull() did.

x = x[~pd.isnull(x)]

@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns from a matrix.

To do that, reduce the logical array to one dimension, then index the target array. For instance, the following removes rows that contain at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.
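As a sketch of the row case above and the analogous column case, assuming a small made-up 2D float array (the names m, nan_mask, rows_kept, and cols_kept are illustrative):

```python
import numpy as np

m = np.array([
    [1.0, 2.0, 3.0],
    [np.nan, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

nan_mask = np.isnan(m)

# Drop rows containing at least one NaN: reduce across columns (axis=1)
rows_kept = m[~nan_mask.any(axis=1)]

# Drop columns containing at least one NaN: reduce across rows (axis=0)
cols_kept = m[:, ~nan_mask.any(axis=0)]

print(rows_kept.shape)  # (2, 3)
print(cols_kept.shape)  # (3, 2)
```

In both cases the reduced mask is one-dimensional, so the result stays two-dimensional.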

As shown by others,

x[~numpy.isnan(x)]

works. However, it raises an error if the NumPy dtype is not a native numeric data type, for example if it is object. In that case you can use pandas:

x[~pandas.isna(x)] or x[~pandas.isnull(x)]
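A small sketch of the failure and the pandas workaround, assuming an object-dtype array with mixed contents. The array contents and the names obj, raised, and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

obj = np.array([1.0, np.nan, "text", None], dtype=object)

raised = False
try:
    np.isnan(obj)
except TypeError:
    # np.isnan is not defined for object-dtype input
    raised = True

# pd.isna works element-wise on object arrays and treats
# both NaN and None as missing
cleaned = obj[~pd.isna(obj)]
print(cleaned)
```

If the array really holds only numbers, converting it with obj.astype(float) and then using numpy.isnan may be another option.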

Doing the above:

x = x[~numpy.isnan(x)]

or

x = x[numpy.logical_not(numpy.isnan(x))]

I found that assigning the result back to the same variable (x) did not remove the NaN values, so I had to assign it to a different variable, which did remove them. For example:

y = x[~numpy.isnan(x)]
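The code behind that observation is not shown, so the following is only a guess at what may have happened. This sketch illustrates two situations that can look like "the NaNs were not removed": indexing without assigning the result, and inspecting another name that still refers to the original array. The name alias is hypothetical.

```python
import numpy as np

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
alias = x  # a second name bound to the same original array

# Indexing alone builds a new array and leaves the original untouched
x[~np.isnan(x)]
print(np.isnan(x).any())  # True

# Assigning the result rebinds the name x to the filtered copy
x = x[~np.isnan(x)]
print(np.isnan(x).any())      # False
print(np.isnan(alias).any())  # True: alias still sees the original data
```

With a plain float array, assigning back to the same name works as expected.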

If you're using NumPy:

import numpy as np

# first build a boolean mask that is True where the values are finite
# (this excludes +inf and -inf as well as NaN)
ii = np.isfinite(x)

# then use the mask to select those values
x = x[ii]
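To make the difference between the two masks concrete, here is a short sketch with made-up values that include infinities (the variable names are illustrative):

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# ~np.isnan removes only NaN and keeps the infinities
only_nan_removed = x[~np.isnan(x)]

# np.isfinite removes NaN and both infinities
finite_only = x[np.isfinite(x)]

print(only_nan_removed.size)  # 4
print(finite_only.size)       # 2
```

Which one is appropriate depends on whether infinite values count as valid data in your case.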

The accepted answer changes the shape of 2D arrays. I present a solution here that uses the pandas dropna() functionality. It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the rows or the columns containing np.nan.

import pandas as pd
import numpy as np

def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )


print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')

print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')

print('\n\n'+'='*20+' y[np.logical_not(np.isnan(y))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\nboolean mask:\n',y[np.logical_not(np.isnan(y))],sep='')

Result:

==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== y[np.logical_not(np.isnan(y))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

boolean mask:
[1400. 1500. 1600.    0. 1700. 1800.]

In case it helps, for simple 1D arrays:

x = np.array([np.nan, 1, 2, 3, 4])

x[~np.isnan(x)]
# array([1., 2., 3., 4.])

but if you wish to extend this to matrices and keep the result two-dimensional (dropping whole rows that contain NaN):

x = np.array([
    [np.nan, np.nan],
    [np.nan, 0],
    [1, 2],
    [3, 4]
])

x[~np.isnan(x).any(axis=1)]
# array([[1., 2.],
#        [3., 4.]])

I encountered this issue when dealing with the pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all costs because of its inefficiency.

Simply fill the NaN values with a replacement value:

import numpy

# numpy.nan is used here because the numpy.NaN alias was removed in NumPy 2.0
x = numpy.array([
    [0.99929941, 0.84724713, -0.1500044],
    [-0.79709026, numpy.nan, -0.4406645],
    [-0.3599013, -0.63565744, -0.70251352]])

x[numpy.isnan(x)] = .555

print(x)

# [[ 0.99929941  0.84724713 -0.1500044 ]
#  [-0.79709026  0.555      -0.4406645 ]
#  [-0.3599013  -0.63565744 -0.70251352]]
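The assignment above modifies x in place. If the original array needs to stay unchanged, one option is numpy.where, sketched here with made-up values (the name filled is illustrative):

```python
import numpy as np

x = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, 5.0, 6.0],
])

# np.where picks the fill value where the mask is True and x elsewhere,
# returning a new array and leaving x as it was
filled = np.where(np.isnan(x), 0.555, x)

print(np.isnan(x).sum())       # 2: the original still contains its NaNs
print(np.isnan(filled).sum())  # 0
```

Either way the shape is preserved, since values are replaced rather than removed.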

I want to figure out how to remove NaN values from my array. My array looks something like this:

x = [1400, 1500, 1600, nan, nan, nan ,1700] #Not in this exact configuration

How can I remove the NaN values from x?

pandas provides a missing-value marker that can be used with all data types.

The np.isnan() function is not compatible with all data types, e.g.

>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The pd.isna() and pd.notna() functions are compatible with many data types, and pandas introduces a pd.NA value:

>>> import numpy as np
>>> import pandas as pd

>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0    NaN
1      x
2      y
dtype: object
>>> values.loc[pd.isna(values)]
0    NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0    <NA>
dtype: object
>>> values
0    <NA>
1       x
2       y
dtype: object

#
# using map with lambda, or a list comprehension
#

>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
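The examples above replace missing entries with pd.NA. To remove them instead, as the question asks, a sketch along the same lines could look like this (the names kept and kept_series are illustrative, and None is added to the sample list only to show that it is treated as missing too):

```python
import numpy as np
import pandas as pd

values = [np.nan, "x", "y", None]

# List comprehension: keep only items that pd.isna does not flag as missing
kept = [v for v in values if not pd.isna(v)]

# Series.dropna() does the same job for a pandas Series
kept_series = pd.Series(values).dropna()

print(kept)  # ['x', 'y']
```

Note that pd.isna is applied to one scalar at a time here; passing it a nested list would return an array rather than a single boolean.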

I love list comprehensions; you can try one out.

import numpy as np

a = np.array([65.36512, 39.98848, 28.25152, 37.39968, 59.32288, 40.85184,
              71.98208, 41.7152, 33.71776, 38.5504, 21.34656, 37.97504,
              57.5968, 30.494656, 80.03776, 33.94688, 37.45792, 27.617664,
              15.59296, 27.329984, 45.2256, 61.27872, 57.8848, 87.4592,
              34.29312, 85.15776, 46.37696, 79.11616, np.nan, np.nan])

np.array([i for i in a if not np.isnan(i)])

array([65.36512, 39.98848, 28.25152, 37.39968, 59.32288, 40.85184,
       71.98208, 41.7152, 33.71776, 38.5504, 21.34656, 37.97504,
       57.5968, 30.494656, 80.03776, 33.94688, 37.45792, 27.617664,
       15.59296, 27.329984, 45.2256, 61.27872, 57.8848, 87.4592,
       34.29312, 85.15776, 46.37696, 79.11616])

The simplest way, if you only need to replace the NaN values (with 0.0 by default) rather than remove them, is:

numpy.nan_to_num(x)

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
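A brief sketch of what numpy.nan_to_num does to the question's data. The replacement value -1.0 is an arbitrary choice for illustration, and the nan= keyword assumes NumPy 1.17 or later.

```python
import numpy as np

x = np.array([1400, 1500, np.nan, 1700])

# NaN becomes 0.0 by default
zeroed = np.nan_to_num(x)

# The nan= keyword (NumPy 1.17+) selects a different replacement value
custom = np.nan_to_num(x, nan=-1.0)

print(zeroed.shape == x.shape)  # True: values are replaced, not removed
```

Because the array keeps its length, this suits cases where positions matter; use a boolean mask when the NaN entries should actually disappear.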

