
Mask ndarray with boolean ndarray to replace nans

I'm trying to use a boolean mask to address rows in a numpy array:

isnan = np.isnan(self.X[:, AGE_COLUMN].astype(float))
self.X[isnan, AGE_COLUMN] = np.mean(self.X[:, AGE_COLUMN].astype(float))

isnan and X are dtype .

First I check which rows in the age column are nan. Then I want to set these values to the mean of all ages. The debugger shows the following result for self.X[isnan, AGE_COLUMN] :

[nan nan nan nan nan nan nan nan nan nan ....]

If I try self.X[[True, False, True], AGE_COLUMN] , for example, it returns the indexed rows. But with the isnan array it does not work.

How can I fix this so that the nans are set to the mean?

Do as follows using numpy.nanmean , which ignores NaNs when computing the mean:

self.X[isnan, AGE_COLUMN] = np.nanmean(self.X[:, AGE_COLUMN].astype(float))
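A minimal sketch of why the original assignment appeared to do nothing, and what the change fixes. The array X, its contents, and AGE_COLUMN = 1 are made-up stand-ins for self.X and the asker's column index, not the real data. np.mean propagates NaN, so the NaN rows were being overwritten with NaN again, while np.nanmean skips them:

```python
import numpy as np

# Hypothetical stand-in for self.X: an object array with ages in column 1.
AGE_COLUMN = 1
X = np.array(
    [["a", 20.0], ["b", np.nan], ["c", 40.0], ["d", np.nan]],
    dtype=object,
)

ages = X[:, AGE_COLUMN].astype(float)
isnan = np.isnan(ages)          # boolean mask, True where the age is missing

plain_mean = np.mean(ages)      # NaN: a single NaN makes the whole mean NaN
nan_mean = np.nanmean(ages)     # mean of the non-NaN ages only

# Fill the missing ages with the NaN-aware mean.
X[isnan, AGE_COLUMN] = nan_mean
filled = X[:, AGE_COLUMN].astype(float)
```

So the boolean mask itself was likely working all along; the value being assigned was the problem.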

From the documentation:

numpy.nanmean(a, axis=None, dtype=None, out=None, keepdims=)

Compute the arithmetic mean along the specified axis, ignoring NaNs.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
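As an illustration of the flattened-versus-axis behaviour described in the quoted documentation, here is a small sketch on a made-up 2x2 array (the array a is an assumption for demonstration only):

```python
import numpy as np

# Made-up example array with one missing value.
a = np.array([[1.0, np.nan],
              [3.0, 4.0]])

overall = np.nanmean(a)              # over the flattened array: mean of 1, 3, 4
per_column = np.nanmean(a, axis=0)   # one mean per column, NaNs skipped
per_row = np.nanmean(a, axis=1)      # one mean per row, NaNs skipped
```

For a single age column, as in the question, the default (no axis) is enough because the slice is already one-dimensional.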

For all-NaN slices, NaN is returned and a RuntimeWarning is raised.
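The all-NaN case matters if the age column could be entirely missing, since the fill value would then itself be NaN. A hedged sketch that captures the warning with the standard warnings module (the all_nan array is invented for the example):

```python
import warnings
import numpy as np

# Invented input: a column where every age is missing.
all_nan = np.array([np.nan, np.nan])

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.nanmean(all_nan)

# True if NumPy issued a RuntimeWarning for the all-NaN input.
got_warning = any(issubclass(w.category, RuntimeWarning) for w in caught)
```

In that situation one option is to check np.isnan(result) before assigning and fall back to some other default.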
