Set values in numpy array to NaN by index
I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).
I tried:
import numpy
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
Looking at x, I only see -9223372036854775808 where I expect NaN.
I thought about an alternative:
for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan
Nothing happens. What am I doing wrong?
nan is a floating-point value. When x is an array with integer dtype, it cannot hold a nan value. When nan is assigned to an array of integer dtype, the value is converted to an int:
In [85]: np.array(np.nan).astype(int).item()
Out[85]: -9223372036854775808
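The number -9223372036854775808 is not random: it is the smallest value an int64 can hold. Converting NaN to an integer has no meaningful result (in C it is undefined behavior), and on typical x86-64 hardware the conversion happens to produce this minimum. The small sketch below, which assumes the default 64-bit integer type, shows that value and contrasts integer and floating dtypes without relying on the cast itself:

```python
import numpy as np

# The value seen in the question is exactly the int64 minimum.
int64_min = np.iinfo(np.int64).min
print(int64_min)  # -9223372036854775808

# Integer dtypes have no representation for NaN; floating dtypes do.
ints = np.array([1, 2, 3])
floats = ints.astype(float)  # float64 copy with the same values

print(np.issubdtype(ints.dtype, np.integer))     # True
print(np.issubdtype(floats.dtype, np.floating))  # True

floats[0] = np.nan           # fine: float64 can represent NaN
print(np.isnan(floats))      # [ True False False]
```

Checking `np.issubdtype(x.dtype, np.floating)` before assigning NaN is one way to catch this mistake early.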
So to fix your code, make x an array of float dtype:
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
                dtype=float)
For example,
import numpy
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
                dtype=float)
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
print(x)
yields
array([[ nan, nan, nan, nan, nan, 5., 6., 7., 8., 9.],
[ nan, nan, nan, nan, nan, nan, nan, 0., 1., 0.]])
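The same loop can be written a little more idiomatically by indexing with a single tuple (`x[i, :c]`) instead of chaining `x[i][...]`, and by pairing each row index with its cutoff through `enumerate`. This is a sketch of the answer's fix, not a different algorithm:

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], dtype=float)
cutoff = [5, 7]

for i, c in enumerate(cutoff):
    # One indexing step: row i, columns 0 .. c-1, assigned in place
    x[i, :c] = np.nan

print(x)
```

Chained indexing like `x[i][0:c]` only works here because `x[i]` returns a view; with fancy indexing (e.g. `x[[i]][...] = ...`) the chained form would silently modify a temporary copy, so the single-tuple form is the safer habit.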
Vectorized approach to set appropriate elements as NaNs
@unutbu's solution should get rid of the value error you were getting. If you want to vectorize for performance, you can use boolean indexing like so:
import numpy as np
# Create mask of positions in x (with float datatype) where NaNs are to be put
mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])
# Put NaNs into masked region of x for the desired output
x[mask] = np.nan
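The mask relies on broadcasting: `np.asarray(cutoff)[:, None]` has shape `(n_rows, 1)` and `np.arange(x.shape[1])` has shape `(n_cols,)`, so comparing them yields an `(n_rows, n_cols)` boolean array whose row i is True exactly at the columns below `cutoff[i]`. A small check on the question's data (the intermediate names `col_limit` and `col_index` are only for illustration):

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], dtype=float)
cutoff = [5, 7]

col_limit = np.asarray(cutoff)[:, None]  # shape (2, 1): one cutoff per row
col_index = np.arange(x.shape[1])        # shape (10,): column positions
mask = col_limit > col_index             # broadcast to shape (2, 10)

print(col_limit.shape, col_index.shape, mask.shape)  # (2, 1) (10,) (2, 10)
print(mask.sum(axis=1))                              # [5 7]

x[mask] = np.nan
```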
Sample run:
In [92]: x = np.random.randint(0,9,(4,7)).astype(float)
In [93]: x
Out[93]:
array([[ 2., 1., 5., 2., 5., 2., 1.],
[ 2., 5., 7., 1., 5., 4., 8.],
[ 1., 1., 7., 4., 8., 3., 1.],
[ 5., 8., 7., 5., 0., 2., 1.]])
In [94]: cutoff = [5,3,0,6]
In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan
In [96]: x
Out[96]:
array([[ nan, nan, nan, nan, nan, 2., 1.],
[ nan, nan, nan, 1., 5., 4., 8.],
[ 1., 1., 7., 4., 8., 3., 1.],
[ nan, nan, nan, nan, nan, nan, 1.]])
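Since the point of inserting NaNs was to leave those entries out of a row-wise mean, a natural follow-up is `np.nanmean`, which skips NaN entries along the given axis (plain `np.mean` would return `nan` for every row containing one). A sketch on the question's data; note that a row that is entirely NaN would give `nan` along with a `RuntimeWarning`:

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], dtype=float)
cutoff = [5, 7]
x[np.asarray(cutoff)[:, None] > np.arange(x.shape[1])] = np.nan

# NaNs are ignored, so each row's mean uses only entries at or after its cutoff
row_means = np.nanmean(x, axis=1)
print(row_means)
```

Here the first row averages 5 through 9 (7.0) and the second averages 0, 1, 0 (one third).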
Vectorized approach to directly calculate row-wise mean of appropriate elements
If you were trying to get the masked mean values, you can modify the vectorized approach proposed earlier to avoid dealing with NaNs altogether and, more importantly, keep x with integer values. Here's the modified approach:
# Get array version of cutoff
cutoff_arr = np.asarray(cutoff)
# Mask of positions in x which are to be considered for row-wise mean calculations
mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
# Mask x, calculate the corresponding sum and thus mean values for each row
masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)
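On NumPy 1.20 or later (an assumption about your environment), `np.mean` also accepts a boolean `where` argument that selects which elements enter the mean, so the sum-and-divide step can be handed to NumPy while x stays integer. A sketch reusing the `mask1` construction and the sample data from the run below:

```python
import numpy as np

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8]])
cutoff = [5, 3, 0, 6]

cutoff_arr = np.asarray(cutoff)
# True where the column index is at or past the row's cutoff, i.e. kept in the mean
mask1 = cutoff_arr[:, None] <= np.arange(x.shape[1])

# Only positions where mask1 is True contribute (requires NumPy >= 1.20)
masked_mean_vals = np.mean(x, axis=1, where=mask1)
print(masked_mean_vals)
```

This should agree with the `(mask1*x).sum(1)/(x.shape[1] - cutoff_arr)` formula; the `where` form simply avoids computing the element count by hand.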
Here's a sample run of this solution:
In [61]: x = np.random.randint(0,9,(4,7))
In [62]: x
Out[62]:
array([[5, 0, 1, 2, 4, 2, 0],
[3, 2, 0, 7, 5, 0, 2],
[7, 2, 2, 3, 3, 2, 3],
[4, 1, 2, 1, 4, 6, 8]])
In [63]: cutoff = [5,3,0,6]
In [64]: cutoff_arr = np.asarray(cutoff)
In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
In [66]: mask1
Out[66]:
array([[False, False, False, False, False, True, True],
[False, False, False, True, True, True, True],
[ True, True, True, True, True, True, True],
[False, False, False, False, False, False, True]], dtype=bool)
In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)
In [68]: masked_mean_vals
Out[68]: array([ 1. , 3.5 , 3.14285714, 8. ])
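One edge case the formula does not cover: if some `cutoff[i]` equals `x.shape[1]`, that row keeps no elements, the denominator `x.shape[1] - cutoff_arr` is 0 there, and the division yields `nan` with a `RuntimeWarning`. If that can happen, a guarded `np.divide` with `out` and `where` leaves those rows at NaN explicitly. A sketch, where using NaN as the fill value is an assumption you may want to change:

```python
import numpy as np

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2]])
cutoff = [5, 7]  # the second row excludes every column

cutoff_arr = np.asarray(cutoff)
mask1 = cutoff_arr[:, None] <= np.arange(x.shape[1])

sums = (mask1 * x).sum(1)
counts = x.shape[1] - cutoff_arr

# Divide only where a row has at least one kept element; other rows keep the NaN fill
masked_mean_vals = np.divide(sums, counts,
                             out=np.full(sums.shape, np.nan),
                             where=counts > 0)
print(masked_mean_vals)
```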