Set values in numpy array to NaN by index

I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).

I tried:

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan

Looking at x, I only see -9223372036854775808 where I expect NaN.

I thought about an alternative:

for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan

Nothing happens. What am I doing wrong?

nan is a floating-point value. When x is an array with integer dtype, it cannot hold a nan value. When nan is assigned to an array of integer dtype, the value is silently cast to an int, and the resulting integer is meaningless (its exact value can differ between platforms):

In [85]: np.array(np.nan).astype(int).item()
Out[85]: -9223372036854775808

So to fix your code, make x an array of float dtype:

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], 
                dtype=float)
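
If x already exists as an integer array (for example because it was built elsewhere), an alternative to re-creating it is to convert it with `astype(float)`, which returns a float copy that can hold NaN. A minimal sketch using the question's data; the name `x_float` is just for illustration:

```python
import numpy as np

# Integer array, as in the question
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]

# astype(float) returns a new float64 array; x itself keeps its integer dtype
x_float = x.astype(float)
for i, c in enumerate(cutoff):
    x_float[i, :c] = np.nan  # NaN is representable now, so nothing is silently cast

print(x_float.dtype)                    # float64
print(np.isnan(x_float).sum(axis=1))    # number of NaNs per row
```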

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], 
                dtype=float)
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
print(repr(x))

yields

array([[ nan,  nan,  nan,  nan,  nan,   5.,   6.,   7.,   8.,   9.],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,   0.,   1.,   0.]])
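
Since the goal stated in the question is a row-wise mean that ignores the NaN entries, `np.nanmean` with `axis=1` can be applied to the NaN-filled array directly. A short sketch that re-creates x so it runs on its own:

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
             dtype=float)
cutoff = [5, 7]
for i in range(len(x)):
    x[i, :cutoff[i]] = np.nan

# nanmean skips NaN entries when averaging along each row
# row 0: mean of 5..9 = 7.0; row 1: mean of [0, 1, 0] = 1/3
row_means = np.nanmean(x, axis=1)
print(row_means)
```

If a cutoff covers an entire row, `np.nanmean` returns nan for that row and emits a RuntimeWarning.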

Vectorized approach to set appropriate elements as NaNs

@unutbu's solution should get rid of the ValueError you were getting. If you want to vectorize for performance, you can use boolean indexing like so:

import numpy as np

# Create mask of positions in x (with float datatype) where NaNs are to be put
mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])

# Put NaNs into masked region of x for the desired output
x[mask] = np.nan
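
If you would rather not modify x in place, or x is still an integer array, `np.where` can build a new array from the same mask. Because `np.nan` is a float, the result is promoted to a float dtype. A minimal sketch using the question's data (`x_nan` is an illustrative name):

```python
import numpy as np

# Integer dtype, as in the question
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]

# Broadcasting a (2, 1) column of cutoffs against a (10,) row of column indices
# gives a (2, 10) boolean mask that is True left of each row's cutoff
mask = np.asarray(cutoff)[:, None] > np.arange(x.shape[1])

# Take NaN where mask is True, otherwise the original value; x is left unchanged
x_nan = np.where(mask, np.nan, x)
print(x_nan.dtype)        # float64
print(mask.sum(axis=1))   # [5 7]
```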

Sample run:

In [92]: x = np.random.randint(0,9,(4,7)).astype(float)

In [93]: x
Out[93]: 
array([[ 2.,  1.,  5.,  2.,  5.,  2.,  1.],
       [ 2.,  5.,  7.,  1.,  5.,  4.,  8.],
       [ 1.,  1.,  7.,  4.,  8.,  3.,  1.],
       [ 5.,  8.,  7.,  5.,  0.,  2.,  1.]])

In [94]: cutoff = [5,3,0,6]

In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan

In [96]: x
Out[96]: 
array([[ nan,  nan,  nan,  nan,  nan,   2.,   1.],
       [ nan,  nan,  nan,   1.,   5.,   4.,   8.],
       [  1.,   1.,   7.,   4.,   8.,   3.,   1.],
       [ nan,  nan,  nan,  nan,  nan,  nan,   1.]])
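
As a quick sanity check, the vectorized mask can be compared against the explicit per-row loop on random data. This sketch uses `np.random.default_rng` with an arbitrary seed; the array size and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
x = rng.integers(0, 9, size=(4, 7)).astype(float)
cutoff = rng.integers(0, x.shape[1] + 1, size=x.shape[0])  # cutoffs in 0..7

# Loop version: NaN-fill the first cutoff[i] entries of each row
x_loop = x.copy()
for i, c in enumerate(cutoff):
    x_loop[i, :c] = np.nan

# Vectorized version: one boolean mask built by broadcasting
x_vec = x.copy()
x_vec[cutoff[:, None] > np.arange(x.shape[1])] = np.nan

# equal_nan=True treats NaNs in matching positions as equal
print(np.array_equal(x_loop, x_vec, equal_nan=True))  # True
```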

Vectorized approach to directly calculate row-wise mean of appropriate elements

If you were trying to get the masked mean values, you can modify the vectorized approach proposed above to avoid dealing with NaNs altogether and, more importantly, keep x with integer values. Here's the modified approach:

# Get array version of cutoff
cutoff_arr = np.asarray(cutoff)

# Mask of positions in x which are to be considered for row-wise mean calculations
mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

# Mask x, calculate the corresponding sum and thus mean values for each row
masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)
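
One caveat: if a cutoff equals the number of columns, the denominator `x.shape[1] - cutoff_arr` is zero for that row, so NumPy returns nan there with a RuntimeWarning. An alternative that also keeps x as integers is NumPy's masked-array module (`numpy.ma`), whose `mean` skips masked entries; a row with every entry masked comes back masked instead of dividing by zero. A sketch using the data from the sample run below, plus one extra fully excluded row added here as an assumed edge case:

```python
import numpy as np

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8],
              [1, 2, 3, 4, 5, 6, 7]])  # extra row, fully excluded below
cutoff_arr = np.asarray([5, 3, 0, 6, 7])

# In numpy.ma, True means "masked" (excluded): the positions before each cutoff
exclude = cutoff_arr[:, None] > np.arange(x.shape[1])
mx = np.ma.masked_array(x, mask=exclude)

# mean() skips masked entries; the all-masked last row yields a masked value,
# which filled() turns into NaN so the result is a plain float ndarray
masked_mean_vals = mx.mean(axis=1).filled(np.nan)
print(masked_mean_vals)  # values: 1.0, 3.5, 22/7, 8.0, nan
```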

Here's a sample run for such a solution:

In [61]: x = np.random.randint(0,9,(4,7))

In [62]: x
Out[62]: 
array([[5, 0, 1, 2, 4, 2, 0],
       [3, 2, 0, 7, 5, 0, 2],
       [7, 2, 2, 3, 3, 2, 3],
       [4, 1, 2, 1, 4, 6, 8]])

In [63]: cutoff = [5,3,0,6]

In [64]: cutoff_arr = np.asarray(cutoff)

In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

In [66]: mask1
Out[66]: 
array([[False, False, False, False, False,  True,  True],
       [False, False, False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [False, False, False, False, False, False,  True]], dtype=bool)

In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)

In [68]: masked_mean_vals
Out[68]: array([ 1.        ,  3.5       ,  3.14285714,  8.        ])
