How do I replace missing/masked data with row averages in numpy?

Question

How would I replace the missing values in the 'b' array below with the corresponding row averages in 'c'?

import numpy
a=numpy.arange(24).reshape(4,-1)
b=numpy.ma.masked_where(numpy.remainder(a,5)==0,a);b
Out[46]: 
 masked_array(data =
 [[-- 1 2 3 4 --]
 [6 7 8 9 -- 11]
 [12 13 14 -- 16 17]
 [18 19 -- 21 22 23]],
         mask =
 [[ True False False False False  True]
 [False False False False  True False]
 [False False False  True False False]
 [False False  True False False False]],
       fill_value = 999999)

c=b.mean(axis=1);c
Out[47]: 
masked_array(data = [2.5 8.2 14.4 20.6],
         mask = [False False False False],
   fill_value = 1e+20)

Answer 1

You can use np.where and np.take:

inds = np.where(b.mask)

b[inds] = np.take(c,inds[0])

b
masked_array(data =
 [[2 1 2 3 4 2]
 [6 7 8 9 8 11]
 [12 13 14 14 16 17]
 [18 19 20 21 22 23]],
             mask =
 [[False False False False False False]
 [False False False False False False]
 [False False False False False False]
 [False False False False False False]],
       fill_value = 999999)

In this particular example the dtype of a causes a problem: b is an integer array, so the row means are truncated when they are assigned (2.5 becomes 2, as the output above shows). If you add a = a.astype(float) before creating b, it works just fine. There may be a faster way to create the indices than np.where.
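As a minimal sketch of the approach above with the dtype fix applied, the following builds the example array as float before masking. The names rows, cols and result are introduced here for illustration only, and np.ma.getmaskarray is used instead of b.mask on the assumption that a full boolean mask is wanted even when nothing is masked.

```python
import numpy as np

# Build the example as float so the row means are not truncated.
a = np.arange(24, dtype=float).reshape(4, -1)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)  # one mean per row, ignoring masked entries

# Row and column indices of every masked element.
rows, cols = np.where(np.ma.getmaskarray(b))

# Each masked element receives the mean of its own row.
b[rows, cols] = np.take(c, rows)

# Plain ndarray view of the result (illustrative name).
result = b.filled(np.nan)
print(result)
```

Assigning unmasked values this way also clears the mask at those positions, which matches the all-False mask shown in the answer's output.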

Answer 2

Try this:

np.copyto(b, c[...,None], where=b.mask)

You have to add the extra axis to c so that it broadcasts across each row. (With a mean that accepts a keepdims option, as np.sum does, this would not be necessary.)

import numpy as np

a = np.arange(24).reshape(4,-1).astype(float)   # I changed your example to be a float
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)

np.copyto(b, c[...,None], where=b.mask)

In [189]: b.data
Out[189]: 
array([[  2.5,   1. ,   2. ,   3. ,   4. ,   2.5],
       [  6. ,   7. ,   8. ,   9. ,   8.2,  11. ],
       [ 12. ,  13. ,  14. ,  14.4,  16. ,  17. ],
       [ 18. ,  19. ,  20.6,  21. ,  22. ,  23. ]])
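A hedged variant of the same idea, assuming a NumPy version in which the masked-array mean accepts keepdims=True, so the extra axis does not have to be added by hand. The names row_means and filled are illustrative, and the fill is done on a plain ndarray copy rather than on b itself, which is a design choice of this sketch and not part of the original answer.

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(4, -1)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)

# keepdims=True keeps the reduced axis, so the means have shape (4, 1)
# and broadcast against the (4, 6) data without c[..., None].
row_means = np.ma.filled(b.mean(axis=1, keepdims=True), np.nan)

# Plain ndarray copy of the data, with NaN at the masked positions.
filled = b.filled(np.nan)

# Overwrite only the masked positions with the mean of their row.
np.copyto(filled, row_means, where=np.ma.getmaskarray(b))
print(filled)
```

Working on a copy leaves b and its mask untouched, which can be convenient if the masked array is still needed afterwards.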

This is faster than creating an inds array:

In [169]: %%timeit
   .....: inds = np.where(b.mask)
   .....: b[inds] = np.take(c, inds[0])
   .....: 
10000 loops, best of 3: 81.2 µs per loop


In [173]: %%timeit
   .....: np.copyto(b, c[...,None], where=b.mask)
   .....: 
10000 loops, best of 3: 45.1 µs per loop

Another advantage is that it raises an error on the dtype mismatch instead of silently truncating the values:

a = np.arange(24).reshape(4,-1)    # still an int
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)

In [193]: np.copyto(b, c[...,None], where=b.mask)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-193-edc7f01f3f89> in <module>()
----> 1 np.copyto(b, c[...,None], where=b.mask)

TypeError: Can not cast scalar from dtype('float64') to dtype('int64') according to the rule 'same_kind'
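The casting behaviour above can be checked up front. The sketch below uses plain ndarrays to keep the illustration small; the names data, mask, means, raised and data_f are assumptions made for this example and do not come from the original answer.

```python
import numpy as np

data = np.arange(24).reshape(4, -1)  # integer dtype, as in the question
mask = data % 5 == 0

# Row means over the unmasked entries, as a plain float array.
means = np.ma.filled(np.ma.array(data, mask=mask).mean(axis=1), np.nan)

# Float to integer is not a 'same_kind' cast, which is what copyto enforces.
print(np.can_cast(means.dtype, data.dtype, casting="same_kind"))

raised = False
try:
    np.copyto(data, means[:, None], where=mask)
except TypeError:
    raised = True  # the integer destination is rejected

# Converting the destination to float first makes the copy valid.
data_f = data.astype(float)
np.copyto(data_f, means[:, None], where=mask)
print(data_f)
```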

By the way, there is a set of functions for this kind of task, depending on how your source values are laid out:

np.put
puts the values of the input array, in order, into the output array at the locations given by (flat) indices, and would work like @Ophion's answer.

np.place
sequentially assigns each element from the input (list or 1d array) to the places in the output array where the mask is true. The values are not aligned with the output array, so the shapes don't have to match.

np.copyto
will always put a value from the input array into the same (broadcast) location in the output array. Shapes must match (or be broadcastable). It effectively replaces the older function np.putmask.
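To make the difference between the three functions concrete, here is a small sketch on a toy 2x3 array. The array, the mask and the names base, values, out_put, out_place and out_copy are made up for illustration.

```python
import numpy as np

base = np.zeros((2, 3))
mask = np.array([[True, False, False],
                 [False, True, True]])
values = np.array([10.0, 20.0])  # one value per row

# np.put: explicit flat indices plus one value per index.
out_put = base.copy()
flat_idx = np.flatnonzero(mask)   # flat positions of the True entries
rows = np.nonzero(mask)[0]        # row of each True entry
np.put(out_put, flat_idx, values[rows])

# np.place: consumes values in order (cycling if too short),
# regardless of which row each True entry belongs to.
out_place = base.copy()
np.place(out_place, mask, values)

# np.copyto: broadcasts values to the array shape, then copies
# only where the mask is True, so each row gets its own value.
out_copy = base.copy()
np.copyto(out_copy, values[:, None], where=mask)

print(out_put)
print(out_place)
print(out_copy)
```

In this toy case np.put and np.copyto agree, while np.place gives the last masked slot the recycled first value because it ignores row alignment.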

2 answers

Answer 1
2 2013-10-30 16:43:57

Answer 2
2 accepted 2013-10-30 17:07:21
