How do I replace missing/masked data with a row mean with numpy?
How would I replace the missing values in the 'b' array below with the corresponding row averages in 'c'?
import numpy
a=numpy.arange(24).reshape(4,-1)
b=numpy.ma.masked_where(numpy.remainder(a,5)==0,a);b
Out[46]:
masked_array(data =
[[-- 1 2 3 4 --]
[6 7 8 9 -- 11]
[12 13 14 -- 16 17]
[18 19 -- 21 22 23]],
mask =
[[ True False False False False True]
[False False False False True False]
[False False False True False False]
[False False True False False False]],
fill_value = 999999)
c=b.mean(axis=1);c
Out[47]:
masked_array(data = [2.5 8.2 14.4 20.6],
mask = [False False False False],
fill_value = 1e+20)
You can use np.where and np.take:
inds = np.where(b.mask)
b[inds] = np.take(c,inds[0])
b
masked_array(data =
[[2 1 2 3 4 2]
[6 7 8 9 8 11]
[12 13 14 14 16 17]
[18 19 20 21 22 23]],
mask =
[[False False False False False False]
[False False False False False False]
[False False False False False False]
[False False False False False False]],
fill_value = 999999)
In this particular example the dtype of a causes a problem: a is an integer array, so the row means are truncated when they are assigned into b (note the 2 and 8 in the output above instead of 2.5 and 8.2). If you add a = a.astype(float) before creating b, it works just fine. There may be a faster way to create the indices than np.where.
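For reference, here is the where/take approach with the float conversion applied up front. This is only a minimal sketch assembled from the snippets above; it assumes NumPy is imported as np and uses the built-in float as the target dtype.

```python
import numpy as np

# Convert to float first so the row means are not truncated on assignment.
a = np.arange(24).reshape(4, -1).astype(float)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)                 # row means over the unmasked values

inds = np.where(b.mask)            # (row indices, column indices) of masked cells
b[inds] = np.take(c, inds[0])      # look up the mean of each masked cell's row
print(b)
```

Assigning through b[inds] also clears the mask at those positions, so b no longer has masked entries afterwards.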
Try this:
np.copyto(b, c[...,None], where=b.mask)
You have to add the extra axis to c so that it broadcasts along each row of b. (Recent NumPy versions accept keepdims=True in mean, just as np.sum does, so c = b.mean(1, keepdims=True) makes the extra axis unnecessary.)
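To make the role of the extra axis concrete, the sketch below prints the shapes involved. It is an illustration rather than part of the original answer; the names col, col_keep and stretched are made up here, and the keepdims call assumes a NumPy version whose masked-array mean accepts that argument.

```python
import numpy as np

a = np.arange(24).reshape(4, -1).astype(float)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)

col = c[..., None]                      # same as c[:, np.newaxis]
print(b.shape, c.shape, col.shape)      # (4, 6) (4,) (4, 1)

# keepdims=True keeps the reduced axis, giving the column shape directly.
col_keep = b.mean(axis=1, keepdims=True)
print(col_keep.shape)                   # (4, 1)

# Broadcasting stretches the (4, 1) column across the 6 columns of b,
# which is what lets each masked cell pick up the mean of its own row.
stretched = np.broadcast_to(np.ma.filled(col, np.nan), b.shape)
print(stretched[:, 0])
```

Without the extra axis, a shape of (4,) cannot be broadcast against (4, 6), because broadcasting aligns trailing dimensions.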
import numpy as np
a = np.arange(24).reshape(4,-1).astype(float) # I changed your example to be a float
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)
np.copyto(b, c[...,None], where=b.mask)
In [189]: b.data
Out[189]:
array([[ 2.5, 1. , 2. , 3. , 4. , 2.5],
[ 6. , 7. , 8. , 9. , 8.2, 11. ],
[ 12. , 13. , 14. , 14.4, 16. , 17. ],
[ 18. , 19. , 20.6, 21. , 22. , 23. ]])
np.copyto is faster than creating an inds array:
In [169]: %%timeit
.....: inds = np.where(b.mask)
.....: b[inds] = np.take(c, inds[0])
.....:
10000 loops, best of 3: 81.2 µs per loop
In [173]: %%timeit
.....: np.copyto(b, c[...,None], where=b.mask)
.....:
10000 loops, best of 3: 45.1 µs per loop
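One caveat when reproducing these timings: b[inds] = ... clears the mask on the first pass, so later loops over the same b find nothing left to fill, while np.copyto leaves the mask in place. The sketch below is one way to compare the two on equal footing by rebuilding b on every call. The function names are hypothetical, and the numbers will vary by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

a = np.arange(24).reshape(4, -1).astype(float)
mask = np.remainder(a, 5) == 0

def fill_with_take():
    # Fresh masked array each call, so every run does the same work.
    b = np.ma.masked_array(a.copy(), mask=mask.copy())
    c = b.mean(axis=1)
    inds = np.where(b.mask)
    b[inds] = np.take(c, inds[0])
    return b.data

def fill_with_copyto():
    b = np.ma.masked_array(a.copy(), mask=mask.copy())
    c = b.mean(axis=1)
    np.copyto(b, c[..., None], where=b.mask)
    return b.data

# Both timings include the cost of rebuilding b and computing c.
t_take = timeit.timeit(fill_with_take, number=1000)
t_copyto = timeit.timeit(fill_with_copyto, number=1000)
print(f"where/take: {t_take:.4f} s, copyto: {t_copyto:.4f} s per 1000 calls")
```

Because the setup cost is shared by both functions, the difference between the two totals is the part attributable to the fill step itself.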
Another advantage of np.copyto is that it raises an error on the dtype issue instead of silently truncating:
a = np.arange(24).reshape(4,-1) # still an int
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)
In [193]: np.copyto(b, c[...,None], where=b.mask)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-193-edc7f01f3f89> in <module>()
----> 1 np.copyto(b, c[...,None], where=b.mask)
TypeError: Can not cast scalar from dtype('float64') to dtype('int64') according to the rule 'same_kind'
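If you want to handle that failure in a script rather than at the prompt, something along these lines should work. It is a sketch under the assumption that np.copyto keeps its default casting='same_kind' rule; the raised flag is only there for illustration, and the exact error message may differ between NumPy versions.

```python
import numpy as np

a = np.arange(24).reshape(4, -1)             # still an int array
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)                           # float64 row means

try:
    np.copyto(b, c[..., None], where=b.mask)
    raised = False
except TypeError as exc:
    # float64 -> int64 is not allowed under the default 'same_kind' rule
    raised = True
    print("copyto refused the cast:", exc)

# Rebuild b from a float copy of a, then the same call succeeds.
b = np.ma.masked_where(np.remainder(a, 5) == 0, a.astype(float))
np.copyto(b, c[..., None], where=b.mask)
print(b.data)
```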
By the way, there is a set of functions for this kind of task, depending on how your source data is laid out:
np.put sequentially puts the input array into the output array at the locations given by indices, and would work like @Ophion's answer.
np.place sequentially assigns each element of the input (a list or 1-D array) to the positions in the output array where the mask is true (the input is not aligned with the output array, so their shapes don't have to match).
np.copyto always puts a value from the input array into the same (broadcast) location in the output array. Shapes must match or be broadcastable. It effectively replaces the older function np.putmask.
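The three functions can be compared side by side on the example from the question. This is a sketch on plain ndarrays with a separate boolean mask; the variable names (means, rows, vals, via_put and so on) are illustrative and not from the original answers.

```python
import numpy as np

a = np.arange(24).reshape(4, -1).astype(float)
mask = np.remainder(a, 5) == 0

# Row means over the unmasked values, as a plain ndarray.
means = np.ma.filled(np.ma.masked_array(a, mask=mask).mean(axis=1), np.nan)

rows = np.nonzero(mask)[0]     # row index of every masked cell, row-major order
vals = means[rows]             # one replacement value per masked cell

# np.put: flat indices plus a matching sequence of values.
via_put = a.copy()
np.put(via_put, np.flatnonzero(mask), vals)

# np.place: boolean mask plus values consumed in order, not aligned by shape.
via_place = a.copy()
np.place(via_place, mask, vals)

# np.copyto: values are aligned by broadcasting, so only the column of
# means is needed and no per-cell lookup has to be built.
via_copyto = a.copy()
np.copyto(via_copyto, means[:, None], where=mask)

print(np.array_equal(via_put, via_place), np.array_equal(via_place, via_copyto))
```

The first two need a value per masked cell in the right order, whereas np.copyto only needs something that broadcasts to the output shape.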