Mask 2D array preserving shape
I have a 2D numpy array like this:
arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
and a boolean array:
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])
Now, when I try to index arr with boolarr, it gives me
arr[boolarr]
Output:
array([1, 2, 1, 1, 2, 3])
But I am looking to have a 2D array output instead. The desired output is
[[1, 2],
[1],
[1, 2, 3]]
An option using numpy is to start by summing each row of the mask, which gives the number of selected elements per row:
take = boolarr.sum(axis=1)
#array([2, 1, 3])
Then mask the array as you do:
x = arr[boolarr]
#array([1, 2, 1, 1, 2, 3])
And use np.split to split the flat array according to the np.cumsum of take (the function expects the indices at which to split the array, so the last cumulative sum is dropped):
np.split(x, np.cumsum(take)[:-1])
[array([1, 2]), array([1]), array([1, 2, 3])]
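To make the role of np.cumsum concrete, here is a minimal self-contained sketch of the three steps above, using the arrays from the question. The names flat, bounds and rows are only illustrative. The last cumulative sum equals the length of the flat array, so keeping it would only produce a trailing empty piece.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

take = boolarr.sum(axis=1)          # selected elements per row: [2, 1, 3]
flat = arr[boolarr]                 # row-major selection: [1, 2, 1, 1, 2, 3]
bounds = np.cumsum(take)            # [2, 3, 6]; the last entry is len(flat)
rows = np.split(flat, bounds[:-1])  # split before positions 2 and 3

print([r.tolist() for r in rows])   # [[1, 2], [1], [1, 2, 3]]
```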
General solution
def mask_nd(x, m):
    '''
    Mask a 2D array and preserve the
    dimension on the resulting array

    Parameters
    ----------
    x: np.array
       2D array on which to apply a mask
    m: np.array
       2D boolean mask

    Returns
    -------
    List of arrays. Each array contains the
    elements from the rows in x once masked.
    If no elements in a row are selected the
    corresponding array will be empty
    '''
    # Number of selected elements in each row
    take = m.sum(axis=1)
    # Split the flat selection at the row boundaries
    return np.split(x[m], np.cumsum(take)[:-1])
Examples
Let's have a look at some examples:
arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, False],
                    [True, True, True]])
mask_nd(arr, boolarr)
# [array([1, 2]), array([], dtype=int32), array([1, 2, 3])]
Or for the following arrays:
arr = np.array([[1, 2],
                [2, 1]])
boolarr = np.array([[True, True],
                    [True, False]])
mask_nd(arr, boolarr)
# [array([1, 2]), array([2])]
Your desired output is not a 2D array, since each "row" has a different number of "columns". A functional, non-vectorised solution is possible via itertools.compress:
from itertools import compress
res = list(map(list, map(compress, arr, boolarr)))
# [[1, 2], [1], [1, 2, 3]]
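For readers unfamiliar with it, itertools.compress(data, selectors) yields the items of data whose matching selector is truthy. The sketch below applies it to a single row first and then row by row; plain Python lists stand in for the arrays, and the variable names are only illustrative.

```python
from itertools import compress

row = [1, 2, 4]
flags = [True, True, False]

# Keep the items of `row` whose flag is truthy
single = list(compress(row, flags))
print(single)  # [1, 2]

# Row by row, this is what the nested map calls above do
data = [[1, 2, 4], [2, 1, 1], [1, 2, 3]]
mask = [[True, True, False], [False, False, True], [True, True, True]]
res = [list(compress(r, m)) for r, m in zip(data, mask)]
print(res)  # [[1, 2], [1], [1, 2, 3]]
```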
Here's one way to do it with a list comprehension instead:
[[arr[row][col] for col in range(3) if boolarr[row][col]] for row in range(3)]
# [[1,2], [1], [1,2,3]]
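The comprehension above hard-codes range(3), so it only fits 3x3 inputs. A possible variant that works for any matching shapes pairs rows and elements with zip instead; this is a sketch rather than the original answer's code, and int(v) is only there to turn NumPy scalars into plain Python integers.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# Pair each row with its mask row, then each value with its flag
res = [[int(v) for v, keep in zip(row, mask_row) if keep]
       for row, mask_row in zip(arr, boolarr)]

print(res)  # [[1, 2], [1], [1, 2, 3]]
```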
You may be looking for something as simple as a masked array. You can use the mask to create an array that masks out the values you want to exclude, so that they are not affected by further operations and don't affect the results of calculations:
marr = np.ma.array(arr, mask=~boolarr)
Notice that the mask must be flipped, since it's the invalid elements that are masked. The result will look like
masked_array(data=[
[ 1 2 --]
[-- -- 1]
[ 1 2 3]],
mask=[
[False False True]
[ True True False]
[False False False]],
fill_value = 999999)
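As a rough illustration of "not affecting the results of calculations", the sketch below runs a few reductions on the masked array. It uses the arrays from the question; the variable names row_sums, row_counts, kept and ragged are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

marr = np.ma.array(arr, mask=~boolarr)

row_sums = marr.sum(axis=1)      # masked entries are skipped: [3, 1, 6]
row_counts = marr.count(axis=1)  # unmasked entries per row: [2, 1, 3]
kept = marr.compressed()         # flat unmasked values, like arr[boolarr]

# The ragged rows can still be recovered one row at a time
ragged = [marr[i].compressed().tolist() for i in range(marr.shape[0])]
print(ragged)  # [[1, 2], [1], [1, 2, 3]]
```

The masked array keeps the original 3x3 shape, which is the main difference from the list-based answers.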
In [183]: np.array([x[y] for x,y in zip(arr, boolarr)])
Out[183]: array([array([1, 2]), array([1]), array([1, 2, 3])], dtype=object)
should be competitive in speed. (It's a little faster if we omit the outer np.array wrap, returning just a list of arrays.)
But realistic timing tests are needed to be sure.
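One way such a timing test could be set up is sketched below. The array size, the mask density and the helper names with_split and with_listcomp are arbitrary assumptions, and results will vary with shape and machine, so no timings are quoted. Both helpers return a plain list of arrays; on recent NumPy versions, wrapping a ragged list in np.array without dtype=object raises an error rather than producing an object array.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(1000, 50))   # hypothetical test data
boolarr = rng.random((1000, 50)) < 0.5       # roughly half the entries kept

def with_split(x, m):
    # Flat boolean selection, then split at the row boundaries
    return np.split(x[m], np.cumsum(m.sum(axis=1))[:-1])

def with_listcomp(x, m):
    # Boolean-index each row separately
    return [row[mask_row] for row, mask_row in zip(x, m)]

# Sanity check: both approaches produce the same rows
a = with_split(arr, boolarr)
b = with_listcomp(arr, boolarr)
same = all(np.array_equal(p, q) for p, q in zip(a, b))

for fn in (with_split, with_listcomp):
    best = min(timeit.repeat(lambda: fn(arr, boolarr), number=20, repeat=3))
    print(f"{fn.__name__}: {best / 20 * 1e3:.3f} ms per call")
```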