
Mask 2D array preserving shape

I have a 2D numpy array like this:

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

and a boolean array:

boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True,True]])

Now, when I try to index arr with boolarr, it gives me

arr[boolarr]

Output:

array([1, 2, 1, 1, 2, 3])

But I am looking to have a 2D array output instead. The desired output is

[[1, 2],
 [1],
 [1, 2, 3]]

One option using numpy is to start by summing the mask along each row, which gives the number of selected elements per row:

take = boolarr.sum(axis=1)
#array([2, 1, 3])

Then mask the array as you do:

x = arr[boolarr]
#array([1, 2, 1, 1, 2, 3])

Then use np.split to split the flat array at the np.cumsum of take. np.split expects the indices at which to cut the array, and the last cumulative sum is dropped because it equals the length of the flat array:

np.split(x, np.cumsum(take)[:-1])
# [array([1, 2]), array([1]), array([1, 2, 3])]
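
To see why the last cumulative sum is dropped, it may help to look at the intermediate values. The following is a minimal sketch reusing the arrays from the question; the names `split_points`, `parts` and `with_extra` are only for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

take = boolarr.sum(axis=1)        # selected elements per row: [2, 1, 3]
split_points = np.cumsum(take)    # running totals: [2, 3, 6]

# np.split cuts the flat array at the given indices. The final total (6)
# is the length of the flat array, so cutting there would only append an
# empty trailing piece.
parts = np.split(arr[boolarr], split_points[:-1])
with_extra = np.split(arr[boolarr], split_points)

print(len(parts), len(with_extra))  # 3 pieces versus 4 pieces
```

Keeping the full `split_points` would therefore return one more piece than there are rows.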

General solution

def mask_nd(x, m):
    '''
    Mask a 2D array and preserve the
    dimension on the resulting array

    Parameters
    ----------
    x: np.array
       2D array on which to apply a mask
    m: np.array
       2D boolean mask

    Returns
    -------
    List of arrays. Each array contains the
    elements from the rows in x once masked.
    If no elements in a row are selected the
    corresponding array will be empty
    '''
    take = m.sum(axis=1)
    return np.split(x[m], np.cumsum(take)[:-1])

Examples

Let's have a look at some examples:

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

boolarr = np.array([[True, True, False],
                    [False, False, False],
                    [True, True,True]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([], dtype=int32), array([1, 2, 3])]

Or for the following arrays:

arr = np.array([[1,2],
                [2,1]])

boolarr = np.array([[True, True],
                    [True, False]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([2])]
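
As a further check, the sketch below runs the same function on a mask whose first and last rows select nothing. The mask is a made-up example, not one from the question; the function body is repeated so the snippet runs on its own.

```python
import numpy as np

def mask_nd(x, m):
    # Same approach as above: count the selected elements per row, then
    # split the flat masked array at the running totals.
    take = m.sum(axis=1)
    return np.split(x[m], np.cumsum(take)[:-1])

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])

# Hypothetical mask: only the middle row selects anything.
boolarr = np.array([[False, False, False],
                    [True, False, True],
                    [False, False, False]])

rows = mask_nd(arr, boolarr)
print([r.tolist() for r in rows])   # [[], [2, 1], []]
```

The result still has one entry per input row, with empty arrays standing in for the rows that were fully masked.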

Your desired output is not a 2D array, since each "row" has a different number of "columns". A functional, non-vectorised solution is possible via itertools.compress:

from itertools import compress

res = list(map(list, map(compress, arr, boolarr)))

# [[1, 2], [1], [1, 2, 3]]
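
For readers unfamiliar with `compress`, here is a self-contained sketch of the same idea written as a comprehension. Converting both arrays with `.tolist()` first is a choice made here (not part of the answer above) so that the result holds plain Python ints rather than NumPy scalars.

```python
from itertools import compress

import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# compress(data, selectors) yields the items of data whose matching
# selector is truthy, so it is applied once per (row, mask row) pair.
res = [list(compress(row, keep))
       for row, keep in zip(arr.tolist(), boolarr.tolist())]

print(res)   # [[1, 2], [1], [1, 2, 3]]
```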

Here's one way to do it with a list comprehension instead:

[[arr[row][col] for col in range(3) if boolarr[row][col]] for row in range(3)]
# [[1,2], [1], [1,2,3]]
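
The comprehension above hard-codes `range(3)` for both dimensions. A possible variant that works for any matching shapes is sketched below; it pairs rows and values with `zip` instead of indexing, and the `.tolist()` conversion is an assumption added here to get plain Python values.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# Walk rows and mask rows in lockstep, keeping a value only when its
# mask entry is True. No row or column count is hard-coded.
res = [[value for value, keep in zip(row, mask_row) if keep]
       for row, mask_row in zip(arr.tolist(), boolarr.tolist())]

print(res)   # [[1, 2], [1], [1, 2, 3]]
```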

You may be looking for something as simple as a masked array. You can use the mask to create an array that masks out the unwanted values, so that they are ignored by further operations and don't affect the results of calculations:

marr = np.ma.array(arr, mask=~boolarr)

Notice that the mask must be inverted, since in a masked array it is the invalid elements that are marked True. The result will look like

masked_array(data=[
        [ 1  2 --]
        [-- --  1]
        [ 1  2  3]],
    mask=[
        [False False  True]
        [ True  True False]
        [False False False]],
    fill_value = 999999)
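
To illustrate what "don't affect the results of calculations" means in practice, the following sketch computes per-row reductions on the masked array and then recovers the ragged rows from it. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

marr = np.ma.array(arr, mask=~boolarr)

# Reductions skip the masked entries.
row_sums = marr.sum(axis=1)     # 1+2, 1, 1+2+3
row_means = marr.mean(axis=1)

# compressed() returns the unmasked values as a plain 1-D ndarray, so
# calling it row by row gives the ragged result from the question.
ragged = [marr[i].compressed() for i in range(marr.shape[0])]

print(row_sums.tolist())             # [3, 1, 6]
print([r.tolist() for r in ragged])  # [[1, 2], [1], [1, 2, 3]]
```

The masked array keeps the original 2D shape, which is convenient when only reductions are needed and the ragged lists are not required at all.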

In [183]: np.array([x[y] for x,y in zip(arr, boolarr)], dtype=object)
Out[183]: array([array([1, 2]), array([1]), array([1, 2, 3])], dtype=object)

This should be competitive in speed. (It's a little faster if we omit the outer np.array wrap, returning just a list of arrays.)

But realistic timing tests are needed to be sure.
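
One way such a test might be set up is sketched below with the standard `timeit` module. The array size, mask density and repeat count are arbitrary assumptions, and `split_version` and `loop_version` are hypothetical names for the np.split approach and the per-row indexing approach; results will vary with shape and hardware, so no timings are quoted here.

```python
import timeit

import numpy as np

# Assumed test data: 1000 rows of 50 values, roughly half selected.
rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(1000, 50))
boolarr = rng.random((1000, 50)) < 0.5

def split_version(x, m):
    # Flat boolean indexing followed by np.split at the running totals.
    return np.split(x[m], np.cumsum(m.sum(axis=1))[:-1])

def loop_version(x, m):
    # Boolean indexing applied row by row, returning a list of arrays.
    return [row[keep] for row, keep in zip(x, m)]

# Check that both approaches agree before timing them.
same = all(np.array_equal(a, b)
           for a, b in zip(split_version(arr, boolarr),
                           loop_version(arr, boolarr)))
print("results match:", same)

repeats = 20
for name, fn in [("split", split_version), ("loop", loop_version)]:
    total = timeit.timeit(lambda: fn(arr, boolarr), number=repeats)
    print(f"{name}: {total / repeats * 1e3:.3f} ms per call")
```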
