
python numpy get masked data without flattening

How do I get only the masked data without flattening it into a 1D array? That is, suppose I have a numpy array

import numpy as np
import numpy.ma as ma

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])

and I mask all elements greater than 1,

b = ma.masked_greater(a, 1)

masked_array(data =
 [[0 1 -- --]
 [0 1 -- --]
 [0 1 -- --]],
             mask =
 [[False False  True  True]
 [False False  True  True]
 [False False  True  True]],
       fill_value = 999999)

How do I get only the masked elements without flattening the output? That is, I need to get

array([[2, 3],
       [2, 3],
       [2, 3]])

Let's try an example that produces a ragged result: a different number of 'masked' values in each row.

In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [294]: a<6
Out[294]: 
array([[ True,  True,  True,  True],
       [ True,  True, False, False],
       [False, False, False, False]], dtype=bool)

Boolean indexing gives the flattened list of values that match this condition. It can't return a regular 2d array, so it has to fall back to a flattened array.

In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])

Do the same thing, but iterating row by row:

In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]

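Trying to make an array of the result produces an object dtype array, which is little more than a list in an array wrapper. (NumPy 1.24 and later raise a ValueError for ragged input unless dtype=object is passed explicitly, so it is included here.)

In [297]: np.array([a1[a1<6] for a1 in a], dtype=object)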
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)

The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
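When every row happens to select the same number of elements, as in the question, the result is not ragged and the flattened selection can simply be reshaped. The following is a minimal sketch under that assumption; the equal-count check and the list fallback are illustrative additions, not part of the original answer.

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
b = ma.masked_greater(a, 1)

mask = ma.getmaskarray(b)   # full boolean mask, same shape as a
counts = mask.sum(axis=1)   # number of masked values in each row

if (counts == counts[0]).all():
    # Every row selects the same number of elements, so the flattened
    # selection can be folded back into a regular 2D array.
    result = a[mask].reshape(a.shape[0], counts[0])
else:
    # Ragged case: fall back to a list with one array per row.
    result = [row[m] for row, m in zip(a, mask)]

print(result)
```

Boolean indexing returns values in row-major order, which is why the reshape puts each row's values back together.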


Here's another way of producing the list of arrays. With sum I find how many elements there are in each row, and then use this to split the flattened array into sublists.

In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]

It does use 'flattening'. And for these small examples it is slower than the row iteration. (Don't worry about the empty float array; split had to construct something and used a default dtype.)
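The split approach can be wrapped in a small function and checked against the row iteration. This is only a sketch: the name split_by_row is hypothetical, and it assumes a 2D array with a boolean mask of the same shape.

```python
import numpy as np

def split_by_row(a, mask):
    """Hypothetical helper: one 1D array of selected values per row of a."""
    counts = mask.sum(axis=1)      # how many values each row contributes
    idx = counts.cumsum()[:-1]     # split points in the flattened selection
    return np.split(a[mask], idx)

a = np.arange(12).reshape(3, 4)
by_split = split_by_row(a, a < 6)
by_rows = [row[row < 6] for row in a]

# Print the two results side by side, row by row.
for s, r in zip(by_split, by_rows):
    print(s.tolist(), r.tolist())
```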


A different mask, without empty rows, clearly shows the equivalence of the 2 approaches.

In [344]: mask=np.tri(3,4,dtype=bool)  # lower tri
In [345]: mask
Out[345]: 
array([[ True, False, False, False],
       [ True,  True, False, False],
       [ True,  True,  True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8,  9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8,  9, 10])]

Zip the two lists together, and then filter by the mask:

data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]

mask = [[False, False,  True,  True],
        [False, False,  True,  True],
        [False, False,  True,  True]]

# Pair each data row with its mask row:
# [([0, 1, 1, 1], [False, False, True, True]), ...]
zipped = list(zip(data, mask))

masked = []
for lst, row_mask in zipped:  # row_mask avoids shadowing the outer mask
    pairs = zip(lst, row_mask)  # (0, False), (1, False), (1, True), (1, True)
    masked.append([num for num, b in pairs if b])

print(masked)  # [[1, 1], [1, 1], [1, 1]]

or more succinctly:

masked = [[num for num, b in zip(lst, row_mask) if b] for lst, row_mask in zip(data, mask)]
print(masked)  # [[1, 1], [1, 1], [1, 1]]

Because numpy operations are vectorized, you can use np.where to select items from the first array and use None (or some arbitrary value) to mark the places where a value has been masked out. Note that this forces a less compact (object dtype) representation for the array, so you may want to use -1 or some other special value instead.

import numpy as np

a = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [0, 1, 2, 3]])

mask = np.array([[ True,  True,  True,  True],
    [ True, False,  True,  True],
    [False,  True,  True, False]])

np.where(mask, a, None)  # keep a where mask is True, None elsewhere

This produces

array([[0, 1, 2, 3],
       [0, None, 2, 3],
       [None, 1, 2, None]], dtype=object)
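For comparison, here is a sketch of the same idea with an integer sentinel in place of None, applied to the masked array from the question. It assumes -1 never occurs in the real data; that choice is illustrative and should be adapted to the data at hand.

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
b = ma.masked_greater(a, 1)

# Keep the masked (greater than 1) values and put the sentinel -1 elsewhere.
# -1 is an assumed placeholder that must not appear in the real data.
kept = np.where(ma.getmaskarray(b), a, -1)

print(kept)
print(kept.dtype)  # stays an integer dtype instead of object
```

The array keeps its 2D shape and a numeric dtype, at the cost of reserving one value as a marker.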
