How to remove leading masked elements from a numpy array?

Question

How can I remove the leading masked elements from each row of a numpy masked array? For example, given the following [2 x 5] masked array:

m_arr = [[- - 1 - 1]
     [1 - - 1 1]]

the output after removing the leading masked elements would be

m_arr = [[1 - 1]
     [1 - - 1 1]]

I tried using compressed in a list comprehension:

[m.compressed().tolist() for m in m_arr]

but I have not been able to get a solution, not even with np.apply_along_axis.

Answer 1

OK, make a masked array:

In [96]: m_arr=np.ma.MaskedArray(np.arange(10).reshape(2,5),np.array([[1,1,0,1,0
    ...: ],[0,1,1,0,0]]))
In [97]: m_arr
Out[97]: 
masked_array(
  data=[[--, --, 2, --, 4],
        [5, --, --, 8, 9]],
  mask=[[ True,  True, False,  True, False],
        [False,  True,  True, False, False]],
  fill_value=999999)

Look at the attributes of the 1d arrays you get when iterating:

In [99]: [(m.data,m.mask) for m in m_arr]
Out[99]: 
[(array([0, 1, 2, 3, 4]), array([ True,  True, False,  True, False])),
 (array([5, 6, 7, 8, 9]), array([False,  True,  True, False, False]))]

Explore one mask:

In [100]: m_arr[0].mask
Out[100]: array([ True,  True, False,  True, False])
In [101]: np.logical_and.accumulate(m_arr[0].mask)
Out[101]: array([ True,  True, False, False, False])
In [104]: m_arr[0][~_101]
Out[104]: 
masked_array(data=[2, --, 4],
             mask=[False,  True, False],
       fill_value=999999)
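To make the step above easier to follow, here is a minimal standalone sketch of why np.logical_and.accumulate isolates the leading run of True values. The cumprod comparison is only an illustrative alternative and does not come from the answer.

```python
import numpy as np

mask = np.array([True, True, False, True, False])

# Running AND: position i is True only if mask[0], ..., mask[i] are all True,
# so the result stays True through the leading run and is False from the
# first False onward, even if later entries are True again.
leading = np.logical_and.accumulate(mask)
print(leading)  # [ True  True False False False]

# Illustrative alternative (an assumption, not part of the answer):
# a cumulative product of the 0/1 values describes the same leading run.
same = np.cumprod(mask).astype(bool)
print(np.array_equal(leading, same))
```

Inverting the result with ~ gives a boolean index that keeps everything from the first unmasked element onward.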

Wrap that in a function:

In [109]: def foo(m):
     ...:     mm = m.mask
     ...:     mm = ~np.logical_and.accumulate(mm)
     ...:     return m[mm]
     ...:

And apply it row by row:

In [110]: [foo(m) for m in m_arr]
Out[110]: 
[masked_array(data=[2, --, 4],
              mask=[False,  True, False],
        fill_value=999999),
 masked_array(data=[5, --, --, 8, 9],
              mask=[False,  True,  True, False, False],
        fill_value=999999)]
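As a script rather than an IPython session, the row-wise approach might look like the sketch below. The name strip_leading_masked is hypothetical, and the use of np.ma.getmaskarray is my own addition so that rows without an explicit mask are handled too; the answer itself reads m.mask directly.

```python
import numpy as np

def strip_leading_masked(row):
    """Drop the leading run of masked entries from a 1-D masked array."""
    # getmaskarray always returns a full boolean array, even when the
    # array carries no explicit mask (np.ma.nomask).
    mask = np.ma.getmaskarray(row)
    keep = ~np.logical_and.accumulate(mask)
    return row[keep]

m_arr = np.ma.MaskedArray(
    np.arange(10).reshape(2, 5),
    mask=np.array([[1, 1, 0, 1, 0], [0, 1, 1, 0, 0]], dtype=bool),
)
rows = [strip_leading_masked(m) for m in m_arr]
print([r.tolist() for r in rows])  # tolist() shows masked entries as None

# Edge cases: a fully masked row comes back empty, an unmasked row unchanged.
all_masked = np.ma.MaskedArray([1, 2, 3], mask=True)
no_mask = np.ma.MaskedArray([1, 2, 3])
print(strip_leading_masked(all_masked).size)
print(strip_leading_masked(no_mask).tolist())
```

The result is a list of 1d masked arrays of different lengths, which is why it cannot be packed back into a single rectangular array.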

====

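In the follow-up question you tried to apply logical_and to the whole array (rather than row by row). With the default axis=0 it accumulates down the columns: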

In [132]: np.logical_and.accumulate(m_arr.mask)
Out[132]: 
array([[ True,  True, False,  True, False],
       [False,  True, False, False, False]])

Applied correctly, along each row:

In [133]: np.logical_and.accumulate(m_arr.mask, axis=1)
Out[133]: 
array([[ True,  True, False, False, False],
       [False, False, False, False, False]])

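Indexing with a boolean mask always flattens the result. As already established, the result cannot be 2d, at least not in the general case where the number of True values differs from row to row.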

In [134]: m_arr[~_]
Out[134]: 
masked_array(data=[2, --, 4, 5, --, --, 8, 9],
             mask=[False,  True, False, False,  True,  True, False, False],
       fill_value=999999)
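If the flattened result is still wanted per row, one possible way to recover the pieces is to count the kept elements in each row and split at the cumulative counts. This is a sketch of my own under that assumption, not something shown in the answer; the names keep, counts and pieces are illustrative.

```python
import numpy as np

m_arr = np.ma.MaskedArray(
    np.arange(10).reshape(2, 5),
    mask=np.array([[1, 1, 0, 1, 0], [0, 1, 1, 0, 0]], dtype=bool),
)

# True for every element at or after the first unmasked one in its row.
keep = ~np.logical_and.accumulate(m_arr.mask, axis=1)

flat = m_arr[keep]          # 1-D: the surviving parts of all rows, concatenated
counts = keep.sum(axis=1)   # number of surviving elements per row
print(counts)               # [3 5]

# Split the flat result at the row boundaries given by the running totals.
pieces = np.split(flat, np.cumsum(counts)[:-1])
print([p.tolist() for p in pieces])  # masked entries appear as None
```

The pieces form a ragged list, so they stay separate arrays rather than a single 2d array.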

Answer 2

You can do the same thing that my comment and @hpaulj's answer suggest with an appropriate application of np.split, rather than writing your own loop.

The position of the first False in each row of the mask is given by

start = np.argmin(m_arr.mask, axis=1)

Combine that with a zero for each row (the row start), and linearize the indices to get the split points:

pad = np.zeros(m_arr.shape[0], dtype=int)
indices = np.ravel(np.stack((pad, start), axis=-1) + np.arange(m_arr.shape[0])[:, None] * m_arr.shape[1])

You can then split the raveled array into its leading masked segments and the segments to keep, and take every second piece:

m_arr = np.split(m_arr.ravel(), indices)[2::2]
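The following sketch runs those steps on the 2 x 5 example from the first answer and prints the intermediate values, which may make the [2::2] slice easier to follow. The names nrows, ncols and row_offset are introduced here only for readability.

```python
import numpy as np

m_arr = np.ma.MaskedArray(
    np.arange(10).reshape(2, 5),
    mask=np.array([[1, 1, 0, 1, 0], [0, 1, 1, 0, 0]], dtype=bool),
)
nrows, ncols = m_arr.shape

start = np.argmin(m_arr.mask, axis=1)           # first unmasked column per row
pad = np.zeros(nrows, dtype=int)                # column 0, the start of each row
row_offset = np.arange(nrows)[:, None] * ncols  # flat index where each row begins
indices = np.ravel(np.stack((pad, start), axis=-1) + row_offset)
print(start)    # [2 0]
print(indices)  # [0 2 5 5]

# np.split yields: an empty head, then for each row its leading masked
# prefix followed by the part to keep. The kept parts sit at 2, 4, 6, ...
pieces = np.split(m_arr.ravel(), indices)
result = pieces[2::2]
print([p.tolist() for p in result])  # masked entries appear as None
```

One caveat: np.argmin returns 0 for a row that is entirely masked, so such a row would come back whole rather than empty. That edge case is not handled in this sketch.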

Timing

Not particularly interesting in this case, but I benchmarked a few calls of the following functions:

def foo(m):
    mm = m.mask
    mm = ~np.logical_and.accumulate(mm)
    return m[mm]

def bar_hpaulj(x):
    return [foo(m) for m in x]

def bar_MadPhysicist(x):
    pad = np.zeros(x.shape[0], dtype=int)
    start = np.argmin(x.mask, axis=1)
    offsets = np.arange(x.shape[0])[:, None] * x.shape[1]
    indices = (np.stack((pad, start), axis=-1) + offsets).ravel()
    return np.split(x.ravel(), indices)[2::2]

Arrays were generated as random n x n squares for n in {10, 100, 1000, 10000}:

m_arr = np.ma.MaskedArray(np.ones((n, n)), mask=np.random.randint(2, size=(n, n), dtype=bool))

The timings are:

  n   |     bar_hpaulj    |  bar_MadPhysicist |
------+-------------------+-------------------+
   10 |  464 µs ± 1.54 µs |  966 µs ± 3.06 µs |
------+-------------------+-------------------+
  100 | 4.69 ms ± 20.2 µs | 8.31 ms ± 26.3 µs |
------+-------------------+-------------------+
 1000 |   67 ms ± 1.09 ms |  83.2 ms ± 309 µs |
------+-------------------+-------------------+
10000 |  2.38 s ± 29.5 ms |  835 ms ± 3.14 ms |
------+-------------------+-------------------+
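A comparison like this could be reproduced with a small timeit harness along the lines below. This is only a sketch: the names bar_loop, bar_split and make_array, the seed, and the repeat counts are my own assumptions, and the numbers will differ by machine and numpy version. Only small n is used so that it runs quickly.

```python
import timeit
import numpy as np

def foo(m):
    # Keep everything from the first unmasked element onward.
    return m[~np.logical_and.accumulate(m.mask)]

def bar_loop(x):
    # Row-by-row list comprehension.
    return [foo(m) for m in x]

def bar_split(x):
    # Single np.split over the raveled array.
    pad = np.zeros(x.shape[0], dtype=int)
    start = np.argmin(x.mask, axis=1)
    offsets = np.arange(x.shape[0])[:, None] * x.shape[1]
    indices = (np.stack((pad, start), axis=-1) + offsets).ravel()
    return np.split(x.ravel(), indices)[2::2]

def make_array(n, seed=0):
    # Random n x n mask; the fixed seed is an assumption for repeatability.
    rng = np.random.default_rng(seed)
    mask = rng.integers(0, 2, size=(n, n)).astype(bool)
    return np.ma.MaskedArray(np.ones((n, n)), mask=mask)

for n in (10, 100):
    arr = make_array(n)
    for fn in (bar_loop, bar_split):
        # Best of 3 repeats, 5 calls each, reported per call.
        best = min(timeit.repeat(lambda: fn(arr), number=5, repeat=3)) / 5
        print(f"n={n:>4}  {fn.__name__:<10} {best * 1e3:.3f} ms per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes. The two functions agree on every row that has at least one unmasked element.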

Solution 1
2 Accepted 2020-09-23 03:44:53

Solution 2
1 2020-09-23 04:08:15
