Fastest way to filter values in a NumPy array based on changing thresholds

Question

I want to filter an array arr based on some thresholds.

arr = np.array([2,2,2,2,2,5,5,5,1])
thresholds = np.array([4,1])

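I want to filter arr against each value in thresholds, keeping the values of arr that are greater than that threshold.

My idea is to create one mask per threshold.

Expected result: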

# [[False False False False False  True  True  True False]
#  [ True  True  True  True  True  True  True  True False]]

One way to do it in Python:

mask = [True if x>condi else False for condi in thresholds for x in arr]
mask = np.reshape(mask,(2,9))

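The filtered array is then obtained with filteredarr = arr[mask[i]], where i is the index of the relevant threshold.

Is there a better way (performance-wise) to do this in Python? In particular, I am working with large arrays (len of arr is about 250000; thresholds has no fixed len yet, but I expect a large array).

Edit: the expected final output on this data is [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]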

Answer 1

The mask can be obtained easily using broadcasting:

mask = arr[None,:]>thresholds[:,None]
mask

# Output
# array([[False, False, False, False, False,  True,  True,  True, False],
#        [ True,  True,  True,  True,  True,  True,  True,  True, False]], dtype=bool)

The idea is to expand the dimensions by adding an extra axis with None (the same as np.newaxis), and then compare the arrays element by element.
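To make the shapes involved explicit, here is a minimal sketch using the data from the question. The names row and col are only illustrative and do not appear in the original answer.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])

# Add a leading axis to arr and a trailing axis to thresholds.
row = arr[None, :]          # shape (1, 9)
col = thresholds[:, None]   # shape (2, 1)

# Comparing a (1, 9) array with a (2, 1) array broadcasts to (2, 9):
# mask[i, j] is True when arr[j] > thresholds[i].
mask = row > col

print(row.shape, col.shape, mask.shape)  # (1, 9) (2, 1) (2, 9)
```

Keep in mind that the mask holds len(thresholds) * len(arr) booleans, so with an arr of about 250000 elements it grows by roughly 250 kB per threshold.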

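Once we have the mask, we can filter the data in several ways; which one to choose depends heavily on your problem:

Of course you can do

 res = [arr[m] for m in mask] # [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]

to get a list of the filtered data, but this is generally slow.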
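If a list of arrays is really what you need, one possible variation (not part of the original answer, and not benchmarked here) is to select all matching values in a single boolean-indexing step and then cut the flat result at the per-threshold counts. The names counts, flat and res_split are illustrative.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
mask = arr[None, :] > thresholds[:, None]

# Number of values that pass each threshold.
counts = mask.sum(axis=1)

# View arr as one row per threshold (no copy), then pick every True
# entry at once; the result is 1-D and ordered row by row.
flat = np.broadcast_to(arr, mask.shape)[mask]

# Cut the flat array at the row boundaries to get one array per threshold.
res_split = np.split(flat, np.cumsum(counts)[:-1])
```

Whether this beats the list comprehension depends on the sizes involved, so it is worth timing both on your own data.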

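If you have further numerical computations to do, I would create a masked array in which only the filtered data is taken into account:

 m = np.zeros_like(mask).astype(int)  # np.int was removed in NumPy 1.24; use the builtin int
 m[:] = arr                           # broadcast arr into every row
 res = np.ma.masked_where(~mask, m)   # hide the values that fail the threshold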

Now each row corresponds to the data filtered by the respective threshold. Masked arrays let you keep using many functions such as mean or std:

 res.mean(axis=1)
 # masked_array(data = [5.0 3.125],
 #              mask = [False False],
 #              fill_value = 1e+20)
 res.mean(axis=1).compressed()
 # array([ 5.   ,  3.125])
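For reference, the masked-array route can be put together as a self-contained sketch. It uses np.tile instead of the zeros_like plus assignment step above, which is a substitution of mine; data, means, counts and stds are illustrative names.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
mask = arr[None, :] > thresholds[:, None]

# One copy of arr per threshold, shape (2, 9).
data = np.tile(arr, (len(thresholds), 1))

# masked_where hides entries where the condition is True,
# so pass ~mask to hide the values that do NOT exceed the threshold.
res = np.ma.masked_where(~mask, data)

means = res.mean(axis=1).compressed()  # array([5.   , 3.125])
counts = res.count(axis=1)             # unmasked values per row: [3, 8]
stds = res.std(axis=1)                 # per-threshold standard deviation
```

Note that compressed() drops masked entries, so if some threshold excludes every value, the corresponding row's mean is masked and the compressed result will be shorter than thresholds.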

Answer 1
Score 3, accepted, 2015-11-17 12:32:54