Fastest way to filter values in np array based on changing thresholds

Question

I want to filter an array arr based on some thresholds.

arr = np.array([2,2,2,2,2,5,5,5,1])
thresholds = np.array([4,1])

For each value in thresholds, I want to keep only the elements of arr that are greater than that threshold.

My idea is to create a mask for each threshold.

Expected result:

# [[False False False False False  True  True  True False]
#  [ True  True  True  True  True  True  True  True False]]

One way to do it in Python:

mask = [True if x>condi else False for condi in thresholds for x in arr]
mask = np.reshape(mask,(2,9))

The filtered array is then simply filteredarr = arr[mask[i]], where i is the index of the relevant threshold.

Is there a better way (performance-wise) to do this in Python? In particular, I am dealing with big arrays (length around 250000 for arr; no specific length for thresholds yet, but I expect it to be large).

Edit: The final output expected on the data is [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]

Answer 1

The mask can easily be obtained using broadcasting:

mask = arr[None,:]>thresholds[:,None]
mask

# Output
# array([[False, False, False, False, False,  True,  True,  True, False],
#        [ True,  True,  True,  True,  True,  True,  True,  True, False]], dtype=bool)

The idea is to increase the dimensionality by adding an extra axis with None (which does the same as np.newaxis) and then compare the arrays element-wise.
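To make the broadcasting step concrete, here is a minimal sketch that prints nothing but keeps the intermediate shapes in named variables. The names row, col and loop_mask are only illustrative and do not appear in the original answer.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])

row = arr[None, :]         # shape (1, 9): arr as a single row
col = thresholds[:, None]  # shape (2, 1): one threshold per row

# Comparing (1, 9) with (2, 1) broadcasts both operands to (2, 9)
mask = row > col

# The list-comprehension version from the question, for comparison
loop_mask = np.reshape([x > t for t in thresholds for x in arr], (2, 9))
```

Both masks should be identical; the broadcast version simply does the comparison inside NumPy instead of in a Python loop.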

Once we have the mask, we can filter the data in several ways; which one is best depends strongly on your problem:

Of course you can do
```
res = [arr[m] for m in mask]
# [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]
```
in order to obtain a list with the filtered data, but it is slow in general.
If you need further numeric calculations, I would create a masked array in which only the filtered data are taken into account:
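```
m = np.zeros_like(mask).astype(int)  # integer array with the same shape as mask
m[:] = arr                           # broadcast arr into every row
res = np.ma.masked_where(~mask, m)   # hide the values that fail each threshold
```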
Each row now corresponds to the data filtered by the corresponding threshold. Masked arrays let you keep working with many functions such as mean or std:
```
res.mean(axis=1)
# masked_array(data = [5.0 3.125],
#              mask = [False False],
#        fill_value = 1e+20)

res.mean(axis=1).compressed()
# array([ 5. , 3.125])
```
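As a rough sanity check, the sketch below builds the masked array in one step and compares its per-threshold means with those from the plain list approach. Using np.tile and the np.ma.masked_array constructor here is an assumption made for brevity, not the answer's own construction, and the names tiled, masked_means, list_means and counts are illustrative.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
mask = arr[None, :] > thresholds[:, None]

# Repeat arr once per threshold, then hide entries that fail the comparison.
# In a masked array, True in the mask means "ignore this value", hence ~mask.
tiled = np.tile(arr, (len(thresholds), 1))
res = np.ma.masked_array(tiled, mask=~mask)

masked_means = res.mean(axis=1).compressed()          # per-threshold mean
list_means = np.array([arr[m].mean() for m in mask])  # same via a Python loop
counts = res.count(axis=1)                            # unmasked values per row
```

Note that tiled and mask each hold len(thresholds) * len(arr) elements, so for very large inputs memory use may become the limiting factor rather than speed.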
