Fastest way to filter values in np array based on changing threshold
I want to filter an array arr based on some thresholds.
import numpy as np
arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
For each value in thresholds, I want to keep the values in arr that are greater than that threshold.
My idea is to create a mask for each threshold.
Expected result:
# [[False False False False False True True True False]
# [ True True True True True True True True False]]
One way to do it in Python:
mask = [True if x>condi else False for condi in thresholds for x in arr]
mask = np.reshape(mask,(2,9))
Then I get the filtered array with filteredarr = arr[mask[i]], where i is the index of the relevant threshold.
Is there a better way (performance-wise) to do this in Python? I am dealing with big arrays (len around 250000 for arr; no specific len for thresholds yet, but I expect it to be big as well).
Edit: The final output expected on the data is [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]
The mask can easily be obtained using
mask = arr[None,:]>thresholds[:,None]
mask
# Output
# array([[False, False, False, False, False, True, True, True, False],
# [ True, True, True, True, True, True, True, True, False]], dtype=bool)
The idea is to add an extra axis to each array using None (which does the same as np.newaxis) and then to compare the arrays element-wise, letting broadcasting expand both to a common 2-D shape.
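To make the broadcasting step concrete, here is a small sketch using the arrays from the question. The names row and col are only illustrative and do not appear in the original answer.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])

row = arr[None, :]          # shape (1, 9): arr laid out as a single row
col = thresholds[:, None]   # shape (2, 1): one threshold per row

# Broadcasting expands both operands to shape (2, 9) before comparing,
# so mask[i, j] is True exactly when arr[j] > thresholds[i].
mask = row > col

print(row.shape, col.shape, mask.shape)  # (1, 9) (2, 1) (2, 9)
```

Note that the mask holds len(thresholds) * len(arr) booleans of one byte each, so with very many thresholds it may be worth processing them in batches.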
Once we have the mask, we can filter the data in several ways; which one to choose depends strongly on your problem.
Of course you can do
res = [arr[m] for m in mask] # [array([5, 5, 5]), array([2, 2, 2, 2, 2, 5, 5, 5])]
in order to obtain a list with the filtered data, but this is slow in general.
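If the list of arrays is really needed, one possible way to avoid indexing arr once per threshold is to select all passing values in one go and split them afterwards. This is only a sketch (flat and counts are illustrative names), and np.split still creates the sub-arrays one by one, so it is worth timing on real data before relying on it.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
mask = arr[None, :] > thresholds[:, None]

# View arr as if it were repeated once per threshold (no copy is made),
# then pick every passing value; the result is ordered row by row.
flat = np.broadcast_to(arr, mask.shape)[mask]

# The number of passing values per threshold gives the split points.
counts = mask.sum(axis=1)
res = np.split(flat, np.cumsum(counts)[:-1])

print(res)
```

For the example data this gives the same two arrays as the list comprehension over the mask rows.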
If you have further numeric calculations to do, I would create a masked array in which only the filtered data are taken into account:
m = np.zeros_like(mask).astype(int)
m[:] = arr
res = np.ma.masked_where(~mask, m)
Each row now corresponds to the filtered data for the corresponding threshold. Masked arrays allow you to continue working with many functions like mean or std:
res.mean(axis=1)
# masked_array(data = [5.0 3.125],
#              mask = [False False],
#        fill_value = 1e+20)
res.mean(axis=1).compressed()
# array([ 5.   ,  3.125])
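If only a per-threshold mean is needed, a sketch without masked arrays is to sum the values that pass each threshold and divide by how many there are. The names counts, sums and means are illustrative and not part of the original answer.

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 5, 5, 5, 1])
thresholds = np.array([4, 1])
mask = arr[None, :] > thresholds[:, None]

counts = mask.sum(axis=1)          # number of values kept per threshold
sums = (mask * arr).sum(axis=1)    # False acts as 0, True as 1
means = sums / counts              # nan (with a warning) if nothing passes

print(means.tolist())  # [5.0, 3.125]
```

This matches the result of res.mean(axis=1).compressed() for the example data, but it does not generalize as directly to functions like std as the masked array does.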