How can I do this Python list comprehension in NumPy?

Let's say I have an array of values, r, which range anywhere from 0 to 1. I want to remove all values that are more than some threshold value away from the median. Let's assume here that the threshold value is 0.5, and len(r) = 3000. Then to mask out all values outside of this range, I can do a simple list comprehension, which I like:

mask = np.array([ri < np.median(r)-0.5 or ri > np.median(r)+0.5 for ri in r])

And if I use a timer on it:

import time
import numpy as np

start = time.time()
r = np.random.random(3000)
m = np.median(r)
minr, maxr = m-0.5, m+0.5
mask = [ri<minr or ri>maxr for ri in r]
end = time.time()
print('Took %.4f seconds'%(end-start))

>>> Took 0.0010 seconds

Is there a faster way to do this list comprehension and make the mask using NumPy?


Edit:

I've tried several suggestions below, including:

  • An element-wise or operator: (r<minr) | (r>maxr)

  • A NumPy logical or: r[np.logical_or(r<minr, r>maxr)]

  • An absolute difference boolean array: abs(m-r) > 0.5

And here is the average time each one took over 300 runs:

Python list comprehension: 0.6511 ms
Elementwise or: 0.0138 ms
Numpy logical or: 0.0241 ms
Absolute difference: 0.0248 ms

As you can see, the elementwise or was always the fastest, by nearly a factor of two (I don't know how that would scale with the number of array elements). Who knew.
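For anyone who wants to check how the comparison scales with array size, here is a minimal sketch using the standard-library timeit module. The function names, the array sizes, and the reduced run count are my own choices for illustration and are not the setup behind the timings above; the numbers will vary by machine.

```python
import timeit

import numpy as np


def list_comp(r, minr, maxr):
    # Pure-Python loop over the array elements
    return np.array([ri < minr or ri > maxr for ri in r])


def elementwise_or(r, minr, maxr):
    # Vectorized comparisons combined with the | operator
    return (r < minr) | (r > maxr)


def logical_or(r, minr, maxr):
    # Same comparisons combined with np.logical_or
    return np.logical_or(r < minr, r > maxr)


def abs_diff(r, m, threshold):
    # Distance from the median compared against the threshold
    return np.abs(m - r) > threshold


def benchmark(n, threshold=0.5, runs=20):
    """Return the average time per call, in milliseconds, for each approach."""
    rng = np.random.default_rng(0)
    r = rng.random(n)
    m = np.median(r)
    minr, maxr = m - threshold, m + threshold
    candidates = {
        "Python list comprehension": lambda: list_comp(r, minr, maxr),
        "Elementwise or": lambda: elementwise_or(r, minr, maxr),
        "Numpy logical or": lambda: logical_or(r, minr, maxr),
        "Absolute difference": lambda: abs_diff(r, m, threshold),
    }
    return {
        name: timeit.timeit(fn, number=runs) / runs * 1e3
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for n in (3_000, 30_000, 300_000):
        print(f"n = {n}")
        for name, ms in benchmark(n).items():
            print(f"  {name}: {ms:.4f} ms")
```

The median is computed once outside the timed callables, so only the mask construction itself is being measured.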

You can use NumPy conditional selection (boolean indexing) to create a new array without those values.

start = time.time()
m = np.median(r)
minr, maxr = m-0.5, m+0.5
# keep only the values within 0.5 of the median
filtered_array = r[ (r >= minr) & (r <= maxr) ]
end = time.time()
print('Took %.4f seconds'%(end-start))

filtered_array is a copy of r without the masked values (all the values that the mask would later remove are already gone from filtered_array).
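To make the relationship between the mask and the filtered array concrete, here is a small sketch. The sample values and the 0.25 threshold are made up for illustration; they are chosen so that some values actually fall outside the range.

```python
import numpy as np

# Made-up sample values; 0.25 is an illustrative threshold
r = np.array([0.0, 0.125, 0.25, 0.5, 0.75, 0.875, 1.0])
threshold = 0.25

m = np.median(r)
minr, maxr = m - threshold, m + threshold

# True where a value lies outside [minr, maxr]
mask = (r < minr) | (r > maxr)

# Inverting the mask keeps only the values inside the range
filtered_array = r[~mask]

print(mask)
print(filtered_array)
```

Because boolean indexing returns a new array rather than a view, later changes to filtered_array do not affect r.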

Update: used the shorter syntax suggested by @ayhan.

One-liner...

new_mask = abs(np.median(r) - r) > 0.5
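The one-liner should give the same mask as the two-sided comparison, since the distance from the median exceeds 0.5 exactly when a value is below median - 0.5 or above median + 0.5. A quick sketch of it wrapped in a function follows; the name outlier_mask and the threshold parameter are hypothetical, and floating-point rounding could in principle make the two forms disagree for a value sitting right on a boundary.

```python
import numpy as np


def outlier_mask(r, threshold=0.5):
    # True where a value is more than `threshold` away from the median
    return abs(np.median(r) - r) > threshold


rng = np.random.default_rng(42)
r = rng.random(3000)

# Smaller illustrative threshold so that some values are flagged
new_mask = outlier_mask(r, threshold=0.25)

# Values that survive the filter
kept = r[~new_mask]
print(kept.size, "of", r.size, "values kept")
```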
