Filtering list on multiple conditions
What is the most efficient way to filter multiple lists through multiple conditions? For example:
time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]
I want to find the indices, conditioning on all 3 lists, such that
time < 4
x > 2
y > 7
which should give me the result
[2]
If the above data were in a DataFrame, the pandas method would be
(df['time'] < 4) & (df['x'] > 2) & (df['y'] > 7)
The brute-force method would be something like
idx = []
for i in range(len(time)):
    if time[i] < 4 and x[i] > 2 and y[i] > 7:
        idx.append(i)
However, these variables can get quite large (500,000+), so a for loop would be quite inefficient.
Do you want something like this?
[k for k,g in enumerate(zip(time,x,y)) if g[0]<4 and g[1]>2 and g[2]>7]
or faster (thanks to @Steven Rumbalski):
[k for k, (_t, _x, _y) in enumerate(zip(time, x, y)) if _t < 4 and _x > 2 and _y > 7]
which gives
[2]
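If NumPy is an option (an assumption; the question only mentions lists and pandas), the same filter can be written as a vectorized boolean mask, much like the pandas expression in the question. A minimal sketch, assuming the lists are converted to arrays first:

```python
import numpy as np

# Convert the plain lists to arrays so comparisons work element-wise
time = np.array([1, 2, 3, 4, 5])
x = np.array([3, 3, 3, 3, 3])
y = np.array([5, 5, 8, 8, 8])

# Combine the three conditions into one boolean mask
mask = (time < 4) & (x > 2) & (y > 7)

# flatnonzero returns the positions where the mask is True
idx = np.flatnonzero(mask)
print(idx.tolist())  # [2]
```

The conversion to arrays has its own cost, so whether this wins depends on the data size and on whether the data can be kept as arrays from the start.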
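I used %timeit on 3 methods. Going by the timings below, the plain Python loop was the fastest, taking about a third less time than the DataFrame comparison. The apply method was by far the slowest: its results are in milliseconds, not microseconds, which makes it roughly 500 times slower than the DataFrame comparison. Note that methods 1 and 3 return the matching time values, while method 2 returns the indices.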
def foo(df):
    return list(df[(df.time < 4) & (df.x > 2) & (df.y > 7)].time)
print("\nMETHOD 1: DF comparison")
%timeit foo(df)
%timeit foo(df)
%timeit foo(df)
def foo(t, x, y):
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and x[i] > 2 and y[i] > 7:
            idx.append(i)
    return idx
print("\nMETHOD 2: py loop")
%timeit foo(time, x, y)
%timeit foo(time, x, y)
%timeit foo(time, x, y)
def foo(df):
    return df.apply(lambda r: r.time if ((r.time < 4) & (r.x > 2) & (r.y > 7)) else np.nan, axis=1).unique()
print("\nMETHOD 3: DF apply")
%timeit foo(df)
%timeit foo(df)
%timeit foo(df)
METHOD 1: DF comparison
698 µs ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
696 µs ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
690 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
METHOD 2: py loop
481 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
468 µs ± 4.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
466 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
METHOD 3: DF apply
355 ms ± 5.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
356 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
360 ms ± 8.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
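%timeit only works inside IPython/Jupyter. A rough sketch of a comparable measurement with the standard-library timeit module is below. The data is synthetic, and the size n and the function names are assumptions made for illustration, not the setup used for the numbers above.

```python
import random
import timeit

random.seed(0)
n = 50_000  # hypothetical size; raise it to match your own data
time = [random.randint(1, 5) for _ in range(n)]
x = [random.randint(1, 5) for _ in range(n)]
y = [random.randint(5, 8) for _ in range(n)]

def loop_filter(t, x, y):
    # Index-based loop, as in the question
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and x[i] > 2 and y[i] > 7:
            idx.append(i)
    return idx

def comprehension_filter(t, x, y):
    # enumerate/zip list comprehension, as in the first answer
    return [k for k, (_t, _x, _y) in enumerate(zip(t, x, y))
            if _t < 4 and _x > 2 and _y > 7]

# Both approaches must agree before timing them
assert loop_filter(time, x, y) == comprehension_filter(time, x, y)

runs = 5
for fn in (loop_filter, comprehension_filter):
    seconds = timeit.timeit(lambda: fn(time, x, y), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

Absolute numbers will vary by machine and Python version, so compare the methods on your own data rather than relying on one set of results.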
time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]
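# Build one True/False list per condition
t = [v < 4 for v in time]
x1 = [v > 2 for v in x]
y2 = [v > 7 for v in y]
# Element-wise AND of the first two conditions
b = [p & q for p, q in zip(t, x1)]
# AND with the third condition, then look up the first remaining True.
# Note: .index(True) finds only the first match and raises ValueError if there is none.
time[[p & q for p, q in zip(b, y2)].index(True)]
Output (the value of time at the first match, not its index)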
3
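The lookup with .index(True) stops at the first match. To collect every match, one option is itertools.compress from the standard library, which keeps the items whose mask entry is true. A minimal sketch; the names mask, idx and values are only illustrative:

```python
from itertools import compress

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# One boolean per row: True where all three conditions hold
mask = [t < 4 and a > 2 and b > 7 for t, a, b in zip(time, x, y)]

# compress keeps the items whose corresponding mask entry is True
idx = list(compress(range(len(mask)), mask))  # matching indices
values = list(compress(time, mask))           # matching time values

print(idx)     # [2]
print(values)  # [3]
```

With no matches this returns empty lists instead of raising an exception.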