Filtering list on multiple conditions

What is the most efficient way to filter multiple lists through multiple conditions? For example:

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

I want to find the indices that satisfy conditions on all 3 lists, such that

time < 4
x > 2
y > 7

which should give me the result

[2]

If the above data were in a DataFrame, the pandas method would be

(df['time'] < 4) & (df['x'] > 2) & (df['y'] > 7)
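
Note that this expression yields a boolean mask rather than indices. As a minimal sketch, assuming the three lists are loaded into a DataFrame named `df` with the default index (the construction below is illustrative, not part of the original question), the mask could be turned into positions like this:

```python
import pandas as pd

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Assumed construction: one column per list, default 0..n-1 index.
df = pd.DataFrame({"time": time, "x": x, "y": y})

# Each comparison yields a boolean Series; & combines them row by row.
mask = (df["time"] < 4) & (df["x"] > 2) & (df["y"] > 7)

# With the default index, the index labels coincide with list positions.
idx = df.index[mask].tolist()
print(idx)  # [2]
```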

The brute-force method would be something like

idx = []
for i in range(len(time)):
    if time[i] < 4 and x[i] > 2 and y[i] > 7:
        idx.append(i)

However, these variables can get quite large (500,000+ elements), so a for loop would be quite inefficient.

Do you want something like the following?

[k for k,g in enumerate(zip(time,x,y)) if g[0]<4 and g[1]>2 and g[2]>7]

or faster (thanks to @Steven Rumbalski):

[k for k, (_t, _x, _y) in enumerate(zip(time, x, y)) if _t < 4 and _x > 2 and _y > 7]

which gives

[2]
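
If converting the lists to NumPy arrays is acceptable (an assumption; the question itself uses plain Python lists), a vectorized variant is another option for large inputs. This is only a sketch, and the array names are illustrative:

```python
import numpy as np

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Convert once up front; for large data it helps to keep everything as arrays.
t_arr, x_arr, y_arr = np.asarray(time), np.asarray(x), np.asarray(y)

# Each comparison yields a boolean array; & combines them element-wise.
# The parentheses are required because & binds tighter than < and >.
mask = (t_arr < 4) & (x_arr > 2) & (y_arr > 7)

# flatnonzero returns the positions where the mask is True.
idx = np.flatnonzero(mask).tolist()
print(idx)  # [2]
```

Whether this beats the list comprehension depends on whether the conversion cost is paid once or on every call, so it is worth timing on the real data.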

I ran %timeit on 3 methods. Note the units in the results below: the apply method takes about 355 ms per call, while the other two take microseconds, so apply is by far the slowest. The py loop (about 470 µs) was roughly 1.5x faster than the basic DF comparison (about 695 µs).

def foo(df):
    return list(df[(df.time < 4) & (df.x > 2) & (df.y > 7)].time)

print("\nMETHOD 1: DF comparison")
%timeit foo(df) 
%timeit foo(df) 
%timeit foo(df) 

def foo(t, x, y):
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and x[i] > 2 and y[i] > 7:
            idx.append(i)
    return idx
print("\nMETHOD 2: py loop")
%timeit foo(time, x, y) 
%timeit foo(time, x, y) 
%timeit foo(time, x, y) 

def foo(df):
    # Requires `import numpy as np`; rows failing the test map to NaN, so NaN may appear in the result
    return df.apply(lambda r: r.time if ((r.time < 4) & (r.x > 2)& (r.y > 7)) else np.nan, axis=1).unique()

print("\nMETHOD 3: DF apply")
%timeit foo(df) 
%timeit foo(df) 
%timeit foo(df)

METHOD 1: DF comparison
698 µs ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
696 µs ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
690 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

METHOD 2: py loop
481 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
468 µs ± 4.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
466 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

METHOD 3: DF apply
355 ms ± 5.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
356 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
360 ms ± 8.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
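
The `%timeit` magic is only available in IPython/Jupyter. As a rough sketch for a plain Python script, the standard-library `timeit` module can time the pure-Python variants. The synthetic data, the size `N`, and the function names below are assumptions for illustration, not the data behind the numbers above:

```python
import random
import timeit

N = 100_000  # hypothetical size; the question mentions 500,000+
random.seed(0)
time_vals = [random.randint(1, 5) for _ in range(N)]
x_vals = [random.randint(1, 5) for _ in range(N)]
y_vals = [random.randint(5, 8) for _ in range(N)]

def loop_filter(t, x, y):
    """Index-based loop, as in the question."""
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and x[i] > 2 and y[i] > 7:
            idx.append(i)
    return idx

def zip_filter(t, x, y):
    """enumerate/zip comprehension, as in the first answer."""
    return [k for k, (_t, _x, _y) in enumerate(zip(t, x, y))
            if _t < 4 and _x > 2 and _y > 7]

# Both variants must agree before their timings are worth comparing.
assert loop_filter(time_vals, x_vals, y_vals) == zip_filter(time_vals, x_vals, y_vals)

CALLS = 5
for name, fn in [("py loop", loop_filter), ("zip comprehension", zip_filter)]:
    # Take the best of 3 repeats, each making CALLS calls.
    best = min(timeit.repeat(lambda: fn(time_vals, x_vals, y_vals),
                             number=CALLS, repeat=3))
    print(f"{name}: {best / CALLS * 1e3:.2f} ms per call")
```

Absolute numbers will vary by machine and data, so only the relative ordering on your own inputs is meaningful.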

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Get True/False lists, one per condition
t = [i < 4 for i in time]
x1 = [i > 2 for i in x]
y2 = [i > 7 for i in y]

# Bitwise and of the first two conditions
b = [p & q for p, q in zip(t, x1)]
# Bitwise and again, then look up the time value at the first remaining True
# (.index(True) raises ValueError if no position satisfies all conditions)
time[[p & q for p, q in zip(b, y2)].index(True)]

Output

3
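
The snippet above returns the `time` value at the first matching position rather than the indices the question asks for. A minimal variation that collects every matching index from the same per-condition boolean lists might look like this (the variable names are illustrative):

```python
time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# One boolean list per condition.
t_ok = [v < 4 for v in time]
x_ok = [v > 2 for v in x]
y_ok = [v > 7 for v in y]

# Keep the positions where all three flags are True.
idx = [i for i, flags in enumerate(zip(t_ok, x_ok, y_ok)) if all(flags)]
print(idx)  # [2]
```

This returns an empty list instead of raising when no position satisfies all conditions.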
