Filtering lists with multiple conditions

Question

What is the most efficient way to filter multiple lists through multiple conditions? E.g.:

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

I want to find the indices that satisfy conditions on all 3 lists, such that

time < 4
x > 2
y > 7

which should give me the result

[2]

If the above data were in a DataFrame, the pandas method would be

(df['time'] < 4) & (df['x'] > 2) & (df['y'] > 7)
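The expression above produces a boolean Series (one True/False per row), not the indices themselves. A minimal sketch, assuming the three lists are loaded into a DataFrame with the default integer index, of turning that mask into the index list asked for:

```python
import pandas as pd

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Build a DataFrame from the example lists (default index 0..4)
df = pd.DataFrame({"time": time, "x": x, "y": y})

# Boolean Series: True where all three conditions hold
mask = (df["time"] < 4) & (df["x"] > 2) & (df["y"] > 7)

# Keep only the index labels where the mask is True
idx = df.index[mask].tolist()
print(idx)  # [2]
```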

The brute-force method would be something like

idx = []
for i in range(len(time)):
    if time[i] < 4 and x[i] > 2 and y[i] > 7:
        idx.append(i)

However, these lists can get quite large (500,000+ elements), so a for loop would be quite inefficient.
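One common way to avoid a Python-level loop on inputs this size, not taken from the thread itself, is NumPy vectorization: convert the lists to arrays once, compare element-wise, and combine the boolean arrays with `&`. A sketch, assuming NumPy is available:

```python
import numpy as np

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Convert once; the comparisons below then run in compiled code
t_arr = np.asarray(time)
x_arr = np.asarray(x)
y_arr = np.asarray(y)

# Element-wise comparisons give boolean arrays; & combines them
mask = (t_arr < 4) & (x_arr > 2) & (y_arr > 7)

# flatnonzero returns the positions where mask is True
idx = np.flatnonzero(mask).tolist()
print(idx)  # [2]
```

If the data is already stored in arrays, the `np.asarray` calls can be skipped; converting large lists on every call has its own cost.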

Answer 1

Do you want something like the following?

[k for k,g in enumerate(zip(time,x,y)) if g[0]<4 and g[1]>2 and g[2]>7]

or, faster (thanks @Steven Rumbalski):

[k for k, (_t, _x, _y) in enumerate(zip(time, x, y)) if _t < 4 and _x > 2 and _y > 7]

which gives

[2]
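The comprehension generalizes to any number of lists. A sketch of a hypothetical helper, `indices_where` (the name is illustrative, not from the answer), that zips the lists and applies one predicate per position:

```python
from typing import Callable, List, Sequence


def indices_where(predicate: Callable[..., bool], *columns: Sequence) -> List[int]:
    """Return the positions i where predicate(col0[i], col1[i], ...) is true."""
    # zip stops at the shortest column, just like the one-liner above
    return [i for i, row in enumerate(zip(*columns)) if predicate(*row)]


time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

idx = indices_where(lambda t, a, b: t < 4 and a > 2 and b > 7, time, x, y)
print(idx)  # [2]
```

Calling a function for every element adds overhead compared with the inline comparison, so the one-liner may be faster on very large lists.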

Answer 2

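I ran `%timeit` on 3 methods. The plain Python loop was the fastest: about 470 µs per call versus about 695 µs for the basic DataFrame comparison (roughly a third less time). The DataFrame `apply` method was by far the slowest at about 355 ms per call, roughly 750 times slower than the loop.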

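import numpy as np  # needed for np.nan in method 3
# Assumes `df` is a pandas DataFrame with columns time, x, y and that
# `time`, `x`, `y` are the matching plain lists (data size not stated)

def filter_compare(df):
    # Boolean mask, then the `time` values of matching rows (not row indices)
    return list(df[(df.time < 4) & (df.x > 2) & (df.y > 7)].time)

print("\nMETHOD 1: DF comparison")
%timeit filter_compare(df)
%timeit filter_compare(df)
%timeit filter_compare(df)

def filter_loop(t, x, y):
    # Brute-force loop from the question; returns row indices
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and x[i] > 2 and y[i] > 7:
            idx.append(i)
    return idx

print("\nMETHOD 2: py loop")
%timeit filter_loop(time, x, y)
%timeit filter_loop(time, x, y)
%timeit filter_loop(time, x, y)

def filter_apply(df):
    # Row-wise apply: `time` for matching rows, NaN otherwise,
    # so unique() also contains NaN whenever some row fails
    return df.apply(lambda r: r.time if ((r.time < 4) & (r.x > 2) & (r.y > 7)) else np.nan, axis=1).unique()

print("\nMETHOD 3: DF apply")
%timeit filter_apply(df)
%timeit filter_apply(df)
%timeit filter_apply(df)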

METHOD 1: DF comparison
698 µs ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
696 µs ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
690 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

METHOD 2: py loop
481 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
468 µs ± 4.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
466 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

METHOD 3: DF apply
355 ms ± 5.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
356 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
360 ms ± 8.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
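The `%timeit` magic only works in IPython/Jupyter. A sketch of a comparable measurement in a plain Python script using the standard-library `timeit` module; the data size `N` and the random values are assumptions for illustration, since the answer does not say what `df` contained:

```python
import random
import timeit

# Hypothetical test data: N and the value ranges are assumptions, not from the answer
N = 10_000
random.seed(0)
time = [random.randint(0, 10) for _ in range(N)]
x = [random.randint(0, 5) for _ in range(N)]
y = [random.randint(0, 10) for _ in range(N)]


def loop_filter(t, a, b):
    # Brute-force indexed loop from the question
    idx = []
    for i in range(len(t)):
        if t[i] < 4 and a[i] > 2 and b[i] > 7:
            idx.append(i)
    return idx


def zip_filter(t, a, b):
    # enumerate/zip comprehension from Answer 1
    return [k for k, (_t, _x, _y) in enumerate(zip(t, a, b)) if _t < 4 and _x > 2 and _y > 7]


# Both approaches must agree before timing them
assert loop_filter(time, x, y) == zip_filter(time, x, y)

for name, fn in [("loop", loop_filter), ("zip", zip_filter)]:
    # repeat() returns the total time of each run; report the best per-call time
    runs = timeit.repeat(lambda: fn(time, x, y), number=20, repeat=5)
    print(f"{name}: {min(runs) / 20 * 1e6:.1f} µs per call")
```

Taking the minimum of several runs, rather than the mean, reduces noise from other processes; absolute numbers will differ by machine and data.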

Answer 3

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Get True/False lists, one per condition
t = [v < 4 for v in time]
x1 = [v > 2 for v in x]
y2 = [v > 7 for v in y]

# Bitwise AND of the first two masks
b = [p & q for p, q in zip(t, x1)]
# Bitwise AND again with the third mask, then collect the indices that remain True
mask = [p & q for p, q in zip(b, y2)]
[i for i, keep in enumerate(mask) if keep]

Output:

[2]
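The same boolean-mask idea can be finished without building each intermediate list by hand. A sketch using the standard-library `itertools.compress`, which yields the items of one iterable whose matching selector is truthy; here the selectors are the combined conditions and the items are the positions:

```python
from itertools import compress

time = [1, 2, 3, 4, 5]
x = [3, 3, 3, 3, 3]
y = [5, 5, 8, 8, 8]

# Lazily evaluate all three conditions per position (one True/False each)
mask = (t < 4 and a > 2 and b > 7 for t, a, b in zip(time, x, y))

# compress keeps the positions whose selector is True
idx = list(compress(range(len(time)), mask))
print(idx)  # [2]
```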

Filtering lists with multiple conditions

Question

3 answers

Answer 1
Score 1, accepted, 2020-12-07 23:58:18

Answer 2
Score 1, 2020-12-08 00:12:35

Answer 3
Score -1, 2020-12-07 23:44:18
