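How to automatically group conditions together in Python?
I am trying to combine conditions automatically in Python. The difficulty is that with many conditions, say 100, it would be tedious to "and" them all together by hand. How can I achieve this with a loop?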
import pandas as pd
s1 = pd.Series([1,2,3,4,5,6])
s2 = pd.Series([5,6,7,8,9,10])
s3 = pd.Series([11,12,5,7,8,2])
df = pd.DataFrame({'A': s1,'B': s2,'C': s3})
condition1 = df['A'] > 3
condition2 = df['B'] > 6
condition3 = df['C'] > 5
# AND Operation ->>> Can be achieved with a loop?
select = condition1 & condition2 & condition3
You can do this by building a list of conditions and combining them with reduce:
from functools import reduce
conditions = [
df['A'] > 3,
df['B'] > 6,
df['C'] > 5,
]
total_condition = reduce(lambda x, y: x & y, conditions)
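The condition list itself can also be generated in a loop, which is closer to what the question asks for. The following is only a minimal sketch: it assumes every condition is a simple "column greater than bound" test, and the `thresholds` dict is a hypothetical name introduced for illustration. `operator.and_` is the function form of `&` and stands in for the lambda.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [5, 6, 7, 8, 9, 10],
    'C': [11, 12, 5, 7, 8, 2],
})

# Hypothetical mapping of column name -> lower bound (an assumption for illustration).
thresholds = {'A': 3, 'B': 6, 'C': 5}

# Build one Boolean Series per column instead of writing each condition by hand.
conditions = [df[col] > bound for col, bound in thresholds.items()]

# AND all conditions together element-wise.
total_condition = reduce(operator.and_, conditions)

# Use the combined mask to select the matching rows.
selected = df[total_condition]
```

With the example data, only the rows at index 3 and 4 satisfy all three conditions.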
Test case:
import numpy as np
import pandas as pd
d = pd.DataFrame(np.random.randint(1, 5, (700000, 3)), columns=["a", "b", "c"])
conditions = [
d["a"] > 2,
d["c"] > 1,
d["b"] > 2,
]*100
Using reduce:
from functools import reduce
%timeit reduce(lambda x, y: x & y, conditions)
> 547 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using pd.concat + df.all():
%timeit pd.concat(conditions, axis=1).all(1)
> 4.19 s ± 367 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
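A few points to note: your conditions are Boolean Series, and given a collection of Boolean Series you can use np.ndarray.all (or pd.DataFrame.all) to compute their intersection. You can use a list of Boolean Series with either NumPy or Pandas: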
conditions = [df['A'] > 3,
df['B'] > 6,
df['C'] > 5]
# all equivalent
select = pd.concat(conditions, axis=1).all(axis=1)
select = np.logical_and.reduce(conditions)
select = np.array(conditions).all(axis=0)
print(select)
array([False, False, False, True, True, False], dtype=bool)
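As a quick check on the example frame from the question, the sketch below confirms that the three expressions produce the same mask. The variable names `via_concat`, `via_reduce` and `via_array` are introduced here for illustration. One difference worth knowing: the `pd.concat` version returns a pandas Series, while the two NumPy versions return plain arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [5, 6, 7, 8, 9, 10],
    'C': [11, 12, 5, 7, 8, 2],
})

conditions = [df['A'] > 3,
              df['B'] > 6,
              df['C'] > 5]

# Pandas route: one Boolean column per condition, then row-wise all().
via_concat = pd.concat(conditions, axis=1).all(axis=1)

# NumPy routes: stack the conditions and AND them along the first axis.
via_reduce = np.logical_and.reduce(conditions)
via_array = np.array(conditions).all(axis=0)

# All three masks should agree element by element.
same = (np.array_equal(via_concat.to_numpy(), via_reduce)
        and np.array_equal(via_reduce, via_array))
```

Any of the three masks can be passed to `df[...]` to select the matching rows.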
Similarly, you can use a dictionary if you want to name your Boolean filters:
conditions = {1: df['A'] > 3,
2: df['B'] > 6,
3: df['C'] > 5}
select = np.array(list(conditions.values())).all(axis=0)
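One thing named filters make easy is combining only a subset of them. The sketch below assumes descriptive string keys (`'a_gt_3'` and so on) and a `wanted` list; both are hypothetical names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [5, 6, 7, 8, 9, 10],
    'C': [11, 12, 5, 7, 8, 2],
})

# Hypothetical descriptive names for each Boolean filter.
conditions = {'a_gt_3': df['A'] > 3,
              'b_gt_6': df['B'] > 6,
              'c_gt_5': df['C'] > 5}

# Pick the filters to apply by name, then AND only those.
wanted = ['a_gt_3', 'b_gt_6']
select = np.array([conditions[name] for name in wanted]).all(axis=0)

subset = df[select]
```

Here the filter on column C is skipped, so the rows at index 3, 4 and 5 are kept.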
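Performance benchmarking
Performance will depend heavily on your data. You should also try reduce, as in @Kopytok's solution, and check performance with your own data.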
df = pd.concat([df]*1000)
conditions = [df['A'] > 3,
df['B'] > 6,
df['C'] > 5]
conditions = conditions*100
%timeit reduce(lambda x, y: x & y, conditions) # 104 ms per loop
%timeit np.logical_and.reduce(conditions) # 104 ms per loop
%timeit np.array(conditions).all(axis=0) # 99.4 ms per loop
%timeit pd.concat(conditions, axis=1).all(axis=1) # 34.6 ms per loop
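`%timeit` is an IPython magic, so the lines above only run in IPython or Jupyter. For a plain script, a rough equivalent can be put together with the standard `timeit` module. This is a sketch under assumptions: the `candidates` and `timings` names are made up for illustration, `ignore_index=True` is added so the enlarged frame has a unique index, and no particular timings are implied, since results vary with data, hardware and library versions.

```python
import timeit
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6],
    'B': [5, 6, 7, 8, 9, 10],
    'C': [11, 12, 5, 7, 8, 2],
})

# Enlarge the frame; ignore_index=True (an addition here) keeps the index unique.
df = pd.concat([df] * 1000, ignore_index=True)

conditions = [df['A'] > 3,
              df['B'] > 6,
              df['C'] > 5] * 100

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    'reduce': lambda: reduce(lambda x, y: x & y, conditions),
    'np.logical_and.reduce': lambda: np.logical_and.reduce(conditions),
    'np.array.all': lambda: np.array(conditions).all(axis=0),
    'pd.concat.all': lambda: pd.concat(conditions, axis=1).all(axis=1),
}

# Best of three single runs per candidate, in seconds.
timings = {name: min(timeit.repeat(fn, number=1, repeat=3))
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Taking the minimum of several repeats reduces noise from other processes, but the ranking between methods should still be checked on your own data.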