Filter rows from grouped data frames based on string & boolean columns
I have the following data frame:
import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West", "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)
df.groupby(["Day"])
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
3 7 West False
4 5 East True
5 5 EastWest False
6 5 EastNorth False
7 5 West False
Within each group by Day, I would like to drop the rows whose df['Direction'] string does not contain the df['Direction'] value of the row where df['Bool'] is True.
So for example, in the first group, df['Direction'] == "West" does not match df['Direction'] == "North" (the row where df['Bool'] == True), so it is dropped. df['Direction'] == "NorthWest" is a match since the string contains North, so it is kept.
Expected output:
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
3 5 East True
4 5 EastWest False
5 5 EastNorth False
The rows do not always have the same order, so using shift() is not an option. I'm wondering if there's a quick way to do this without using a loop as well.
IIUC, you can use groupby.apply with boolean slicing:
df2 = (df
.groupby('Day', sort=False, group_keys=False)
.apply(lambda g: g[g['Direction'].str.contains('|'.join(g.loc[g['Bool'], 'Direction']))])
)
Output:
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
4 5 East True
5 5 EastWest False
6 5 EastNorth False
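If the Direction values could ever contain regex metacharacters, or a Day might have no True row, a variant that avoids str.contains may be safer. The following is only a sketch under one assumption that the question does not state outright: each Day has at most one row where Bool is True. The names key, mask and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [7, 7, 7, 7, 5, 5, 5, 5],
    "Direction": ["North", "NorthEast", "NorthWest", "West",
                  "East", "EastWest", "EastNorth", "West"],
    "Bool": [True, False, False, False, True, False, False, False],
})

# Look up, for every row, the Direction of the True row of its Day.
# Assumption: at most one True row per Day (otherwise the lookup index
# is not unique and map() raises an error).
key = df["Day"].map(df.loc[df["Bool"]].set_index("Day")["Direction"])

# Plain substring test per row; no regex is involved.
# Days without a True row get NaN as key and are dropped here.
mask = [isinstance(k, str) and k in d for d, k in zip(df["Direction"], key)]

result = df[mask]
print(result)
```

The original index labels (0, 1, 2, 4, 5, 6) are preserved, as in the output above; appending .reset_index(drop=True) would renumber them 0 to 5 to match the expected output in the question. The list comprehension still iterates row by row in Python, so this trades some speed for literal (non-regex) matching.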