Filter rows from a grouped DataFrame based on string and boolean columns

Question

I have the following DataFrame:

import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West", "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}

df = pd.DataFrame(data)
df.groupby(["Day"])

      Day  Direction   Bool
  0    7      North   True  
  1    7  NorthEast  False
  2    7  NorthWest  False
  3    7       West  False
  4    5       East   True
  5    5   EastWest  False
  6    5  EastNorth  False
  7    5       West  False

Within each Day group, I want to drop the rows whose df['Direction'] string does not contain the df['Direction'] value of the row where df['Bool'] is True.

For example, in the first group, df['Direction'] = "West" does not match df['Direction'] = "North" (the row where df['Bool'] == True), so it is dropped. df['Direction'] = "NorthWest" is a match because the string contains North, so it is kept.

Expected output:

      Day  Direction   Bool
  0    7      North   True  
  1    7  NorthEast  False
  2    7  NorthWest  False
  3    5       East   True
  4    5   EastWest  False
  5    5  EastNorth  False

The rows are not always in the same order, so shift() cannot be used. Is there a fast way to do this without a loop?

Answer 1

IIUC, you can use groupby.apply with boolean slicing:

df2 = (df
   .groupby('Day', sort=False, group_keys=False)
   .apply(lambda g: g[g['Direction'].str.contains('|'.join(g.loc[g['Bool'], 'Direction']))])
)

Output:

   Day  Direction   Bool
0    7      North   True
1    7  NorthEast  False
2    7  NorthWest  False
4    5       East   True
5    5   EastWest  False
6    5  EastNorth  False
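
The same rule can also be sketched without groupby.apply. This is a hedged alternative, not the method from the answer: str.contains treats its pattern as a regular expression by default, so the sketch escapes each flagged value with re.escape, and it sidesteps groupby.apply, whose handling of the grouping column differs between pandas versions. The names patterns, row_pattern, keep and result are illustrative assumptions. The final check is a list comprehension, so this trades some speed for robustness rather than being a faster approach.

```python
import re

import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West",
                  "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)

# One regex per Day, built from the Direction values of the rows flagged True.
# re.escape keeps any regex metacharacters in the values literal.
patterns = (
    df.loc[df['Bool']]
      .groupby('Day')['Direction']
      .agg(lambda s: '|'.join(map(re.escape, s)))
)

# Align each row with the pattern of its own Day.
# Days without any True row get NaN here (assumption: such rows are dropped).
row_pattern = df['Day'].map(patterns)

# Keep a row when its Direction contains one of its group's flagged values.
keep = [
    isinstance(p, str) and re.search(p, d) is not None
    for p, d in zip(row_pattern, df['Direction'])
]

result = df[keep].reset_index(drop=True)
print(result)
```

Calling reset_index(drop=True) renumbers the rows 0 to 5, as in the expected output of the question; omit it to keep the original index labels as in the output above.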

Answer 1: 1 accepted, 2022-08-17 09:15:10
