
Pandas groupby one column and keep only groups where column has all values in a set

I have a df as follows:

foo bar baz
aaa 0   Laos
aaa 45  Nigeria
aaa 123 Panama
bbb 12  Panama
bbb 826 Nigeria
ccc 0   Laos
ccc 15  Laos
ccc 72  Panama
ddd 4   Panama
ddd 9   Laos
ddd 987 Panama
ddd 25  Nigeria

I also have a set: {"laos", "panama", "nigeria"}

I would like to groupby("foo") and retain only the groups whose "baz" column contains every value in the set (ignoring case).

So the resulting df would contain only these rows (since bbb lacks Laos and ccc lacks Nigeria):

foo bar baz
aaa 0   Laos
aaa 45  Nigeria
aaa 123 Panama
ddd 4   Panama
ddd 9   Laos
ddd 987 Panama
ddd 25  Nigeria
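
For anyone who wants to reproduce the example, the frame and the set above can be built roughly like this. The variable name `required` is my own choice and does not appear in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "foo": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc",
            "ccc", "ccc", "ddd", "ddd", "ddd", "ddd"],
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})

# The set of values every retained group must contain (lower case)
required = {"laos", "panama", "nigeria"}

print(df)
```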

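Try with

# Keep a group only if every required value appears in its lower-cased 'baz' column
required = pd.Series(["laos", "panama", "nigeria"])
s = df.groupby('foo').filter(lambda x: required.isin(x['baz'].str.lower()).all())
s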
Out[21]: 
    foo  bar      baz
0   aaa    0     Laos
1   aaa   45  Nigeria
2   aaa  123   Panama
8   ddd    4   Panama
9   ddd    9     Laos
10  ddd  987   Panama
11  ddd   25  Nigeria
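
The same filter can also be written with a plain Python set and `set.issubset`, which some people find easier to read. This is only a sketch of that variation, not the answer's own code; the helper name `has_all_required` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc",
            "ccc", "ccc", "ddd", "ddd", "ddd", "ddd"],
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})
required = {"laos", "panama", "nigeria"}


def has_all_required(group):
    # True when every required value occurs in this group's lower-cased 'baz'
    return required.issubset(set(group["baz"].str.lower()))


# filter() keeps all rows of the groups for which the function returns True
kept = df.groupby("foo").filter(has_all_required)
print(kept)
```

Extra values in a group (a country outside the set) do not disqualify it here; only missing values do.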

IIUC, you can combine Series.str.lower with Series.isin and GroupBy.transform:

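l = ["laos", "panama", "nigeria"]
s = df['baz'].str.lower()

# Flag rows whose value is in the list, count each (baz, foo) pair only once,
# then keep groups whose count of flagged rows equals the length of the list
m = (s.isin(l)
      .mask(df.duplicated(['baz', 'foo']), False)
      .groupby(df['foo'])
      .transform('sum').eq(len(l)))

df_filtered = df.loc[m]
print(df_filtered)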


    foo  bar      baz
0   aaa    0     Laos
1   aaa   45  Nigeria
2   aaa  123   Panama
8   ddd    4   Panama
9   ddd    9     Laos
10  ddd  987   Panama
11  ddd   25  Nigeria
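
To see why the mask works, it can help to look at the per-group counts before they are compared with the length of the list. The sketch below is my own breakdown of the idea, with one deliberate change that is an assumption on my part: duplicates are detected on the lower-cased values, so "Laos" and "laos" in the same group would be counted once.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc",
            "ccc", "ccc", "ddd", "ddd", "ddd", "ddd"],
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})
required = ["laos", "panama", "nigeria"]

s = df["baz"].str.lower()

# First occurrence of each (foo, lower-cased baz) pair
keys = pd.DataFrame({"foo": df["foo"], "baz": s})
first_seen = ~keys.duplicated()

# A row is a "hit" if its value is required and not a repeat within its group
hits = s.isin(required) & first_seen

# Number of distinct required values found in each group
counts = hits.groupby(df["foo"]).sum()
print(counts)

# Broadcast the count back to the rows and keep complete groups only
mask = hits.groupby(df["foo"]).transform("sum").eq(len(required))
print(df[mask])
```

On the sample data, bbb and ccc each reach a count of 2, so only aaa and ddd pass the comparison with `len(required)`.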

It is similar to:

m = ((s.isin(l) & (~df.duplicated(['baz', 'foo'])))
       .groupby(df['foo'])
       .transform('sum').eq(len(l)))

# Shortcut: only equivalent when 'baz' never contains values outside the list
df1 = df[df.groupby('foo')['baz'].transform('nunique').eq(3)]
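
Note that the `nunique` version counts any three distinct values, so it gives the same result as the mask only when "baz" holds nothing outside the list. The following sketch shows the difference on a small made-up frame; the group "eee" and the value "Chad" are invented for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: 'eee' has three distinct countries but lacks Nigeria
df2 = pd.DataFrame({
    "foo": ["aaa", "aaa", "aaa", "eee", "eee", "eee"],
    "baz": ["Laos", "Nigeria", "Panama", "Laos", "Panama", "Chad"],
})
l = ["laos", "panama", "nigeria"]
s = df2["baz"].str.lower()

# nunique-based shortcut: keeps any group with exactly 3 distinct values
by_nunique = df2[df2.groupby("foo")["baz"].transform("nunique").eq(3)]

# isin-based mask: counts only distinct values that are in the list
m = ((s.isin(l) & ~df2.duplicated(["baz", "foo"]))
     .groupby(df2["foo"])
     .transform("sum")
     .eq(len(l)))
by_mask = df2[m]

print(sorted(by_nunique["foo"].unique()))
print(sorted(by_mask["foo"].unique()))
```

The shortcut keeps both groups, while the mask keeps only "aaa".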
