Pandas groupby one column and keep only groups where column has all values in a set
I have a df as follows:
foo bar baz
aaa 0 Laos
aaa 45 Nigeria
aaa 123 Panama
bbb 12 Panama
bbb 826 Nigeria
ccc 0 Laos
ccc 15 Laos
ccc 72 Panama
ddd 4 Panama
ddd 9 Laos
ddd 987 Panama
ddd 25 Nigeria
I also have a set: {"laos", "panama", "nigeria"}
I would like to groupby("foo") and retain only the groups for which column "baz" contains all values in the set (ignoring case).
So, the resulting df would contain only these rows (since bbb lacks Laos and ccc lacks Nigeria):
foo bar baz
aaa 0 Laos
aaa 45 Nigeria
aaa 123 Panama
ddd 4 Panama
ddd 9 Laos
ddd 987 Panama
ddd 25 Nigeria
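For anyone who wants to reproduce this, the sample frame above can be built roughly as follows. This is only a sketch of the data shown in the question; the variable name `required` is an assumption used for illustration and does not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex runs 0..11.
df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama",
            "Panama", "Nigeria",
            "Laos", "Laos", "Panama",
            "Panama", "Laos", "Panama", "Nigeria"],
})

# The set of values every retained group must contain (lower-cased).
required = {"laos", "panama", "nigeria"}

print(df)
```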
Try with:
s = df.groupby('foo').filter(
    lambda x: pd.Series(["laos", "panama", "nigeria"]).isin(x['baz'].str.lower()).all()
)
Out[21]:
foo bar baz
0 aaa 0 Laos
1 aaa 45 Nigeria
2 aaa 123 Panama
8 ddd 4 Panama
9 ddd 9 Laos
10 ddd 987 Panama
11 ddd 25 Nigeria
IIUC, use Series.str.lower with Series.isin and GroupBy.transform:
l = ["laos", "panama", "nigeria"]
s = df['baz'].str.lower()
m = (s.isin(l)
.mask(df.duplicated(['baz', 'foo']), False)
.groupby(df['foo'])
.transform('sum').eq(len(l)))
df_filtered = df.loc[m]
print(df_filtered)
foo bar baz
0 aaa 0 Laos
1 aaa 45 Nigeria
2 aaa 123 Panama
8 ddd 4 Panama
9 ddd 9 Laos
10 ddd 987 Panama
11 ddd 25 Nigeria
It is similar to:
m = ((s.isin(l) & (~df.duplicated(['baz', 'foo'])))
.groupby(df['foo'])
.transform('sum').eq(len(l)))
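Alternatively, if "baz" only ever contains values from the set, count the distinct values per group:
df1 = df[df.groupby('foo')['baz'].transform('nunique').eq(3)]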
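Note that the nunique approach counts any distinct value, so a group holding three countries that are not all in the set would also pass. A possible variant, sketched below, blanks out values outside the set before counting. The function name `keep_complete_groups` and the variable `required` are hypothetical, introduced here only for illustration.

```python
import pandas as pd

def keep_complete_groups(df, required, key="foo", col="baz"):
    """Keep rows whose `key` group contains every value in `required`."""
    s = df[col].str.lower()
    # Values outside the set become NaN, which nunique ignores by default.
    counts = s.where(s.isin(required)).groupby(df[key]).transform("nunique")
    return df[counts.eq(len(required))]

df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})
required = {"laos", "panama", "nigeria"}

df1 = keep_complete_groups(df, required)
print(df1)
```

Comparing against `len(required)` rather than a hard-coded 3 keeps the check correct if the set changes.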