Remove group from the pandas dataframe when a specific value within the group occurs at least two times
I have a pandas DataFrame that looks like this:
import pandas as pd
d = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
'fruit': ['strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'kiwi', 'banana', 'strawberry', 'orange', 'kiwi', 'banana', 'melon', 'orange', 'kiwi', 'melon']}
df = pd.DataFrame(data=d)
df
    group       fruit
0       A  strawberry
1       A  strawberry
2       A  strawberry
3       A  strawberry
4       A  strawberry
5       B        kiwi
6       B      banana
7       B  strawberry
8       B      orange
9       B        kiwi
10      C      banana
11      C       melon
12      C      orange
13      C        kiwi
14      C       melon
With the code below, I can remove a group if it contains any row whose fruit value is "strawberry":
df[~df.fruit.eq('strawberry').groupby(df.group).transform('any')]
    group   fruit
10      C  banana
11      C   melon
12      C  orange
13      C    kiwi
14      C   melon
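To see why the one-liner above works, it can help to look at the intermediate per-row mask. The following is a minimal sketch that rebuilds the same data (written more compactly) and splits the expression into named steps; the names `is_strawberry`, `group_has_strawberry` and `kept` are only illustrative. `eq` flags each row, `groupby(...).transform('any')` broadcasts each group's result back to every row of that group, and `~` inverts the mask.

```python
import pandas as pd

# Same data as in the question, written compactly
d = {'group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
     'fruit': ['strawberry'] * 5
              + ['kiwi', 'banana', 'strawberry', 'orange', 'kiwi']
              + ['banana', 'melon', 'orange', 'kiwi', 'melon']}
df = pd.DataFrame(data=d)

# Per-row flag: True where the fruit is 'strawberry'
is_strawberry = df.fruit.eq('strawberry')

# transform('any') returns one value per row (same index as df):
# True for every row of a group that has at least one 'strawberry'
group_has_strawberry = is_strawberry.groupby(df.group).transform('any')
print(group_has_strawberry.tolist())

# Inverting the mask keeps only rows of groups with no 'strawberry' at all
kept = df[~group_has_strawberry]
print(kept)
```

Because `transform` keeps the original index, the mask lines up row by row with `df` and can be used directly for boolean indexing.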
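However, I only want to remove a group if it contains at least two rows with the fruit value "strawberry", which means the final result should also include group B. How can I achieve this?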
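A quick way to see which groups such a threshold would affect is to count the "strawberry" rows per group. This is a minimal sketch on the same data; `counts` is an illustrative name. Summing a boolean Series counts its `True` values, so group A has 5 such rows, group B has 1 and group C has 0, which is why only A should be dropped when the threshold is two.

```python
import pandas as pd

# Same data as in the question, written compactly
d = {'group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
     'fruit': ['strawberry'] * 5
              + ['kiwi', 'banana', 'strawberry', 'orange', 'kiwi']
              + ['banana', 'melon', 'orange', 'kiwi', 'melon']}
df = pd.DataFrame(data=d)

# One row per group: the number of rows whose fruit is 'strawberry'
counts = df.fruit.eq('strawberry').groupby(df.group).sum()
print(counts)

# Groups that reach the threshold of two 'strawberry' rows
print(counts[counts >= 2].index.tolist())
```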
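Change transform('any') to transform('sum'), which counts the "strawberry" rows in each group, and compare that count with a threshold n: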
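n = 2  # a group is removed once it has at least n 'strawberry' rows
# True counts as 1, so transform('sum') gives each row its group's 'strawberry' count;
# rows whose group reaches n are masked out
out = df[~(df.fruit.eq('strawberry').groupby(df.group).transform('sum') >= n)]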
out
    group       fruit
5       B        kiwi
6       B      banana
7       B  strawberry
8       B      orange
9       B        kiwi
10      C      banana
11      C       melon
12      C      orange
13      C        kiwi
14      C       melon
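The same rule can also be expressed with `DataFrameGroupBy.filter`, which keeps or drops whole groups based on a function that returns a single boolean per group. This is a swapped-in alternative, not the answer's approach, and the helper `drop_groups_with_repeats` and its parameters are hypothetical. The `transform('sum')` version above is typically faster when there are many groups, since `filter` calls a Python function once per group.

```python
import pandas as pd


def drop_groups_with_repeats(df, group_col, value_col, value, n=2):
    """Hypothetical helper: drop every group in which `value` appears in
    `value_col` at least `n` times; all other groups are kept whole."""
    return df.groupby(group_col).filter(
        # Called once per group; returning True keeps that group's rows
        lambda g: g[value_col].eq(value).sum() < n
    )


# Same data as in the question, written compactly
d = {'group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
     'fruit': ['strawberry'] * 5
              + ['kiwi', 'banana', 'strawberry', 'orange', 'kiwi']
              + ['banana', 'melon', 'orange', 'kiwi', 'melon']}
df = pd.DataFrame(data=d)

out = drop_groups_with_repeats(df, 'group', 'fruit', 'strawberry', n=2)
print(out)
```

With `n=1` the helper drops any group containing "strawberry" at all, which reproduces the `transform('any')` behaviour from the question.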