Keep dataframe rows meeting a condition within each group of the same grouped dataframe
I have the following dataframe.
  c1 c2  v1  v2
0  a  a   1   2
1  a  a   2   3
2  b  a   3   1
3  b  a   4   5
5  c  d   5   0
I would like to get the following output.
  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
The rule: first, group the dataframe by c1 and c2. Then, within each group, keep the row with the maximum value in column v2. Finally, output the original dataframe with every row that does not satisfy this rule dropped.
What is the best way to obtain this result? Thanks.
Looking around, I also found this solution based on the apply method.
You could use groupby-transform to generate a boolean selection mask:
grouped = df.groupby(['c1', 'c2'])
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
yields
  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
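One thing worth checking with a mask like this is what happens when two rows in a group share the maximum. The sketch below is only an illustration: the `tied` dataframe is made-up data, not from the question, and it uses `transform("max")` followed by a comparison instead of the lambda above, which should build the same mask.

```python
import pandas as pd

# Hypothetical data: group ("a", "a") has two rows tied at the maximum v2.
tied = pd.DataFrame({
    "c1": ["a", "a", "b", "b"],
    "c2": ["a", "a", "a", "a"],
    "v1": [1, 2, 3, 4],
    "v2": [3, 3, 1, 5],
})

# Broadcast each group's maximum v2 back onto the rows of that group.
group_max = tied.groupby(["c1", "c2"])["v2"].transform("max")

# True wherever a row's v2 equals the maximum of its (c1, c2) group.
mask = tied["v2"] == group_max

kept = tied.loc[mask].reset_index(drop=True)
print(kept)
```

With this data both tied rows of group ("a", "a") are kept, so the mask approach can return more than one row per group.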
If you want to make sure that you get exactly one row per group, you can sort the values by "v2" before grouping and then take the last row of each group (the one with the highest v2 value).
import pandas as pd
df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})
# Sort ascending by v2 so that the last row of each group holds its maximum.
df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()
result:
  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
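Another way to get one row per group, not taken from either answer above, is to look up the index label of each group's maximum with `idxmax` and select those rows from the original dataframe. This is a minimal sketch using the same data; the names `best` and `result` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["a", "a", "b", "b", "c"],
    "c2": ["a", "a", "a", "a", "d"],
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 3, 1, 5, 0],
})

# For each (c1, c2) group, the index label of the first row with the largest v2.
best = df.groupby(["c1", "c2"])["v2"].idxmax()

# Select those whole rows from the original dataframe and renumber them.
result = df.loc[best].reset_index(drop=True)
print(result)
```

Because `idxmax` selects whole rows, every value in a result row comes from the same original row. As far as I know, `groupby(...).last()` takes the last non-null entry per column, so with missing values it could combine values from different rows; that may matter if the data contains NaN.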