Keep the dataframe rows meeting a condition within each group of a grouped dataframe

I have the following dataframe.

    c1  c2  v1  v2
0   a   a   1   2
1   a   a   2   3
2   b   a   3   1
3   b   a   4   5
5   c   d   5   0

I wish to have the following output.

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0

The rule: first, group the dataframe by c1 and c2. Then, within each group, keep the row with the maximum value in column v2. Finally, output the original dataframe with every row that does not satisfy this rule dropped.

What is the best way to obtain this result? Thanks.

Looking around, I also found this solution based on the apply method

You could use groupby-transform to generate a boolean selection mask:

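grouped = df.groupby(['c1', 'c2'])
# True for rows whose v2 equals the maximum v2 of their group
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
# keep only the selected rows and renumber the index
df.loc[mask].reset_index(drop=True)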

yields

  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
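
As a minimal sketch of the same mask idea, the lambda can be replaced by the built-in "max" aggregation passed to transform, which broadcasts each group's maximum back onto its rows. The names group_max, result_mask, and the small tied frame are hypothetical and only for illustration. Note that a mask of this kind keeps every row tied for the group maximum, so a group can contribute more than one row.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["a", "a", "b", "b", "c"],
    "c2": ["a", "a", "a", "a", "d"],
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 3, 1, 5, 0],
})

# Broadcast each group's maximum of v2 back onto every row of that group.
group_max = df.groupby(["c1", "c2"])["v2"].transform("max")

# Keep the rows whose v2 equals their group's maximum.
result_mask = df.loc[df["v2"] == group_max].reset_index(drop=True)

# Hypothetical frame with a tie: both rows share the maximum v2 of their group.
tied = pd.DataFrame({"c1": ["a", "a"], "c2": ["a", "a"], "v1": [1, 2], "v2": [3, 3]})
tied_max = tied.groupby(["c1", "c2"])["v2"].transform("max")
tied_rows = tied.loc[tied["v2"] == tied_max]  # both tied rows are kept
```

If exactly one row per group is required, the mask alone is not enough when ties can occur.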

If you want to make sure that you get exactly one row per group, you can sort the values by "v2" before grouping and then just take the last row of each group (the one with the highest v2 value).

import pandas as pd

df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})

# sort ascending by v2 so the last row of each group has the highest v2
df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()

result:

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0
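
Another way to get one row per group, offered here only as a sketch and not taken from the answers above, is to look up the index label of each group's maximum with idxmax and select those rows from the original dataframe. The names best_idx and result_idxmax are hypothetical. On ties, idxmax returns the first label holding the maximum, so each group contributes a single, unmodified row.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["a", "a", "b", "b", "c"],
    "c2": ["a", "a", "a", "a", "d"],
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 3, 1, 5, 0],
})

# Index label of the row holding the maximum v2 in each (c1, c2) group.
best_idx = df.groupby(["c1", "c2"])["v2"].idxmax()

# Select those rows from the original dataframe and renumber the index.
result_idxmax = df.loc[best_idx].reset_index(drop=True)
```

Because the rows are selected by label, all columns of the chosen row come from the same original row.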
