
df.groupby() modification HELP needed

This is my table:

   A  B  C  E
0  1  1  5  4
1  1  1  1  1
2  3  3  8  2

Now, I want to group all rows by columns A and B. Column C should be summed, and for column E I want the value from the row where C is at its maximum within the group.

I did the first part of grouping A and B and summing C. I did this with:

df = df.groupby(['A', 'B'])['C'].sum()

But at this point, I am not sure how to tell pandas that column E should take the value from the row where C is max.

The end result should look like this:

   A  B  C  E
0  1  1  6  4
1  3  3  8  2

Can somebody help me with this last piece? Thanks!

Use groupby with agg after sorting by C.

In general, if you are applying different functions to different columns, DataFrameGroupBy.agg allows you to pass a dictionary specifying which operation is applied to each column:

df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})

     C  E
A B
1 1  6  4
3 3  8  2

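Because the frame is sorted by column C first and groupby preserves the order of rows within each group, the last value of E in each group comes from the row with the maximum value of C for that group. Passing sort=False is not what makes this work; it only keeps the groups in order of first appearance instead of sorting them by key.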
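For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the table from the question and adds reset_index() so the result has the flat A, B, C, E layout asked for. The variable names result and alt are just illustrative. The second part is an alternative that is not in the answer above: it uses idxmax to look up the row label of the maximum C in each group, and it assumes the frame has a unique index. When two rows in a group tie on C, both variants should be treated as picking one of the tied rows without a guarantee about which.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [1, 1, 3],
    'B': [1, 1, 3],
    'C': [5, 1, 8],
    'E': [4, 1, 2],
})

# Approach from the answer: sort by C, then sum C and take the last E per group.
# reset_index() turns the A/B group keys back into ordinary columns.
result = (
    df.sort_values('C')
      .groupby(['A', 'B'], sort=False)
      .agg({'C': 'sum', 'E': 'last'})
      .reset_index()
)
print(result)

# Alternative sketch (assumes a unique index): no sorting, look up E via idxmax.
grouped = df.groupby(['A', 'B'])
alt = grouped['C'].sum().reset_index()
# idxmax returns, per group, the row label where C is largest;
# both results are ordered by group key, so the values line up positionally.
alt['E'] = df.loc[grouped['C'].idxmax(), 'E'].to_numpy()
print(alt)
```

One thing to keep in mind with the first variant: 'last' skips missing values, so if E can be NaN in the row with the largest C, it returns the last non-missing E in the group instead.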
