df.groupby() modification HELP needed

Question

This is my table:

   A  B  C  E
0  1  1  5  4
1  1  1  1  1
2  3  3  8  2

Now I want to group the rows by columns A and B. Column C should be summed, and for column E I want the value from the row where C is at its maximum within the group.

I have done the first part, grouping by A and B and summing C, with:

df = df.groupby(['A', 'B'])['C'].sum()

But at this point, I am not sure how to specify that column E should take the value from the row where C is max.

The end result should look like this:

   A  B  C  E
0  1  1  6  4
1  3  3  8  2

Can somebody help me with this last piece? Thanks!

Answer 1

Use groupby with agg after sorting by C.

In general, if you are applying different functions to different columns, DataFrameGroupBy.agg allows you to pass a dictionary specifying which operation is applied to each column:

df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})

     C  E
A B
1 1  6  4
3 3  8  2

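Because the rows are sorted by column C first, and groupby preserves the order of rows within each group, the last value of E in each group comes from the row with the maximum value of C. Passing sort=False to groupby only keeps the groups in order of first appearance instead of sorting them by key.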
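To check the approach end to end, here is a minimal, self-contained sketch that rebuilds the table from the question and applies the sort-then-aggregate step. The reset_index() call is my addition, to get A and B back as ordinary columns as in the desired output. The second variant, which uses idxmax to look up the row holding the maximum C per group, is an alternative I am suggesting and is not part of the original answer.

```python
import pandas as pd

# Table from the question
df = pd.DataFrame({
    'A': [1, 1, 3],
    'B': [1, 1, 3],
    'C': [5, 1, 8],
    'E': [4, 1, 2],
})

# Approach from the answer: sort by C so the row with the largest C
# comes last in each group, then sum C and take the last E.
result = (
    df.sort_values('C')
      .groupby(['A', 'B'], sort=False)
      .agg({'C': 'sum', 'E': 'last'})
      .reset_index()  # added here: turn A and B back into columns
)
print(result)

# Alternative sketch (not from the answer): find the row label of the
# maximum C in each group with idxmax, then pull E from those rows.
grouped = df.groupby(['A', 'B'])
alt = grouped['C'].sum().reset_index()
alt['E'] = df.loc[grouped['C'].idxmax(), 'E'].to_numpy()
print(alt)
```

On this data both variants give the table requested in the question. If a group has several rows tied for the maximum C, the two variants may pick E from different rows, so it is worth deciding which row should win in that case.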

solution1
4 ACCEPTED 2018-08-11 00:25:32