Calculate difference between min and max values in a group after a groupby

Question

Using the DataFrame (test) below, I calculate the mean of 'value' for each ('col1', 'col2') group. After that, I would like to group again using only 'col1' and calculate the difference between the max and min values of the 'mean' column created by the first groupby.

How can this be done in an elegant way?

import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})

print(test.groupby(['col1', 'col2'])[['value']].agg(
    n=('value', 'count'),
    mean=('value', 'mean')))

Answer 1

You can aggregate using the numpy.ptp ("peak to peak", i.e. max minus min) function:

import numpy as np

(test.groupby(['col1', 'col2'])[['value']]
     .agg(n=('value', 'count'),  # the count is no longer needed for this result
          mean=('value', 'mean'))
     .groupby('col1').agg(diff=('mean', np.ptp))  # max - min of the group means
)

Alternatively, use lambda g: g.max() - g.min() as the aggregation function.

Output:

       diff
col1       
A      4.00
B     24.75
C     11.00
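
For reference, the lambda alternative mentioned above could be sketched as follows. This is a minimal, self-contained version and not necessarily how the answerer would write it: the intermediate names means, diff and diff_vectorized are made up for illustration, and the unused count column is dropped so that the first groupby returns a plain Series.

```python
import pandas as pd

test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# Step 1: mean of 'value' per (col1, col2) group; the result is a Series
# indexed by a (col1, col2) MultiIndex.
means = test.groupby(['col1', 'col2'])['value'].mean()

# Step 2: regroup on the 'col1' index level and take max - min with a lambda.
diff = (means.groupby(level='col1')
             .agg(lambda g: g.max() - g.min())
             .rename('diff'))

# Same result without a lambda: subtract the two built-in reductions.
grouped = means.groupby(level='col1')
diff_vectorized = (grouped.max() - grouped.min()).rename('diff')

print(diff)
```

Both variants avoid the NumPy dependency. The second one relies only on the built-in max and min reductions, which may be preferable on larger frames because no Python-level function is called per group.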

solution1
2 ACCEPTED 2022-08-23 10:00:23
