Calculate difference between min and max values in a group after a groupby

Based on the df (test) below I calculate the mean of each group ('col1', 'col2'). After that, I would like to perform a new groupby using only 'col1' and calculate the difference between min and max values of the column 'mean', created by the first groupby.

How can this be done in an elegant way?

import numpy as np
import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})

print(test.groupby(['col1', 'col2'])[['value']].agg(
    n=('value', 'count'),
    mean=('value', 'mean')))

You can chain a second groupby on 'col1' and aggregate with the numpy.ptp function ("peak to peak", i.e. max minus min):

(test.groupby(['col1', 'col2'])[['value']]
     .agg(n=('value', 'count'),  # not needed for the final result
          mean=('value', 'mean'))
     .groupby('col1')            # 'col1' is now an index level
     .agg(diff=('mean', np.ptp))
)

Alternative: use lambda g: g.max() - g.min() as the aggregation function.

Output:

       diff
col1       
A      4.00
B     24.75
C     11.00
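
The lambda alternative mentioned above can be sketched as follows. This is a minimal, self-contained version that assumes only the per-group mean is needed (the count column is dropped); the names `means` and `diff` are illustrative and do not come from the original post.

```python
import pandas as pd

test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# Step 1: mean of 'value' for each (col1, col2) pair.
# The result is a Series indexed by a (col1, col2) MultiIndex.
means = test.groupby(['col1', 'col2'])['value'].mean()

# Step 2: regroup on the 'col1' index level and take max - min of the means.
diff = means.groupby(level='col1').agg(lambda g: g.max() - g.min())

print(diff)
```

If you would rather avoid the lambda, computing `means.groupby(level='col1').max() - means.groupby(level='col1').min()` should give the same values.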
