Compute the difference between the minimum and maximum within groups after a groupby

Question

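Using the df below (test), I compute the mean of each ('col1', 'col2') group. After that, I want to run a second groupby on 'col1' only and compute, for each 'col1' group, the difference between the maximum and the minimum of the 'mean' column produced by the first groupby.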

How can I do this elegantly?

import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})

print(test.groupby(['col1', 'col2'])[['value']].agg(
    n=('value', 'count'),
    mean=('value', 'mean')))
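To make clear what the second groupby has to work on, here is a minimal sketch of the intermediate table from the first step (the variable name stats is only illustrative). Each (col1, col2) pair gets a count and a mean; for example, group A ends up with a mean of 39.0 for L and 43.0 for W, so its max minus min should come out as 4.0.

```python
import pandas as pd

# Same sample data as in the question
test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# First groupby: one row per (col1, col2) pair, with a count and a mean of 'value'
stats = test.groupby(['col1', 'col2']).agg(
    n=('value', 'count'),
    mean=('value', 'mean'),
)
print(stats)
```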

Answer 1

You can aggregate with numpy.ptp ("peak to peak", i.e. maximum minus minimum):

import numpy as np

(test.groupby(['col1', 'col2'])[['value']]
     .agg(n=('value', 'count'),  # not needed for the final result
          mean=('value', 'mean'))
     .groupby('col1').agg(diff=('mean', np.ptp))
)
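Since the count column n is not used in the second step, a slightly leaner variant (a sketch, not part of the original answer; means and diff are illustrative names) computes only the means as a Series with a two-level index and applies np.ptp per col1 group. It returns a Series rather than a one-column DataFrame; .to_frame() gives the layout of the output shown below.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# Mean of 'value' for every (col1, col2) pair -> Series indexed by (col1, col2)
means = test.groupby(['col1', 'col2'])['value'].mean()

# Regroup on the outer index level and take max - min of the means per col1
diff = means.groupby(level='col1').agg(np.ptp).rename('diff')
print(diff)
```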

Alternative: use lambda g: g.max() - g.min() as the aggregation function.
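A minimal sketch of that alternative, keeping the answer's chain and only swapping np.ptp for the lambda (the unused count column is dropped, and the question's test DataFrame is rebuilt so the snippet runs on its own; result is an illustrative name):

```python
import pandas as pd

# Same sample data as in the question
test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

result = (test.groupby(['col1', 'col2'])[['value']]
              .agg(mean=('value', 'mean'))          # one mean per (col1, col2)
              .groupby('col1')                      # 'col1' is now an index level name
              .agg(diff=('mean', lambda g: g.max() - g.min())))  # range of the means
print(result)
```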

Output:

       diff
col1       
A      4.00
B     24.75
C     11.00
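Both variants above call a Python function once per col1 group. A possible alternative (a swapped-in technique, not from the answer) is to let pandas compute the per-group max and min with its built-in aggregations and then subtract the two columns; this may be faster when there are many groups, though that is an assumption worth measuring on real data. A sketch, with means, bounds and diff as illustrative names:

```python
import pandas as pd

# Same sample data as in the question
test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# Mean of 'value' for every (col1, col2) pair
means = test.groupby(['col1', 'col2'])['value'].mean()

# Built-in 'max' and 'min' aggregations: a DataFrame with columns max and min, indexed by col1
bounds = means.groupby(level='col1').agg(['max', 'min'])

# Vectorized subtraction, then back to a one-column DataFrame like the output above
diff = (bounds['max'] - bounds['min']).to_frame('diff')
print(diff)
```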

Answer 1: 2 votes, accepted, 2022-08-23 10:00:23
