
Python pandas: group-by sum and mean on different axes

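I need to group my data, compute the mean along one axis and the sum along the other. I have searched for similar questions but could not find a suitable solution.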

I have a DataFrame similar to this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['XX','XX','XX','XX','XX','XX','XX','XX','XX',
                         'YY','YY','YY','YY','YY','YY','YY','YY','YY',
                         'ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ'],
                   'B': ['ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
                         'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
                         'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3'],
                   'C': ['2017','2017','2017','2018','2018','2018','2019','2019','2019',
                         '2017','2017','2017','2018','2018','2018','2019','2019','2019',
                         '2017','2017','2017','2018','2018','2018','2019','2019','2019'],
                   'D': np.random.randint(0, 100, size=27)})

I need the following DataFrame:

A   ind1    ind2    ind3    TOTAL
XX  52.33   73.00   37.00   162.33
YY  40.67   51.33   54.33   146.33
ZZ  84.00   28.67   62.00   174.67

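where the ind1, ind2 and ind3 columns are means along axis=0 (for each A, the values of D averaged over the years in C), and TOTAL is the sum of ind1, ind2 and ind3 along axis=1.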
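To make the axis wording concrete, here is a minimal sketch, assuming a hypothetical deterministic stand-in for D (the integers 0 to 26 instead of random values) so the numbers can be checked by hand; the intermediate names wide and means are illustrative only. It spreads the data into one row per (A, C) pair and one column per B value, averages the year rows inside each A (the "axis=0" mean), then sums across the indicator columns (the "axis=1" sum).

```python
import pandas as pd

# Hypothetical deterministic data laid out like the question's df:
# D = 0..26 instead of np.random.randint(0, 100, size=27)
df = pd.DataFrame({
    'A': [a for a in ['XX', 'YY', 'ZZ'] for _ in range(9)],
    'B': ['ind1', 'ind2', 'ind3'] * 9,
    'C': [c for c in ['2017', '2018', '2019'] for _ in range(3)] * 3,
    'D': list(range(27)),
})

# One row per (A, year) pair, one column per indicator in B
wide = df.pivot_table(index=['A', 'C'], columns='B', values='D')

# Mean down the rows (axis=0) within each A: averages over the years
means = wide.groupby(level='A').mean()

# Sum across the indicator columns (axis=1) for each A
means['TOTAL'] = means.sum(axis=1)
print(means)
# XX row: ind1=3.0, ind2=4.0, ind3=5.0, TOTAL=12.0
```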

I tried the following, but it doesn't work:

print(df.groupby('A')['D'].agg(['sum','mean']))

Any help would be great.
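The attempted groupby('A') call computes numbers of the right kind but in the wrong shape: grouping by A alone pools all nine rows of each A, so the per-B columns never appear. A minimal sketch of what it actually returns, assuming the same hypothetical D = 0..26 data. It also shows a side observation that only holds for balanced data like this (every A has exactly three rows per B value): the wanted TOTAL then equals the pooled mean times the number of B categories. With unbalanced groups that shortcut breaks, so pivoting on B is the general route.

```python
import pandas as pd

# Hypothetical deterministic data (D = 0..26) laid out like the question's df
df = pd.DataFrame({
    'A': [a for a in ['XX', 'YY', 'ZZ'] for _ in range(9)],
    'B': ['ind1', 'ind2', 'ind3'] * 9,
    'C': [c for c in ['2017', '2018', '2019'] for _ in range(3)] * 3,
    'D': list(range(27)),
})

# Grouping by A alone pools all nine rows of each A, so the B split is lost
summary = df.groupby('A')['D'].agg(['sum', 'mean'])
print(summary)
# sum: XX=36, YY=117, ZZ=198; mean: XX=4.0, YY=13.0, ZZ=22.0

# Only because every (A, B) pair has the same row count (3) here, the
# wanted TOTAL equals the pooled mean times the number of B categories
n_b = df['B'].nunique()
total_from_pooled_mean = summary['mean'] * n_b
print(total_from_pooled_mean)
# XX=12.0, YY=39.0, ZZ=66.0
```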

I believe you need to pivot with crosstab or DataFrame.pivot_table, then add a new column holding the row sums with DataFrame.assign:

import numpy as np
import pandas as pd

np.random.seed(20)

df = pd.DataFrame({'A': ['XX','XX','XX','XX','XX','XX','XX','XX','XX',
                         'YY','YY','YY','YY','YY','YY','YY','YY','YY',
                         'ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ'],
                   'B': ['ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
                         'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
                         'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3'],
                   'C': ['2017','2017','2017','2018','2018','2018','2019','2019','2019',
                         '2017','2017','2017','2018','2018','2018','2019','2019','2019',
                         '2017','2017','2017','2018','2018','2018','2019','2019','2019'],
                   'D': np.random.randint(0, 100, size=27)})

# Assign to a new name so df stays available for the alternatives
out = (pd.crosstab(df['A'], df['B'], df['D'], aggfunc='mean')
         .assign(Total=lambda x: x.sum(axis=1)))

print(out)
B        ind1       ind2       ind3       Total
A                                              
XX  67.666667  46.000000  60.000000  173.666667
YY  69.333333  45.666667  67.333333  182.333333
ZZ  16.333333  57.666667  32.333333  106.333333
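Because the sample uses random data, the printed numbers depend on the seed. A minimal sketch with the hypothetical deterministic D = 0..26 makes the crosstab step checkable by hand. It also shows that crosstab rejects a values argument passed without an aggfunc (pandas raises ValueError), which is why aggfunc='mean' is needed here. The lambda in assign receives the crosstab result, so the row sums cover only ind1, ind2 and ind3.

```python
import pandas as pd

# Hypothetical deterministic data (D = 0..26) laid out like the question's df
df = pd.DataFrame({
    'A': [a for a in ['XX', 'YY', 'ZZ'] for _ in range(9)],
    'B': ['ind1', 'ind2', 'ind3'] * 9,
    'C': [c for c in ['2017', '2018', '2019'] for _ in range(3)] * 3,
    'D': list(range(27)),
})

# Third positional argument of crosstab is `values`; aggfunc says how to combine them
out = (pd.crosstab(df['A'], df['B'], df['D'], aggfunc='mean')
         .assign(Total=lambda x: x.sum(axis=1)))
print(out)
# XX row: 3.0, 4.0, 5.0, Total 12.0 (means of D over the three years)

# Passing values without aggfunc is rejected by crosstab
try:
    pd.crosstab(df['A'], df['B'], df['D'])
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```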

Or:

# pivot_table's default aggfunc is 'mean'
out = (df.pivot_table(index='A', columns='B', values='D')
         .assign(Total=lambda x: x.sum(axis=1)))
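A tempting shortcut with pivot_table is margins=True, but with a mean aggregation the added 'All' column holds, for each A, the mean of all its underlying rows, not the sum of the per-B means, so it does not reproduce TOTAL. A minimal sketch contrasting the two, assuming the hypothetical D = 0..26 data; the names with_margins and wanted are illustrative.

```python
import pandas as pd

# Hypothetical deterministic data (D = 0..26) laid out like the question's df
df = pd.DataFrame({
    'A': [a for a in ['XX', 'YY', 'ZZ'] for _ in range(9)],
    'B': ['ind1', 'ind2', 'ind3'] * 9,
    'C': [c for c in ['2017', '2018', '2019'] for _ in range(3)] * 3,
    'D': list(range(27)),
})

# margins=True adds an 'All' row and column computed from the raw rows
with_margins = df.pivot_table(index='A', columns='B', values='D',
                              aggfunc='mean', margins=True)
print(with_margins)
# 'All' for XX is the mean of its nine D values (4.0), not 3 + 4 + 5

# The row sum of the per-B means is what the question calls TOTAL
wanted = (df.pivot_table(index='A', columns='B', values='D')
            .assign(Total=lambda x: x.sum(axis=1)))
print(wanted.loc['XX', 'Total'])  # 12.0
```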

If you are not familiar with crosstab or pivot tables, here is another approach using groupby and unstack:

df_n = df.groupby(['A','B'])['D'].mean().unstack()
df_n['Total'] = df_n.sum(axis=1)

The output will be:

B        ind1       ind2       ind3       Total
A                                              
XX  67.666667  46.000000  60.000000  173.666667
YY  69.333333  45.666667  67.333333  182.333333
ZZ  16.333333  57.666667  32.333333  106.333333
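Since all three routes are claimed to give the same table, a quick check is cheap. A minimal sketch, assuming the seeded random data from the answer (rebuilt compactly with the same layout); the names via_crosstab, via_pivot, via_groupby and result are illustrative. The last step tidies the groupby result into the layout the question asked for (a TOTAL header, no leftover B label on the column axis, two decimals) using the standard rename, rename_axis and round methods.

```python
import numpy as np
import pandas as pd

# Same layout and seed as the answer's example
np.random.seed(20)
df = pd.DataFrame({
    'A': [a for a in ['XX', 'YY', 'ZZ'] for _ in range(9)],
    'B': ['ind1', 'ind2', 'ind3'] * 9,
    'C': [c for c in ['2017', '2018', '2019'] for _ in range(3)] * 3,
    'D': np.random.randint(0, 100, size=27),
})

# The three routes: crosstab, pivot_table, groupby + unstack
via_crosstab = (pd.crosstab(df['A'], df['B'], df['D'], aggfunc='mean')
                  .assign(Total=lambda x: x.sum(axis=1)))
via_pivot = (df.pivot_table(index='A', columns='B', values='D')
               .assign(Total=lambda x: x.sum(axis=1)))
via_groupby = df.groupby(['A', 'B'])['D'].mean().unstack()
via_groupby['Total'] = via_groupby.sum(axis=1)

# Same labels and, up to floating-point noise, the same values
for other in (via_pivot, via_groupby):
    assert list(other.index) == list(via_crosstab.index)
    assert list(other.columns) == list(via_crosstab.columns)
    assert np.allclose(other.to_numpy(), via_crosstab.to_numpy())

# Match the question's layout: TOTAL header, no 'B' axis label, 2 decimals
result = (via_groupby.rename(columns={'Total': 'TOTAL'})
                     .rename_axis(columns=None)
                     .round(2))
print(result)
```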


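Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.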

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM