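Python pandas group-by: sum and mean on different axes
I need to group my data, computing the mean along one axis and the sum along the other. I have looked for similar questions but could not find a suitable solution.
I have a df like this: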
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['XX','XX','XX','XX','XX','XX','XX','XX','XX',
'YY','YY','YY','YY','YY','YY','YY','YY','YY',
'ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ'],
'B': ['ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3'],
'C': ['2017','2017','2017','2018','2018','2018','2019','2019','2019',
'2017','2017','2017','2018','2018','2018','2019','2019','2019',
'2017','2017','2017','2018','2018','2018','2019','2019','2019'],
'D': np.random.randint(0,100,size=27)})
I need the following df:
A ind1 ind2 ind3 TOTAL
XX 52.33 73.00 37.00 162.33
YY 40.67 51.33 54.33 146.33
ZZ 84.00 28.67 62.00 174.67
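where the ind1, ind2 and ind3 columns are means taken along axis=0 (the mean of D for each A/B pair), and TOTAL is the sum of ind1, ind2 and ind3 along axis=1.
I tried the following, but it does not work: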
print(df.groupby('A')['D'].agg(['sum','mean']))
Any help would be great.
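As a side note on why the attempt above falls short: grouping by 'A' alone collapses every 'B' value into one group, so the result has one sum and one mean per 'A' rather than one mean per indicator. A minimal sketch, assuming a small hand-made frame (the values below are hypothetical and are not the question's random data):

```python
import pandas as pd

# Hypothetical toy data, chosen so the numbers can be checked by hand.
toy = pd.DataFrame({
    'A': ['XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'YY'],
    'B': ['ind1', 'ind1', 'ind2', 'ind2', 'ind1', 'ind1', 'ind2', 'ind2'],
    'D': [10, 20, 30, 50, 1, 3, 5, 7],
})

# Grouping by 'A' only: 'B' is ignored, so there is no per-indicator column.
attempt = toy.groupby('A')['D'].agg(['sum', 'mean'])
print(attempt)
```

The columns here are just sum and mean over all rows of each 'A' group, which is why a pivot on 'B' is needed first.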
I believe you need to pivot with crosstab or DataFrame.pivot_table, and then add a new column holding the row sums with DataFrame.assign:
import numpy as np
import pandas as pd

np.random.seed(20)
df = pd.DataFrame ({'A': ['XX','XX','XX','XX','XX','XX','XX','XX','XX',
'YY','YY','YY','YY','YY','YY','YY','YY','YY',
'ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ'],
'B': ['ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3',
'ind1','ind2','ind3','ind1','ind2','ind3','ind1','ind2','ind3'],
'C': ['2017','2017','2017','2018','2018','2018','2019','2019','2019',
'2017','2017','2017','2018','2018','2018','2019','2019','2019',
'2017','2017','2017','2018','2018','2018','2019','2019','2019'],
'D': np.random.randint(0,100,size=27)})
df = (pd.crosstab(df['A'], df['B'], values=df['D'], aggfunc='mean')
        .assign(Total=lambda x: x.sum(axis=1)))
print(df)
B ind1 ind2 ind3 Total
A
XX 67.666667 46.000000 60.000000 173.666667
YY 69.333333 45.666667 67.333333 182.333333
ZZ 16.333333 57.666667 32.333333 106.333333
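Or, starting again from the original (unpivoted) df:
df = (df.pivot_table(index='A', columns='B', values='D')
        .assign(Total=lambda x: x.sum(axis=1)))
If you are not familiar with crosstab or pivot tables, here is another approach, also applied to the original df.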
df_n = df.groupby(['A','B'])['D'].mean().unstack()
df_n['Total'] = df_n.sum(axis=1)
The output will be:
B ind1 ind2 ind3 Total
A
XX 67.666667 46.000000 60.000000 173.666667
YY 69.333333 45.666667 67.333333 182.333333
ZZ 16.333333 57.666667 32.333333 106.333333
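To check that the approaches above agree, here is a small self-contained sketch on hand-made data. The frame and its values are hypothetical (not the random data from the question), picked so the means and totals are easy to verify by hand.

```python
import pandas as pd

# Hypothetical toy data: two groups in 'A', two indicators in 'B'.
toy = pd.DataFrame({
    'A': ['XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'YY'],
    'B': ['ind1', 'ind1', 'ind2', 'ind2', 'ind1', 'ind1', 'ind2', 'ind2'],
    'D': [10, 20, 30, 50, 1, 3, 5, 7],
})

# Three ways to get the mean of D per (A, B) pair with B as columns.
via_groupby = toy.groupby(['A', 'B'])['D'].mean().unstack()
via_pivot = toy.pivot_table(index='A', columns='B', values='D', aggfunc='mean')
via_crosstab = pd.crosstab(toy['A'], toy['B'], values=toy['D'], aggfunc='mean')

# Add the row-wise sum of the per-indicator means.
result = via_groupby.assign(Total=lambda x: x.sum(axis=1))
print(result)
```

For XX the means are 15 and 40, so Total is 55; for YY they are 2 and 6, so Total is 8. All three intermediate tables hold the same values, so which one to use is mostly a matter of taste.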