How to use groupby in pandas to calculate a percentage / proportion total based on a criterion in another column
Pandas GroupBy two columns, calculate the total based on one column but calculate the percentage based on the total for the aggregator
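I have already derived the grouping I want, but I would like to calculate the percentage column against each month's total, i.e. regardless of the string in originating_system_id.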
d = [('Total_RFQ_For_Month', 'size')]
df_RFQ_Channel = df.groupby(['Year_Month','originating_system_id'])['state'].agg(d)
#df_RFQ_Channel['RFQ_Pcent_For_Month'] = ?
display(df_RFQ_Channel)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
2017-11 BBT 59 7.90%
EUCR 33 4.42%
MAXL 6 0.80%
MXUS 649 86.88%
2017-12 BBT 36 73.47%
EUCR 7 14.29%
MAXL 6 12.24%
2018-01 BBT 88 9.52%
EUCR 26 2.81%
MAXL 4 0.43%
MXUS 800 86.58%
VOIX 6 0.65%
Example:
7.90% is BBT's Total_RFQ_For_Month (59) divided by the sum of all for 2017-11 (747)
2.81% is EUCR's Total_RFQ_For_Month (26) divided by the sum of all for 2018-01 (924).
Use transform to get a Series with the same size as the original DataFrame, which makes it possible to divide the Total_RFQ_For_Month column by it:
#create columns from MultiIndex
df = df.reset_index()
s = df.groupby('Year_Month')['Total_RFQ_For_Month'].transform('sum')
df['RFQ_Pcent_For_Month'] = df['Total_RFQ_For_Month'].div(s).mul(100).round(2)
print (df)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
0 2017-11 BBT 59 7.90
1 2017-11 EUCR 33 4.42
2 2017-11 MAXL 6 0.80
3 2017-11 MXUS 649 86.88
4 2017-12 BBT 36 73.47
5 2017-12 EUCR 7 14.29
6 2017-12 MAXL 6 12.24
7 2018-01 BBT 88 9.52
8 2018-01 EUCR 26 2.81
9 2018-01 MAXL 4 0.43
10 2018-01 MXUS 800 86.58
11 2018-01 VOIX 6 0.65
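If the MultiIndex from the original groupby should be kept, the same idea can probably be applied without reset_index by grouping on the index level. The following is a minimal sketch under that assumption; the frame is rebuilt by hand from the first two months of counts shown in the question, and the name monthly_total is only illustrative.

```python
import pandas as pd

# Rebuild the grouped frame from the counts displayed in the question (two months only).
idx = pd.MultiIndex.from_tuples(
    [("2017-11", "BBT"), ("2017-11", "EUCR"), ("2017-11", "MAXL"), ("2017-11", "MXUS"),
     ("2017-12", "BBT"), ("2017-12", "EUCR"), ("2017-12", "MAXL")],
    names=["Year_Month", "originating_system_id"],
)
df_RFQ_Channel = pd.DataFrame(
    {"Total_RFQ_For_Month": [59, 33, 6, 649, 36, 7, 6]}, index=idx
)

# Group on the index level instead of a column, so the MultiIndex is preserved.
monthly_total = (
    df_RFQ_Channel.groupby(level="Year_Month")["Total_RFQ_For_Month"].transform("sum")
)
df_RFQ_Channel["RFQ_Pcent_For_Month"] = (
    df_RFQ_Channel["Total_RFQ_For_Month"].div(monthly_total).mul(100).round(2)
)
print(df_RFQ_Channel)
```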
To format the values as percentage strings:
df['RFQ_Pcent_For_Month'] = (df['Total_RFQ_For_Month'].div(s)
.mul(100)
.round(2)
.astype(str)
.add('%'))
print (df)
Year_Month originating_system_id Total_RFQ_For_Month RFQ_Pcent_For_Month
0 2017-11 BBT 59 7.9%
1 2017-11 EUCR 33 4.42%
2 2017-11 MAXL 6 0.8%
3 2017-11 MXUS 649 86.88%
4 2017-12 BBT 36 73.47%
5 2017-12 EUCR 7 14.29%
6 2017-12 MAXL 6 12.24%
7 2018-01 BBT 88 9.52%
8 2018-01 EUCR 26 2.81%
9 2018-01 MAXL 4 0.43%
10 2018-01 MXUS 800 86.58%
11 2018-01 VOIX 6 0.65%
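Note that astype(str) drops trailing zeros (7.9% rather than the 7.90% shown in the desired output). If a fixed two decimal places matter, one option is to format each value with a format string instead. This is a small sketch on a hand-made Series, not part of the original answer; pct and formatted are illustrative names.

```python
import pandas as pd

# A few rounded percentages taken from the 2017-11 rows above.
pct = pd.Series([7.9, 4.42, 0.8, 86.88])

# Format every value with exactly two decimals and append the percent sign.
formatted = pct.map("{:.2f}%".format)
print(formatted.tolist())  # ['7.90%', '4.42%', '0.80%', '86.88%']
```

The result is a column of strings either way, so it is usually best to keep the numeric column for further calculations and format only for display.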
Detail:
print (s)
0 747
1 747
2 747
3 747
4 49
5 49
6 49
7 924
8 924
9 924
10 924
11 924
Name: Total_RFQ_For_Month, dtype: int64
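The output above repeats each month's total on every row of that month, which is what makes the element-wise division work. A short sketch contrasting this with a plain aggregation may help; the three-row frame is a cut-down assumption based on the question's data, and per_group / per_row are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Year_Month": ["2017-11", "2017-11", "2017-12"],
    "Total_RFQ_For_Month": [59, 33, 36],
})
grouped = df.groupby("Year_Month")["Total_RFQ_For_Month"]

# Aggregation: one value per month, indexed by Year_Month.
per_group = grouped.sum()

# transform: one value per original row, indexed like df.
per_row = grouped.transform("sum")

print(per_group)
print(per_row)
```

Because per_row shares its index with df, it can be used directly in df['Total_RFQ_For_Month'].div(per_row), whereas per_group would first need to be mapped or merged back onto the rows.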
Steps to recreate df:
import pandas as pd
df = pd.DataFrame(columns=['Year_Month', 'originating_system_id', 'Total_RFQ_For_Month'])
# only two months
df.loc[0]=['2017-11','BBT',59]
df.loc[1]=['2017-11','EUCR',33]
df.loc[2]=['2017-11','MAXL',6]
df.loc[3]=['2017-11','MXUS',649]
df.loc[4]=['2017-12','BBT',36]
df.loc[5]=['2017-12','EUCR',7]
df.loc[6]=['2017-12','MAXL',88]
# Same as your DF
gp1 = df.groupby(['Year_Month','originating_system_id']).sum()
gp2=gp1.reset_index()
gp3 = df[['Year_Month','Total_RFQ_For_Month']].groupby(['Year_Month']).sum().rename(columns={'Total_RFQ_For_Month': 'RFQ_For_Month_Sum'})
gp2=gp2.merge(gp3, on='Year_Month')
gp2['RFQ_Pcent_For_Month']=((gp2['Total_RFQ_For_Month']*100)/gp2['RFQ_For_Month_Sum']).round(3).astype(str).add('%')
# pass the column by keyword; the positional axis argument was removed in pandas 2.0
gp2.drop(columns=['RFQ_For_Month_Sum'], inplace=True)
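The merge route and the transform route should produce the same per-row monthly totals. The sketch below checks that on a small hand-made frame; the data is a subset of the question's counts, and totals, merged and via_transform are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Year_Month": ["2017-11", "2017-11", "2017-12", "2017-12"],
    "originating_system_id": ["BBT", "EUCR", "BBT", "EUCR"],
    "Total_RFQ_For_Month": [59, 33, 36, 7],
})

# Merge route: build a per-month totals table and join it back onto the rows.
totals = (
    df.groupby("Year_Month", as_index=False)["Total_RFQ_For_Month"].sum()
      .rename(columns={"Total_RFQ_For_Month": "RFQ_For_Month_Sum"})
)
merged = df.merge(totals, on="Year_Month")

# transform route: the monthly total is broadcast to every row in one step.
via_transform = df.groupby("Year_Month")["Total_RFQ_For_Month"].transform("sum")

same = merged["RFQ_For_Month_Sum"].tolist() == via_transform.tolist()
print(same)
```

The transform version avoids the intermediate table and the later drop of the helper column, which is mostly a matter of convenience.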