Add percent of total column to Pandas pivot_table
I can't figure out how to add a % of total column for each date_submitted group to the pandas pivot table below:
In [177]: pass_rate_pivot
date_submitted  audit_status
04-11-2014      audited         140
                is_adserver       7
                rejected         75
                unauditable     257
04-18-2014      audited         177
                is_adserver      10
                pending          44
                rejected         30
                unauditable     226
04-25-2014      audited          97
                is_adserver       5
                pending          33
                rejected          9
                unauditable     355
Name: site_domain, dtype: int64
In [177]: pass_rate_pivot.to_dict()
Out[177]:
{('04-11-2014', 'audited'): 140,
('04-11-2014', 'is_adserver'): 7,
('04-11-2014', 'rejected'): 75,
('04-11-2014', 'unauditable'): 257,
('04-18-2014', 'audited'): 177,
('04-18-2014', 'is_adserver'): 10,
('04-18-2014', 'pending'): 44,
('04-18-2014', 'rejected'): 30,
('04-18-2014', 'unauditable'): 226,
('04-25-2014', 'audited'): 97,
('04-25-2014', 'is_adserver'): 5,
('04-25-2014', 'pending'): 33,
('04-25-2014', 'rejected'): 9,
('04-25-2014', 'unauditable'): 355}
Is this what you want? For each group, divide each element by the sum of all elements in that group:
In [62]: pass_rate_pivot.groupby(level=0).transform(lambda x: x/x.sum())
Out[62]:
04-11-2014  audited        0.292276
            is_adserver    0.014614
            rejected       0.156576
            unauditable    0.536534
04-18-2014  audited        0.363450
            is_adserver    0.020534
            pending        0.090349
            rejected       0.061602
            unauditable    0.464066
04-25-2014  audited        0.194389
            is_adserver    0.010020
            pending        0.066132
            rejected       0.018036
            unauditable    0.711423
dtype: float64
If you want to add this as a column, you can indeed concat both Series into one DataFrame, as suggested by @exp1orer:
pd.concat([pass_rate_pivot,pass_rate_pivot_pct], axis=1)
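The concat call above uses pass_rate_pivot_pct without showing where it comes from. A minimal self-contained sketch, assuming it holds the result of the groupby/transform expression shown earlier. The data is copied from the to_dict() output in the question; the column labels 'count' and 'pct' passed via keys are illustrative names, not from the original post:

```python
import pandas as pd

# Counts copied from the to_dict() output in the question.
counts = {
    ('04-11-2014', 'audited'): 140,
    ('04-11-2014', 'is_adserver'): 7,
    ('04-11-2014', 'rejected'): 75,
    ('04-11-2014', 'unauditable'): 257,
    ('04-18-2014', 'audited'): 177,
    ('04-18-2014', 'is_adserver'): 10,
    ('04-18-2014', 'pending'): 44,
    ('04-18-2014', 'rejected'): 30,
    ('04-18-2014', 'unauditable'): 226,
    ('04-25-2014', 'audited'): 97,
    ('04-25-2014', 'is_adserver'): 5,
    ('04-25-2014', 'pending'): 33,
    ('04-25-2014', 'rejected'): 9,
    ('04-25-2014', 'unauditable'): 355,
}
index = pd.MultiIndex.from_tuples(
    list(counts.keys()), names=['date_submitted', 'audit_status']
)
pass_rate_pivot = pd.Series(list(counts.values()), index=index, name='site_domain')

# Assumption: pass_rate_pivot_pct is each count divided by its date group's total.
pass_rate_pivot_pct = pass_rate_pivot.groupby(level=0).transform(lambda x: x / x.sum())

# keys= supplies the column labels (illustrative names) for the two Series.
combined = pd.concat(
    [pass_rate_pivot, pass_rate_pivot_pct], axis=1, keys=['count', 'pct']
)
print(combined)
```

Multiplying the pct column by 100 gives percentages rather than fractions, if that is the preferred presentation.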
If pass_rate_pivot were already a DataFrame, you could just assign a new column, like pass_rate_pivot['pct'] = pass_rate_pivot['original column'].groupby(...
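The assignment above is truncated in the original, so the following is only one way it might be completed, not necessarily what the answerer had in mind. The sketch assumes the counts sit in a column named 'site_domain' (taken from the Series name in the question) and that grouping is on the first index level. The name frame is hypothetical:

```python
import pandas as pd

# Two date groups from the question, stored as a one-column DataFrame.
idx = pd.MultiIndex.from_tuples(
    [
        ('04-11-2014', 'audited'),
        ('04-11-2014', 'is_adserver'),
        ('04-11-2014', 'rejected'),
        ('04-11-2014', 'unauditable'),
        ('04-25-2014', 'audited'),
        ('04-25-2014', 'is_adserver'),
        ('04-25-2014', 'pending'),
        ('04-25-2014', 'rejected'),
        ('04-25-2014', 'unauditable'),
    ],
    names=['date_submitted', 'audit_status'],
)
frame = pd.DataFrame({'site_domain': [140, 7, 75, 257, 97, 5, 33, 9, 355]}, index=idx)

# transform('sum') broadcasts each date's total back to every row of that date,
# so the division lines up row by row.
group_totals = frame['site_domain'].groupby(level=0).transform('sum')
frame['pct'] = frame['site_domain'] / group_totals
print(frame)
```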
The most natural way is to do it as you create the pivot table. Here I assume that date_submitted is a column (not in the index), which you can get by using reset_index. Also make sure that your values are in a column (here I call it 'value_col'). Then:
def calc_group_pct(df, value_var='value_col'):
    # Share of each row within its own group's total.
    df['pct'] = df[value_var] / float(df[value_var].sum())
    return df

df.groupby('date_submitted').apply(calc_group_pct)
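For comparison, the same per-group percentage can be computed without a helper function by using transform instead of apply. This is a swapped-in alternative, not the answer's own method. The sketch assumes a flat table with 'date_submitted' and 'value_col' columns as described above; the name flat and the sample rows (taken from the question's counts) are illustrative:

```python
import pandas as pd

# Flat layout: date_submitted is an ordinary column, counts live in value_col.
flat = pd.DataFrame({
    'date_submitted': ['04-11-2014'] * 4 + ['04-18-2014'] * 5,
    'audit_status': [
        'audited', 'is_adserver', 'rejected', 'unauditable',
        'audited', 'is_adserver', 'pending', 'rejected', 'unauditable',
    ],
    'value_col': [140, 7, 75, 257, 177, 10, 44, 30, 226],
})

# transform('sum') returns one total per row, aligned with the original index,
# so no function has to modify the groups in place.
totals = flat.groupby('date_submitted')['value_col'].transform('sum')
flat['pct'] = flat['value_col'] / totals
print(flat)
```

Because transform keeps the original row order and index, the result can be assigned straight back as a column.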