Pandas pivot table Percent Calculations
Given the following data frame and pivot table:
import pandas as pd
df=pd.DataFrame({'A':['x','y','z','x','y','z'],
'B':['one','one','one','two','two','two'],
'C':[2,18,2,8,2,18]})
df
   A    B   C
0  x  one   2
1  y  one  18
2  z  one   2
3  x  two   8
4  y  two   2
5  z  two  18
table = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')
        C
A B
x one   2
  two   8
y one  18
  two   2
z one   2
  two  18
I'd like to add 2 columns to this pivot table: one showing the percent of all values, and another showing the percent within each group of column A, like this:
        C  % of Total  % of B
A B
x one   2          4%     20%
  two   8         16%     80%
y one  18         36%     90%
  two   2          4%     10%
z one   2          4%     10%
  two  18         36%     90%
Extra Credit:
I'd like a bottom summary row which has the sum of column C (it's okay if it also has 100% for the next 2 columns, but nothing is needed for those).
You can use:
table['% of Total'] = (table.C / table.C.sum() * 100).astype(str) + '%'
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100).astype(str) + '%'
print(table)
        C % of Total % of B
A B
x one   2       4.0%  20.0%
  two   8      16.0%  80.0%
y one  18      36.0%  90.0%
  two   2       4.0%  10.0%
z one   2       4.0%  10.0%
  two  18      36.0%  90.0%
But with real data I would not recommend casting to int to drop the decimals; it is better to use round.
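A minimal sketch of the round approach, using the sample data from the question. The precision of one decimal place is an arbitrary choice here, and the names pct_total and pct_group are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')

# Compute the percentages as numbers first and round them
# (one decimal place is an assumption, pick what suits your data).
pct_total = (table['C'] / table['C'].sum() * 100).round(1)
pct_group = (table['C'] / table.groupby(level='A')['C'].transform('sum') * 100).round(1)

# Convert to strings only at the end, for display.
table['% of Total'] = pct_total.astype(str) + '%'
table['% of B'] = pct_group.astype(str) + '%'
print(table)
```

Keeping the rounded numeric Series around separately means later calculations do not have to parse the '%' strings back into numbers.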
Extra Credit:
table['% of Total'] = (table.C / table.C.sum() * 100)
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100)
table.loc['total', :] = table.sum().values
print(table)
            C  % of Total  % of B
A     B
x     one   2.0         4.0    20.0
      two   8.0        16.0    80.0
y     one  18.0        36.0    90.0
      two   2.0         4.0    10.0
z     one   2.0         4.0    10.0
      two  18.0        36.0    90.0
total      50.0       100.0   300.0
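Note that summing the percentage columns gives 300.0 for % of B, which is not a meaningful figure. If that bothers you, one possible alternative is to build the summary row explicitly and append it with pd.concat. This is only a sketch: the names total_row and summary are illustrative, and putting 100.0 in both percentage columns is an assumption based on the question saying 100% would be acceptable there.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')
table['% of Total'] = table['C'] / table['C'].sum() * 100
table['% of B'] = table['C'] / table.groupby(level='A')['C'].transform('sum') * 100

# One-row frame with a full ('Total', '') index tuple so it lines up
# with the two-level index of the pivot table.
total_row = pd.DataFrame(
    {'C': [table['C'].sum()], '% of Total': [100.0], '% of B': [100.0]},
    index=pd.MultiIndex.from_tuples([('Total', '')], names=table.index.names),
)

# Append the summary row below the existing rows.
summary = pd.concat([table, total_row])
print(summary)
```

Because the row is concatenated rather than assigned, column C keeps its integer dtype instead of being converted to float.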
If you want to assign the new columns in a method chain after pivot_table(), for example to put it in a pipeline, you can do so using assign().
Moreover, you can add the totals as a new row using the margins parameter of pivot_table.
table = (
    df
    .pivot_table(index=['A', 'B'], aggfunc='sum', margins=True, margins_name='Total')
    .assign(**{
        # exclude the last row (the totals) from the overall sum and the per-group sums
        '% of Total': lambda x: x['C'] / x.iloc[:-1]['C'].sum() * 100,
        '% of B': lambda x: x['C'] / x.iloc[:-1].groupby(level='A')['C'].transform('sum') * 100
    })
)
Note that for the particular example in the OP, since the columns parameter of pivot_table is not used, pivot_table is equivalent to groupby, as explained here. So an equivalent (and possibly faster) approach to produce the initial pivot table result is
table = df.groupby(['A','B']).sum()
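As a quick check of that equivalence on the sample data, something like the following should do. It only compares the two results; it does not measure speed, and the names via_pivot and via_groupby are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})

via_pivot = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')
via_groupby = df.groupby(['A', 'B']).sum()

# DataFrame.equals compares shape, values and dtypes of the two frames.
same = via_pivot.equals(via_groupby)
print(same)
```

Whether groupby is actually faster depends on the data, so it is worth timing both on your own frame before switching.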