Pandas pivot table percent calculations

Question

Given the following data frame and pivot table:

import numpy as np
import pandas as pd
df=pd.DataFrame({'A':['x','y','z','x','y','z'],
                 'B':['one','one','one','two','two','two'],
                 'C':[2,18,2,8,2,18]})
df

    A   B       C
0   x   one     2
1   y   one     18
2   z   one     2
3   x   two     8
4   y   two     2
5   z   two     18

table = pd.pivot_table(df, index=['A', 'B'],aggfunc=np.sum)

            C
A   B   
x   one     2
    two     8
y   one     18
    two     2
z   one     2
    two     18

I'd like to add 2 columns to this pivot table: one showing the percent of all values and another showing the percent within column A, like this:

           C    % of Total  % of B
A   B
x   one    2    4%          20%
    two    8    16%         80%
y   one   18    36%         90%
    two    2    4%          10%
z   one    2    4%          10%
    two   18    36%         90%

Extra Credit:

I'd like a bottom summary row which has the sum of column C (it's okay if it also has 100% for the next 2 columns, but nothing is needed for those).

Answer 1

You can use:

table['% of Total'] = (table.C / table.C.sum() * 100).astype(str) + '%'
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100).astype(str) + '%'
print(table)
        C % of Total % of B
A B                        
x one   2       4.0%  20.0%
  two   8      16.0%  80.0%
y one  18      36.0%  90.0%
  two   2       4.0%  10.0%
z one   2       4.0%  10.0%
  two  18      36.0%  90.0%

With real data, though, I don't think casting to int is a good idea; it is better to use round.
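As a minimal sketch of the rounding suggestion, assuming one decimal place is wanted (the precision and the format string are illustrative choices, not part of the original answer), the percentages can be rounded as numbers and turned into strings only for display:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = df.groupby(['A', 'B']).sum()

# Round the numeric percentages rather than truncating them with int.
pct_total = (table['C'] / table['C'].sum() * 100).round(1)
pct_group = (table['C'] / table.groupby(level='A')['C'].transform('sum') * 100).round(1)

# Convert to strings only at the end, for display.
table['% of Total'] = pct_total.map('{:.1f}%'.format)
table['% of B'] = pct_group.map('{:.1f}%'.format)
print(table)
```

The sample values happen to divide evenly, so rounding changes nothing here; it only matters once the ratios have longer fractional parts.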

Extra Credit:

table['% of Total'] = (table.C / table.C.sum() * 100)
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100)
table.loc['total', :] = table.sum().values
print(table)
              C  % of Total  % of B
A     B                            
x     one   2.0         4.0    20.0
      two   8.0        16.0    80.0
y     one  18.0        36.0    90.0
      two   2.0         4.0    10.0
z     one   2.0         4.0    10.0
      two  18.0        36.0    90.0
total      50.0       100.0   300.0
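In the output above the total row sums every column, so '% of B' shows 300.0, which is not a meaningful percentage. One possible variation, sketched here under the assumption that the summary row only needs the sum of C (the names total_row and with_total are made up for this example), builds the row explicitly and leaves '% of B' empty:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = df.groupby(['A', 'B']).sum()
table['% of Total'] = table['C'] / table['C'].sum() * 100
table['% of B'] = table['C'] / table.groupby(level='A')['C'].transform('sum') * 100

# Build the summary row by hand; ('total', '') fills both index levels.
total_row = pd.DataFrame(
    {'C': [table['C'].sum()], '% of Total': [100.0], '% of B': [float('nan')]},
    index=pd.MultiIndex.from_tuples([('total', '')], names=table.index.names),
)

# Append after the percentages are computed so the total does not distort them.
with_total = pd.concat([table, total_row])
print(with_total)
```

Appending with concat also keeps column C as integers, whereas assigning a new row through loc converts it to floats as seen above.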

Answer 2

If you want to add the new columns in a method chain after pivot_table(), for example as part of a pipeline, you can do so using assign().

Moreover, you can add the totals as a new row using the margins parameter of pivot_table.

table = (
    df
    .pivot_table(index=['A', 'B'], aggfunc=np.sum, margins=True, margins_name='Total')
    .assign(**{
        # exclude the last row (the totals) from the overall sum and the per-group sums
        '% of Total': lambda x: x['C'] / x.iloc[:-1]['C'].sum() * 100,
        '% of B': lambda x: x['C'] / x.iloc[:-1].groupby(level='A')['C'].transform('sum') * 100
    })
)

Note that for the particular example in the OP, since the columns parameter of pivot_table is not used, pivot_table with a sum aggregation is equivalent to a groupby followed by sum. So an equivalent (and possibly faster) approach to produce the initial pivot table result is

table = df.groupby(['A','B']).sum()
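A quick way to check the equivalence on the sample data is to build the table both ways and compare them. This is only a sketch: the names via_pivot and via_groupby are made up for the example, and aggfunc is passed as the string 'sum' rather than np.sum.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})

# Same grouping keys and the same aggregation, expressed two ways.
via_pivot = df.pivot_table(index=['A', 'B'], values='C', aggfunc='sum')
via_groupby = df.groupby(['A', 'B']).sum()

print(via_pivot)
print(via_groupby)
```

Both frames should show the same six rows indexed by (A, B). Any speed difference depends on the data, so it is worth timing on the real data set before relying on it.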


Answer 1: score 30, accepted, 2016-05-10 21:00:15
Answer 2: score 1, 2022-08-13 06:16:33
