
Python - Calculating Percent of Grand Total in Pivot Tables

I have a DataFrame that I converted to a pivot table using the pd.pivot_table method with a sum aggregate function:

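    import numpy as np
    import pandas as pd

    # df has Region, Product and Price columns; margins=True appends
    # a "Total" row and a "Total" column holding the sums
    summary = pd.pivot_table(df,
                             index=["Region"],
                             columns=["Product"],
                             values=['Price'],
                             aggfunc=[np.sum],
                             fill_value=0,
                             margins=True,
                             margins_name="Total"
                            )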

I received output like this:

[Image: sample pivot table]

I would like to add another pivot table that shows, for each category, its percentage of the grand total calculated in the previous pivot table. All of these should add up to 100%, and the result should look like this:

[Image: pivot table showing percent of grand total]
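In symbols (the notation here is mine, not the asker's): write x_ij for the summed Price of region i and product j, and T for the sum of those cells. Each cell's share of the grand total is then

```latex
p_{ij} = 100 \cdot \frac{x_{ij}}{T}, \qquad T = \sum_{i}\sum_{j} x_{ij}
\quad\Longrightarrow\quad
\sum_{i}\sum_{j} p_{ij} = \frac{100}{T}\sum_{i}\sum_{j} x_{ij} = 100
```

The Total row and column of the percentage table are the row and column sums of p_ij, so the bottom-right corner is 100. The cells add up to 100 only if T is summed over exactly the rows that feed the table's cells.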

I tried the following workaround, which I found on Stack Overflow:

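    import numpy as np
    import pandas as pd

    # Grand total computed outside the pivot table
    total = df['Price'].sum()

    table = pd.pivot_table(df,
                           index=["Region"],
                           columns=["Product"],
                           values=['Price'],
                           # np.sum gives the sums; the lambda gives each
                           # cell's sum as a percentage of `total`
                           aggfunc=[np.sum,
                                    (lambda x: sum(x) / total * 100)
                                   ],
                           fill_value=0,
                           margins=True,
                           margins_name="Total"
                          )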

This calculated the percentages, but they only add up to 85%.
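One possible cause, which can't be confirmed here because the question doesn't show the data: pivot_table groups by Region and Product and, by default, leaves out rows where either key is missing, while df['Price'].sum() still counts those rows. The sketch below uses a small made-up DataFrame (the names `sums`, `covered` and `dropped` are mine) to show how that gap appears and how to measure it.

```python
import pandas as pd

# Made-up data: the last row has no Region, so it gets no cell in the pivot
df = pd.DataFrame({
    "Region": ["East", "East", "West", None],
    "Product": ["A", "B", "A", "B"],
    "Price": [10, 20, 30, 40],
})

# Total computed outside the pivot, as in the question: counts every row
total = df["Price"].sum()

# Same grouping as the question, summed, without margins
sums = pd.pivot_table(df, index="Region", columns="Product",
                      values="Price", aggfunc="sum", fill_value=0)

# Sum of the cells that actually appear in the pivot table
covered = sums.to_numpy().sum()

# Price carried by rows whose Region or Product is missing
missing = df[["Region", "Product"]].isna().any(axis=1)
dropped = df.loc[missing, "Price"].sum()

print(f"cells cover {covered / total * 100:.0f}% of total; {dropped} is unaccounted for")
```

If `dropped` is non-zero on the real data (and prices are positive), the lambda's percentages cannot reach 100% until those rows are cleaned up or the total is taken from the pivot table itself.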

It would be great not to have to calculate the total outside the pivot table and instead take the grand total directly from the first pivot table. But even if I have to calculate it separately, as in the code above, that's fine as long as the percentages add up to 100%.
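A minimal sketch of reading the grand total straight from the first pivot table, assuming the question's column layout and using made-up data. With margins=True the Total row and column come last, so the bottom-right cell is the grand total and no separate `total` is needed. I swap np.sum for the string "sum" because recent pandas versions may emit a FutureWarning for the NumPy callable; the top-level column label is "sum" either way. The relabel to "% of Grand Total" is my own choice.

```python
import pandas as pd

# Made-up sales data standing in for the question's df
df = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "B"],
    "Price": [100, 300, 200, 400],
})

# Same call as the question, with "sum" in place of np.sum
summary = pd.pivot_table(df,
                         index=["Region"],
                         columns=["Product"],
                         values=["Price"],
                         aggfunc=["sum"],
                         fill_value=0,
                         margins=True,
                         margins_name="Total")

# The "Total" row and column are last, so this is the grand total
grand_total = summary.iloc[-1, -1]

# Divide every cell, margins included, by the grand total
percent = summary * 100 / grand_total

# Relabel the top column level so the two tables are easy to tell apart
percent = percent.rename(columns={"sum": "% of Grand Total"}, level=0)

print(percent)
```

Because the divisor comes from the same table, the non-margin cells add up to 100 by construction, and the Total row and column hold each product's and region's share.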

Thank you in advance!

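This can be done very easily: divide every cell of the table, margins included, by the grand total in the bottom-right cell and multiply by 100. Here a hard-coded table with margins stands in for the pivot output: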

    import numpy as np
    import pandas as pd

    # Example table with margins: three regions x three products, plus a
    # "Region Total" column and a "Product Total" row
    table_1 = np.array([[100, 200, 650, 950],
                        [200, 250, 350, 800],
                        [400, 500, 200, 1100],
                        [700, 950, 1200, 2850]])

    column_labels = ['A', 'B', 'C', 'Region Total']
    idx_labels = ['Region 1', 'Region 2', 'Region 3', 'Product Total']

    df = pd.DataFrame(table_1, index=idx_labels, columns=column_labels)
    df.index.name = 'Sales'

    # Divide every cell by the grand total (bottom-right cell), scale to
    # percent and round to one decimal place
    df_percentage = (df * 100 / df.iloc[-1, -1]).round(1)

    print(df_percentage)

                      A     B     C  Region Total
    Sales                                        
    Region 1        3.5   7.0  22.8          33.3
    Region 2        7.0   8.8  12.3          28.1
    Region 3       14.0  17.5   7.0          38.6
    Product Total  24.6  33.3  42.1         100.0
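A different route from the one in the answer above, offered as an alternative rather than as the answer's method: pd.crosstab accepts normalize="all", which divides every aggregated cell by the grand total and, with margins=True, normalizes the Total row and column the same way. This sketch uses made-up data and passes the columns as Series, so it does not reproduce the question's sum/Price/Product column levels.

```python
import pandas as pd

# Made-up data in the same shape as the question's df
df = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "B"],
    "Price": [100, 300, 200, 400],
})

# Sum Price per Region/Product, then normalize by the grand total;
# multiply by 100 to express the shares as percentages
pct = pd.crosstab(df["Region"], df["Product"],
                  values=df["Price"], aggfunc="sum",
                  margins=True, margins_name="Total",
                  normalize="all") * 100

print(pct.round(1))
```

Since the divisor is the sum of the table's own cells, the result adds up to 100 even when some rows were left out for missing keys.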

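Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM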