Python pivot_table - Add difference column

I am new to Python. I have the following data frame. I am able to pivot it in Excel.

I want to add the difference column (in the image, I added it manually).

The difference is the B - A value. Using a Python pivot table, I am able to replicate everything except the difference column and the Grand Total. Below is my code.

table = pd.pivot_table(data, index=['Category'], values = ['value'], columns=['Name','Date'], fill_value=0)

How can I add the difference column and calculate its value?

How can I get the Grand Total at the bottom?

The data is as below:

df = pd.DataFrame({
"Value": [0.1, 0.2, 3, 1, -.5, 4],
"Date": ["2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01"],
"Name": ['A', 'A', 'A', 'B', 'B', 'B'],
"HI Display1": ["X", "Y", "Z", "Z", "Y", "X"]})

I want the pivot table as below:

[Image: pivot table]

Here's a way to do that:

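import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"]
})

# Aggregate only the numeric column; otherwise recent pandas versions also try to sum "Date".
piv = pd.pivot_table(df, index="Category", columns="Name", values="Value", aggfunc="sum")
piv.columns = list(piv.columns)  # plain labels A and B, without the "Name" header
piv["diff"] = piv.B - piv.A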

The output (piv) is:

            A    B  diff
Category                
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9

To add a 'total' row for A and B, do:

piv.loc["total"] = piv.sum()

Remove the total from the 'diff' column:

piv.loc["total", "diff"] = ""  # or np.nan, which keeps the column numeric
                               # and is more 'pandas' style.

The output now is:

            A    B  diff
Category                
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9
total     3.3  5.5   
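
For reference, here is a minimal sketch that puts the steps so far in one place. It assumes you prefer np.nan over an empty string, so that the 'diff' column stays numeric, and it passes values="Value" so that only the numeric column is aggregated. Both choices are assumptions made for this sketch; the variable names simply mirror the ones above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"],
})

# Pivot only the numeric column so the result has flat columns A and B.
piv = df.pivot_table(index="Category", columns="Name", values="Value", aggfunc="sum")
piv.columns = list(piv.columns)  # plain column labels, no "Name" header
piv["diff"] = piv["B"] - piv["A"]

# Append the totals row, then blank out the total of the differences with NaN.
piv.loc["total"] = piv.sum()
piv.loc["total", "diff"] = np.nan

print(piv)
```

With NaN in place of "", the column keeps its float dtype, so later numeric operations and styling still work on it.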

If, at this point, you'd like to add the title 'Name' on top of the column labels, do:

piv.columns = pd.MultiIndex.from_product([["Name"], piv.columns])

piv is now:

         Name          
            A    B diff
Category               
X         3.0  4.0  1.0
Y         0.2 -0.5 -0.7
Z         0.1  2.0  1.9
total     3.3  5.5  

To add the date to each column:

date = df.Date.max()
piv.columns = pd.MultiIndex.from_tuples([c+(date,) for c in piv.columns])

==>
               Name                      
                  A          B       diff
         2020-07-01 2020-07-01 2020-07-01
Category                                 
X               3.0        4.0          1
Y               0.2       -0.5       -0.7
Z               0.1        2.0        1.9
total           3.3        5.5           

Finally, to color a column (e.g., if you're using Jupyter), do:

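diff_col = piv.columns[2]  # the third column, i.e. 'diff'
# Styler.set_na_rep was removed in pandas 2.0; format(na_rep=...) replaces it.
piv.style.background_gradient("PiYG", subset=[diff_col]).highlight_null("white").format(na_rep="")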


Another way to add totals is to pass the margins=True argument to the pivot function and then replace the Totals column with the difference, like this:

data = {
        'Name':['A', 'A' ,'A', 'B', 'B', 'B','A', 'A' ,'A', 'B', 'B', 'B' ],
        'Value':[1, 2, 3, 4, 5, 6,1, 2, 3, 4, 5, 6, ],
        'Category': ['X', 'Y', 'Z','X', 'Y', 'Z','X', 'Y', 'Z','X', 'Y', 'Z']
    }

df = pd.DataFrame(data)

pivot_ = df.pivot_table(index = ["Category"], 
              columns = "Name" , 
              values = "Value", 
              aggfunc = "sum", 
              margins=True, 
              margins_name='Totals')\
 .fillna('')

pivot_['Totals'] = pivot_['B'] - pivot_['A']

pivot_.rename(columns={"Totals": "Diff"})

Output:

Name    A   B   Diff
Category            
X       2   8   6
Y       4   10  6
Z       6   12  6
Totals  12  30  18
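
Note that rename returns a new DataFrame rather than modifying pivot_ in place, so outside a notebook the result needs to be assigned. A minimal self-contained sketch of the same approach, assuming the same sample data and leaving out the fillna('') step (this data has no missing combinations):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"] * 2,
    "Value": [1, 2, 3, 4, 5, 6] * 2,
    "Category": ["X", "Y", "Z"] * 4,
})

pivot_ = df.pivot_table(index="Category", columns="Name", values="Value",
                        aggfunc="sum", margins=True, margins_name="Totals")

# Overwrite the margin column with B - A, then keep the renamed frame.
pivot_["Totals"] = pivot_["B"] - pivot_["A"]
pivot_ = pivot_.rename(columns={"Totals": "Diff"})

print(pivot_)
```

The 'Totals' row is kept as computed by margins=True, so its 'Diff' entry is the difference between the two column totals.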

EDIT BASED ON QUESTION UPDATE:

Let's use the sample data you have now provided:

pivot_1 = df_1.pivot_table(index = ["HI Display1"], 
              columns = ["Name", 'Date'], 
              values = "Value", 
              aggfunc = "sum", 
              margins=True, 
              margins_name='Totals'
).fillna('')

pivot_1['Totals'] = pivot_1['B'].sum(axis=1) - pivot_1['A'].sum(axis=1)

pivot_1.rename(columns={"Totals": "Diff"})

Output:

Name        A           B           Diff
Date        2020-07-01  2020-07-01  
HI Display1         
X           0.1         4.0         3.9
Y           0.2         -0.5        -0.7
Z           3.0         1.0         -2.0
Totals      3.3         4.5         1.2
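
The snippet above refers to df_1 without defining it. As a sketch, assuming df_1 is simply the data frame from the question, the whole thing can be run like this (the rename result is assigned back, and the fillna('') step is left out since no combination is missing here):

```python
import pandas as pd

# Assumption: df_1 holds the sample data posted in the question.
df_1 = pd.DataFrame({
    "Value": [0.1, 0.2, 3, 1, -.5, 4],
    "Date": ["2020-07-01"] * 6,
    "Name": ["A", "A", "A", "B", "B", "B"],
    "HI Display1": ["X", "Y", "Z", "Z", "Y", "X"],
})

pivot_1 = df_1.pivot_table(index="HI Display1", columns=["Name", "Date"],
                           values="Value", aggfunc="sum",
                           margins=True, margins_name="Totals")

# Sum across the dates under each name, then store B - A in the margin column.
pivot_1["Totals"] = pivot_1["B"].sum(axis=1) - pivot_1["A"].sum(axis=1)
pivot_1 = pivot_1.rename(columns={"Totals": "Diff"})

print(pivot_1)
```

Summing with axis=1 inside each name means the difference is still B - A if the data later contains more than one date.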
