Python pivot_table - Add difference column
I am new to Python. I have the following data frame. I am able to pivot it in Excel.
I want to add the difference column (in the image, I added it manually).
The difference is the B - A value. I am able to replicate everything except the difference column and the Grand Total using a Python pivot table. Below is my code.
table = pd.pivot_table(data, index=['Category'], values = ['value'], columns=['Name','Date'], fill_value=0)
How can I add the difference column and calculate the value?
How can I get the Grand Total at the bottom?
Data as below:
df = pd.DataFrame({
"Value": [0.1, 0.2, 3, 1, -.5, 4],
"Date": ["2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-07-01"],
"Name": ['A', 'A', 'A', 'B', 'B', 'B'],
"HI Display1": ["X", "Y", "Z", "Z", "Y", "X"]})
I want the pivot table to look as below.
Here's a way to do that:
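import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"]
})
# Aggregate only "Value", so the string "Date" column is left out of the pivot.
piv = pd.pivot_table(df, index="Category", columns="Name", values="Value", aggfunc="sum")
piv.columns = list(piv.columns)  # drop the "Name" label from the column axis
piv["diff"] = piv.B - piv.A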
The output (piv) is:
            A    B  diff
Category
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9
To add a 'total' row for A and B, do:
piv.loc["total"] = piv.sum()
Remove the total from the 'diff' column:
piv.loc["total", "diff"] = ""  # or np.nan, if you'd like to be more 'pandas' style
The output now is:
            A    B  diff
Category
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9
total     3.3  5.5
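The steps so far can be collected into one runnable sketch. The helper name `pivot_with_diff` is hypothetical, and this version assumes you prefer `np.nan` over an empty string for the blanked cell so that the 'diff' column stays numeric.

```python
import numpy as np
import pandas as pd


def pivot_with_diff(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot Value by Category x Name, then add a B - A column and a total row.

    Hypothetical helper; assumes columns "Category", "Name" and "Value",
    and that "Name" contains the labels "A" and "B".
    """
    piv = df.pivot_table(index="Category", columns="Name",
                         values="Value", aggfunc="sum")
    piv.columns = list(piv.columns)      # drop the "Name" label on the column axis
    piv["diff"] = piv["B"] - piv["A"]
    piv.loc["total"] = piv.sum()         # column-wise sums, including "diff"
    piv.loc["total", "diff"] = np.nan    # blank out the total of the differences
    return piv


df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Date": "2020-07-01",
    "Value": [0.1, 0.2, 3, 2, -0.5, 4],
    "Category": ["Z", "Y", "X", "Z", "Y", "X"],
})
piv = pivot_with_diff(df)
print(piv.to_string(na_rep=""))          # NaN is shown as an empty cell
```

Keeping NaN in the cell and hiding it only at display time means later arithmetic on the 'diff' column still works.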
If, at this point, you'd like to add the title 'Name' on top of the column labels, do:
piv.columns = pd.MultiIndex.from_product([["Name"], piv.columns])
piv is now:
         Name
            A    B  diff
Category
X         3.0  4.0   1.0
Y         0.2 -0.5  -0.7
Z         0.1  2.0   1.9
total     3.3  5.5
To add the date to each column:
date = df.Date.max()
piv.columns = pd.MultiIndex.from_tuples([c+(date,) for c in piv.columns])
==>
               Name
                  A          B       diff
         2020-07-01 2020-07-01 2020-07-01
Category
X               3.0        4.0          1
Y               0.2       -0.5       -0.7
Z               0.1        2.0        1.9
total           3.3        5.5
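Finally, to color a column (e.g. if you're using Jupyter), do:
diff_col = piv.columns[2]  # the 'diff' column (third column)
# Assumes the blanked 'total' cell holds np.nan rather than "", so the column is numeric.
# Styler.set_na_rep was removed in pandas 2.0; format(na_rep="") is the replacement.
piv.style.background_gradient("PiYG", subset=[diff_col]).highlight_null("white").format(na_rep="")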
Another way to add totals is to pass margins=True to pivot_table and then overwrite the Totals column with the difference, like this:
import pandas as pd

data = {
    'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    'Category': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z']
}
df = pd.DataFrame(data)
pivot_ = df.pivot_table(index="Category",
                        columns="Name",
                        values="Value",
                        aggfunc="sum",
                        margins=True,
                        margins_name='Totals').fillna('')
# Overwrite the row totals with B - A, then relabel the column.
pivot_['Totals'] = pivot_['B'] - pivot_['A']
pivot_ = pivot_.rename(columns={"Totals": "Diff"})
Output:
Name       A   B  Diff
Category
X          2   8     6
Y          4  10     6
Z          6  12     6
Totals    12  30    18
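If you would rather keep the row totals that margins=True produces instead of overwriting them, a small variation is to add the difference as a separate column. This is only a sketch on the same sample data; the column name "Diff" is just a label chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"] * 2,
    "Value": [1, 2, 3, 4, 5, 6] * 2,
    "Category": ["X", "Y", "Z"] * 4,
})

# margins=True appends a "Totals" row and a "Totals" column.
pivot_ = df.pivot_table(index="Category", columns="Name", values="Value",
                        aggfunc="sum", margins=True, margins_name="Totals")

# Add B - A next to the totals; the "Totals" row gets the difference of the column sums.
pivot_["Diff"] = pivot_["B"] - pivot_["A"]
print(pivot_)
```

The "Totals" row of "Diff" is then 30 - 12, which equals the sum of the per-category differences.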
Let's use the sample data you now provided:
# df_1 holds the sample data from the question.
pivot_1 = df_1.pivot_table(index="HI Display1",
                           columns=["Name", "Date"],
                           values="Value",
                           aggfunc="sum",
                           margins=True,
                           margins_name='Totals').fillna('')
# Sum over the dates within each Name, then overwrite the row totals with B - A.
pivot_1['Totals'] = pivot_1['B'].sum(axis=1) - pivot_1['A'].sum(axis=1)
pivot_1 = pivot_1.rename(columns={"Totals": "Diff"})
Output:
Name                 A          B  Diff
Date        2020-07-01 2020-07-01
HI Display1
X                  0.1        4.0   3.9
Y                  0.2       -0.5  -0.7
Z                  3.0        1.0  -2.0
Totals             3.3        4.5   1.2
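The snippet above assumes df_1 already exists. Below is a self-contained sketch on the question's sample data. Note that it is a different route from the margins=True approach: it builds the Diff column and the Totals row explicitly, and the ("Diff", "") column key is simply a label chosen for this example.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "Value": [0.1, 0.2, 3, 1, -0.5, 4],
    "Date": ["2020-07-01"] * 6,
    "Name": ["A", "A", "A", "B", "B", "B"],
    "HI Display1": ["X", "Y", "Z", "Z", "Y", "X"],
})

# Two-level columns: (Name, Date).
pivot_1 = df_1.pivot_table(index="HI Display1", columns=["Name", "Date"],
                           values="Value", aggfunc="sum", fill_value=0)

# Sum over all dates within each Name, then take B - A per row.
diff = pivot_1["B"].sum(axis=1) - pivot_1["A"].sum(axis=1)
pivot_1[("Diff", "")] = diff

# Append the column-wise totals as the last row.
pivot_1.loc["Totals"] = pivot_1.sum()
print(pivot_1)
```

Because the Totals row is added after the Diff column, its Diff entry is the sum of the per-row differences.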