How to make a pivot table in pandas behave like a pivot table in Excel?
I'm trying to transpose the data. The aggregation method doesn't matter, but the data is grouped by the values instead of by the date.
Code:
import pandas as pd
d = {'date': ['2/21/2020', '2/21/2020','2/22/2020','2/22/2020','2/23/2020','2/23/2020'],
'name': ['James','John', 'James','John','James','John'],
'A':[1,2,3,4,5,6],
'B':[7,8,9,10,11,12],
'C':[13,14,15,16,17,18]}
df = pd.DataFrame(data=d)
table = pd.pivot_table(df, index='name', columns='date', values=['A', 'B', 'C'])
print(table)
Output I get:
What I need:
Note: in Excel, the pivot table was set up with 'date' as Columns, 'name' as Rows, and 'A', 'B' & 'C' as Values.
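The grouping comes from how pivot_table lays out its result: the value columns ("A", "B", "C") form the outer level of the column MultiIndex and the dates form the inner level, so all the "A" columns print together, then all the "B" columns, and so on. A minimal sketch that rebuilds the sample data and inspects that index (the variable name table is chosen here for illustration):

```python
import pandas as pd

d = {'date': ['2/21/2020', '2/21/2020', '2/22/2020', '2/22/2020', '2/23/2020', '2/23/2020'],
     'name': ['James', 'John', 'James', 'John', 'James', 'John'],
     'A': [1, 2, 3, 4, 5, 6],
     'B': [7, 8, 9, 10, 11, 12],
     'C': [13, 14, 15, 16, 17, 18]}
df = pd.DataFrame(data=d)

table = pd.pivot_table(df, index='name', columns='date', values=['A', 'B', 'C'])

# Level 0 is unnamed and holds the value names; level 1 is named 'date'
print(table.columns.names)
# The first three columns all belong to 'A', one per date
print(table.columns.tolist()[:3])
```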
You'll need to use swaplevel to switch the order of the column MultiIndex so that the date is on top and "A", "B", "C" are on the bottom. Then sort that index as well. To replace "A" with "Sum of A", I used the rename method to prefix the columns with "Sum of ".
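new_df = (df.pivot_table(index='name', columns='date', values=['A', 'B', 'C'],
                         aggfunc='sum')  # match Excel's "Sum of"; pivot_table defaults to mean
          .swaplevel(axis=1)             # put the date on the outer column level
          .sort_index(axis=1)            # group A/B/C together under each date
          .rename(columns="Sum of {}".format, level=1)
         )
print(new_df)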
date         2/21/2020                  2/22/2020                  2/23/2020
             Sum of A Sum of B Sum of C Sum of A Sum of B Sum of C Sum of A Sum of B Sum of C
name
James               1        7       13        3        9       15        5       11       17
John                2        8       14        4       10       16        6       12       18
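One caveat with this approach, assuming the real data covers more than a few days in a single month: the dates are strings, so sort_index orders them lexically (for example '10/1/2020' would come before '2/21/2020'). A hedged variant that parses the dates with pd.to_datetime before pivoting, and passes aggfunc='sum' so the "Sum of" labels match what is computed, might look like this:

```python
import pandas as pd

d = {'date': ['2/21/2020', '2/21/2020', '2/22/2020', '2/22/2020', '2/23/2020', '2/23/2020'],
     'name': ['James', 'John', 'James', 'John', 'James', 'John'],
     'A': [1, 2, 3, 4, 5, 6],
     'B': [7, 8, 9, 10, 11, 12],
     'C': [13, 14, 15, 16, 17, 18]}
df = pd.DataFrame(data=d)

# Assumption: dates are month/day/year; parse them so columns sort chronologically
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

new_df = (df.pivot_table(index='name', columns='date', values=['A', 'B', 'C'],
                         aggfunc='sum')
            .swaplevel(axis=1)    # date becomes the outer column level
            .sort_index(axis=1)   # real dates now sort in calendar order
            .rename(columns='Sum of {}'.format, level=1))

# Optional: turn the Timestamp labels back into readable strings for display
new_df = new_df.rename(columns=lambda ts: ts.strftime('%m/%d/%Y'), level=0)
print(new_df)
```

Formatting the timestamps with strftime gives zero-padded labels such as '02/21/2020'; any other format string works the same way.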
To get a similar output, including Excel's Grand Total row, we can use margins and swaplevel. After that, we can rename the columns with a mapper. Finally, .iloc[:, :-3] removes the extra "Grand Total" columns (the per-row totals) that margins also adds; drop that step if you want to keep them.
df1 = (df.pivot_table(index='name', columns='date', values=['A', 'B', 'C'],
                      margins=True, margins_name='Grand Total', aggfunc='sum')
       .swaplevel(axis=1)
       .sort_index(axis=1)
       .rename(mapper=lambda x: f'Sum of {x}', axis=1, level=1)
       .iloc[:, :-3])  # drop the trailing 'Grand Total' columns (per-row totals)
print(df1)
Output:
date         2/21/2020                  2/22/2020                  2/23/2020
             Sum of A Sum of B Sum of C Sum of A Sum of B Sum of C Sum of A Sum of B Sum of C
name
James               1        7       13        3        9       15        5       11       17
John                2        8       14        4       10       16        6       12       18
Grand Total         3       15       27        7       19       31       11       23       35
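The positional slice .iloc[:, :-3] relies on the "Grand Total" columns landing last after sort_index and on there being exactly three value columns. A sketch that drops them by label instead, using DataFrame.drop with level=0, should give the same table while keeping the Grand Total row:

```python
import pandas as pd

d = {'date': ['2/21/2020', '2/21/2020', '2/22/2020', '2/22/2020', '2/23/2020', '2/23/2020'],
     'name': ['James', 'John', 'James', 'John', 'James', 'John'],
     'A': [1, 2, 3, 4, 5, 6],
     'B': [7, 8, 9, 10, 11, 12],
     'C': [13, 14, 15, 16, 17, 18]}
df = pd.DataFrame(data=d)

df1 = (df.pivot_table(index='name', columns='date', values=['A', 'B', 'C'],
                      aggfunc='sum', margins=True, margins_name='Grand Total')
         .swaplevel(axis=1)
         .sort_index(axis=1)
         # Remove the per-row total columns by label rather than by position
         .drop(columns='Grand Total', level=0)
         .rename(columns='Sum of {}'.format, level=1))
print(df1)
```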