How to change a multiindex into columns in Pandas
I am trying to aggregate sales data using Pandas. Each line of the input file has a date, a category and a sales amount, and there can be multiple entries for a category on a given date.
import pandas as pd
from datetime import date
df = pd.DataFrame( [
{ 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 11.0 },
{ 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 3.0 },
{ 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 2.0 },
{ 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 13.0 },
{ 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 1.0 },
{ 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 0.5 },
{ 'Date': date(2022,4,2), 'Category': 'Food', 'Amount': 15.0 },
{ 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 2.0 },
{ 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 1.0 },
{ 'Date': date(2022,4,3), 'Category': 'Candy', 'Amount': 2.0 },
{ 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 18.0 },
{ 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 11.0 },
] )
I can use groupby to sum the entries for a category, and I end up with a multi-index on Date and Category:
b = df.groupby(['Date', 'Category']).sum()
print(b)
                     Amount
Date       Category
2022-04-01 Food        24.0
           Soda         5.0
2022-04-02 Candy        1.5
           Food        15.0
           Soda         3.0
2022-04-03 Candy        2.0
           Food        29.0
How can I transform this so the different categories are columns with the date as the index, something like this:
            Food  Soda  Candy
2022-04-01  24.0   5.0    0.0
2022-04-02  15.0   3.0    1.5
2022-04-03  29.0   0.0    2.0
I've tried pivot tables, crosstabs (xs) and unstacking, but can't figure out the right Pandas commands to get there!
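You can use pd.crosstab, passing the amounts as values and sum as the aggregation:
out = (pd.crosstab(df['Date'], df['Category'], values=df['Amount'], aggfunc='sum')
         .fillna(0)  # Date/Category pairs with no rows come back as NaN
      )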
Output:
Category    Candy  Food  Soda
Date
2022-04-01    0.0  24.0   5.0
2022-04-02    1.5  15.0   3.0
2022-04-03    2.0  29.0   0.0
Modification of your method with unstack:
out = (df.groupby(['Date', 'Category'])['Amount'].sum()
.unstack(fill_value=0)
)
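To make the two steps visible, here is a minimal sketch on a reduced version of the question's data (the four rows and the names totals and wide are chosen for illustration only). groupby gives a Series indexed by (Date, Category), and unstack then moves the inner level, Category, into the columns:

```python
import pandas as pd
from datetime import date

# Reduced sample, assumed for illustration; not the full data from the question
df = pd.DataFrame([
    {'Date': date(2022, 4, 1), 'Category': 'Food', 'Amount': 11.0},
    {'Date': date(2022, 4, 1), 'Category': 'Food', 'Amount': 13.0},
    {'Date': date(2022, 4, 1), 'Category': 'Soda', 'Amount': 5.0},
    {'Date': date(2022, 4, 2), 'Category': 'Candy', 'Amount': 1.5},
])

# Step 1: one total per (Date, Category) pair, as a Series with a MultiIndex
totals = df.groupby(['Date', 'Category'])['Amount'].sum()

# Step 2: move the inner index level (Category) into the columns;
# pairs that never occur in the data are filled with 0 instead of NaN
wide = totals.unstack(fill_value=0)
print(wide)
```

Selecting ['Amount'] before sum() is what keeps the result a Series, so the unstacked frame has plain category names as columns rather than a two-level column index.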
You can use pd.pivot_table with sum as aggfunc:
df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)
Output:
Category    Candy  Food  Soda
Date
2022-04-01    0.0    24     5
2022-04-02    1.5    15     3
2022-04-03    2.0    29     0
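The columns come back in alphabetical order, while the question lists them as Food, Soda, Candy. If that order matters, one possible follow-up is to reindex the columns. This is a sketch on a small assumed sample, and the explicit column list is an assumption taken from the layout shown in the question:

```python
import pandas as pd
from datetime import date

# Small assumed sample with the same column names as the question
df = pd.DataFrame({
    'Date': [date(2022, 4, 1), date(2022, 4, 1), date(2022, 4, 2), date(2022, 4, 2)],
    'Category': ['Food', 'Soda', 'Candy', 'Food'],
    'Amount': [24.0, 5.0, 1.5, 15.0],
})

out = df.pivot_table(index='Date', columns='Category', values='Amount',
                     aggfunc='sum', fill_value=0)

# Put the columns in the order used in the question; a listed category
# that is absent from the data would show up as a column of 0
out = out.reindex(columns=['Food', 'Soda', 'Candy'], fill_value=0)

# Optional: remove the 'Category' label printed above the column names
out = out.rename_axis(columns=None)
print(out)
```

The same two calls should work on the crosstab and unstack results as well, since all of them return a DataFrame with the categories as columns.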
Because you also mentioned unstack, alongside crosstab (mozway's answer) and pivot_table, here is a way you could do it with that:
df.set_index(['Date', 'Category'],append=True).unstack().groupby('Date').sum()
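Note that this leaves a two-level column index, with pairs such as ('Amount', 'Food'), because the whole frame is unstacked rather than a single column. A minimal sketch of flattening it with droplevel, on a reduced assumed sample (the names wide and flat are illustrative):

```python
import pandas as pd
from datetime import date

# Reduced sample, assumed for illustration
df = pd.DataFrame([
    {'Date': date(2022, 4, 1), 'Category': 'Food', 'Amount': 11.0},
    {'Date': date(2022, 4, 1), 'Category': 'Food', 'Amount': 13.0},
    {'Date': date(2022, 4, 1), 'Category': 'Soda', 'Amount': 5.0},
    {'Date': date(2022, 4, 2), 'Category': 'Candy', 'Amount': 1.5},
])

wide = (df.set_index(['Date', 'Category'], append=True)  # keep the row number so the index stays unique
          .unstack()               # Category -> columns, under an outer 'Amount' level
          .groupby('Date').sum()   # collapse the row numbers; NaN is skipped, so empty cells sum to 0
       )

# Drop the outer 'Amount' level so the columns are just the category names
flat = wide.droplevel(0, axis=1)
print(flat)
```

Keeping the original row number in the index (append=True) is what lets unstack succeed even though several rows share the same Date and Category.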