How to change a multiindex into columns in Pandas

I am trying to aggregate sales data using Pandas. Each line of the input file has a date, a category and a sales amount, and there can be multiple entries for the same category on the same date.

import pandas as pd
from datetime import date

df = pd.DataFrame( [
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 11.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 3.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 13.0 },

    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 1.0 },
    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 0.5 },
    { 'Date': date(2022,4,2), 'Category': 'Food', 'Amount': 15.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 1.0 },

    { 'Date': date(2022,4,3), 'Category': 'Candy', 'Amount': 2.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 18.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 11.0 },
] )

I can use groupby to sum the entries for a category, and I end up with a multi-index on Date and Category:

b = df.groupby(['Date', 'Category']).sum()
print(b)
                     Amount
Date       Category        
2022-04-01 Food        24.0
           Soda         5.0
2022-04-02 Candy        1.5
           Food        15.0
           Soda         3.0
2022-04-03 Candy        2.0
           Food        29.0

How can I transform this so that the different categories become columns, with the date as the index? Something like this:

            Food  Soda  Candy
2022-04-01  24.0  5.0   0.0
2022-04-02  15.0  3.0   1.5
2022-04-03  29.0  0.0   2.0

I've tried pivot tables, crosstabs (xs) and unstacking, and I can't figure out the right Pandas commands to get there!

Using crosstab:

# 'sum' is passed as a string; recent pandas versions warn when given np.sum
out = (pd.crosstab(df['Date'], df['Category'], values=df['Amount'], aggfunc='sum')
         .fillna(0)
       )

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0  24.0   5.0
2022-04-02    1.5  15.0   3.0
2022-04-03    2.0  29.0   0.0

Modification of your method with unstack:

out = (df.groupby(['Date', 'Category'])['Amount'].sum()
         .unstack(fill_value=0)
       )
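
As a minimal sketch of the unstack approach on the question's data: unstack sorts the new columns alphabetically, so getting the Food, Soda, Candy order from the question needs an extra step. The reindex call and the name wide are my own additions, not part of the answer above.

```python
import pandas as pd
from datetime import date

# The question's data, written compactly as (Date, Category, Amount) rows.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

# Sum per (Date, Category), then move the Category level into the columns.
wide = (df.groupby(['Date', 'Category'])['Amount'].sum()
          .unstack(fill_value=0))

# Assumption: the column order from the question is wanted.
# A category absent from the data would be added as an all-zero column.
wide = wide.reindex(columns=['Food', 'Soda', 'Candy'], fill_value=0)
print(wide)
```

Selecting the Amount column before summing keeps the result a Series, so unstack returns single-level columns rather than a two-level (Amount, Category) header.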

You can use pd.pivot_table with sum as aggfunc:

df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0    24     5
2022-04-02    1.5    15     3
2022-04-03    2.0    29     0
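
If Date should end up as an ordinary column rather than the index, one possible follow-up is sketched below. This goes slightly beyond what the question asked for, and the name flat is only illustrative.

```python
import pandas as pd
from datetime import date

# The question's data, written compactly as (Date, Category, Amount) rows.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

flat = (df.pivot_table(index='Date', columns='Category',
                       values='Amount', aggfunc='sum', fill_value=0)
          .rename_axis(columns=None)   # drop the 'Category' label above the columns
          .reset_index())              # turn the Date index into a regular column
print(flat)
```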

Because you also mentioned unstack, alongside crosstab (mozway's answer) and pivot_table, here is a way you could do it with that:

df.set_index(['Date', 'Category'],append=True).unstack().groupby('Date').sum()
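
Note that this variant keeps Amount as an outer column level, so the result has two-level columns such as ('Amount', 'Food'). A minimal sketch of flattening them follows. The droplevel step and the names out and flat are my additions, and groupby(level='Date') is used here as an explicit spelling of grouping on the Date index level.

```python
import pandas as pd
from datetime import date

# The question's data, written compactly as (Date, Category, Amount) rows.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

# append=True keeps the original row number in the index, which makes every
# index entry unique so that unstack does not fail on duplicate entries.
out = (df.set_index(['Date', 'Category'], append=True)
         .unstack()                   # Category moves into the columns
         .groupby(level='Date').sum())

# Drop the outer 'Amount' level so the columns are just the categories.
flat = out.droplevel(0, axis=1)
print(flat)
```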
