How to change a multi-index into columns in Pandas

Question

I am trying to aggregate sales data using Pandas. Each line of the input file has a date, a category and a sales amount, and there can be multiple entries for a category on a given date.

import pandas as pd
from datetime import date

df = pd.DataFrame( [
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 11.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 3.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 13.0 },

    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 1.0 },
    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 0.5 },
    { 'Date': date(2022,4,2), 'Category': 'Food', 'Amount': 15.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 1.0 },

    { 'Date': date(2022,4,3), 'Category': 'Candy', 'Amount': 2.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 18.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 11.0 },
] )

I can use groupby to sum the entries for each category, and I end up with a multi-index on Date and Category:

b = df.groupby(['Date', 'Category']).sum()
print(b)

                     Amount
Date       Category        
2022-04-01 Food        24.0
           Soda         5.0
2022-04-02 Candy        1.5
           Food        15.0
           Soda         3.0
2022-04-03 Candy        2.0
           Food        29.0

How can I transform this so the different categories are columns with the date as the index, something like this:

            Food  Soda  Candy
2022-04-01  24.0   5.0    0.0
2022-04-02  15.0   3.0    1.5
2022-04-03  29.0   0.0    2.0

I've tried pivot tables, crosstabs (xs) and unstacking, and can't figure out the right Pandas commands to get there!

Answer 1

Using crosstab:

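# Passing aggfunc as the string 'sum' (instead of np.sum) avoids the
# FutureWarning that recent pandas versions emit for NumPy callables.
out = (pd.crosstab(df['Date'], df['Category'], values=df['Amount'], aggfunc='sum')
         .fillna(0)
       )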

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0  24.0   5.0
2022-04-02    1.5  15.0   3.0
2022-04-03    2.0  29.0   0.0
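Note that crosstab sorts the columns alphabetically, whereas the question listed them as Food, Soda, Candy. If that order matters, one possible follow-up is to reindex the columns and clear the axis names. This is only a sketch: the names `rows` and `wide` are mine, and the data is the question's sample written compactly.

```python
import pandas as pd
from datetime import date

# Same sample data as in the question, written as (Date, Category, Amount) tuples.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

wide = (pd.crosstab(df['Date'], df['Category'], values=df['Amount'], aggfunc='sum')
          .fillna(0)                                   # missing date/category pairs become 0
          .reindex(columns=['Food', 'Soda', 'Candy'])  # column order from the question
       )

# Drop the 'Date' and 'Category' axis labels so only the values are printed.
wide.index.name = None
wide.columns.name = None
print(wide)
```

The reindex step only reorders here because all three categories exist; a category missing from the data would show up as a column of NaN.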

A modification of your method, using unstack:

out = (df.groupby(['Date', 'Category'])['Amount'].sum()
         .unstack(fill_value=0)
       )
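If you already have the multi-indexed frame `b` from the question, you do not need to redo the groupby: selecting the Amount column gives a Series that can be unstacked directly. A minimal sketch under that assumption (the name `wide` is mine, and the data is the question's sample written compactly):

```python
import pandas as pd
from datetime import date

# Same sample data as in the question, written as (Date, Category, Amount) tuples.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

# The multi-indexed result from the question.
b = df.groupby(['Date', 'Category']).sum()

# Move the 'Category' index level into the columns.
# fill_value=0 replaces date/category pairs that never occurred.
wide = b['Amount'].unstack('Category', fill_value=0)
print(wide)
```

Selecting `b['Amount']` first keeps the columns flat; calling unstack on the whole frame would work too, but it leaves an extra 'Amount' level on top of the category names.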

Answer 2

You can use pd.pivot_table with sum as the aggfunc:

df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0    24     5
2022-04-02    1.5    15     3
2022-04-03    2.0    29     0
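The Food and Soda columns print as integers above while Candy stays float, so the column dtypes may differ from what the groupby/unstack route returns, depending on your pandas version. The values themselves should agree. Here is a small check, assuming the question's sample data (the names `by_pivot` and `by_unstack` are mine):

```python
import pandas as pd
from datetime import date

# Same sample data as in the question, written as (Date, Category, Amount) tuples.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

by_pivot = df.pivot_table(index='Date', columns='Category', values='Amount',
                          aggfunc='sum', fill_value=0)
by_unstack = (df.groupby(['Date', 'Category'])['Amount'].sum()
                .unstack(fill_value=0))

# check_dtype=False: compare values and labels only, since pivot_table
# may hand back integer columns where unstack keeps floats.
pd.testing.assert_frame_equal(by_pivot, by_unstack, check_dtype=False)
print(by_pivot.dtypes)
```

If you need a consistent dtype regardless of version, appending `.astype(float)` to the pivot_table result is one way to get it.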

Since you also mentioned unstack, alongside crosstab (see mozway's answer) and pivot_table, here is a way you could do it with that:

df.set_index(['Date', 'Category'],append=True).unstack().groupby('Date').sum()
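One thing to be aware of with this one-liner: because the whole frame is unstacked, the result has two column levels, with 'Amount' sitting above the category names. Selecting 'Amount' afterwards gives the flat layout from the question. A sketch with the question's sample data (the names `out` and `flat` are mine):

```python
import pandas as pd
from datetime import date

# Same sample data as in the question, written as (Date, Category, Amount) tuples.
rows = [
    (date(2022, 4, 1), 'Food', 11.0), (date(2022, 4, 1), 'Soda', 3.0),
    (date(2022, 4, 1), 'Soda', 2.0), (date(2022, 4, 1), 'Food', 13.0),
    (date(2022, 4, 2), 'Candy', 1.0), (date(2022, 4, 2), 'Candy', 0.5),
    (date(2022, 4, 2), 'Food', 15.0), (date(2022, 4, 2), 'Soda', 2.0),
    (date(2022, 4, 2), 'Soda', 1.0), (date(2022, 4, 3), 'Candy', 2.0),
    (date(2022, 4, 3), 'Food', 18.0), (date(2022, 4, 3), 'Food', 11.0),
]
df = pd.DataFrame(rows, columns=['Date', 'Category', 'Amount'])

# append=True keeps the original row number in the index, so every
# (row, Date, Category) combination is unique and unstack() cannot fail
# on duplicate entries.
out = (df.set_index(['Date', 'Category'], append=True)
         .unstack()              # 'Category' moves into the columns
         .groupby('Date').sum()  # collapse the row numbers; NaN counts as 0
      )

print(out.columns.tolist())  # tuples such as ('Amount', 'Candy')

# Pick the 'Amount' level to get plain category columns.
flat = out['Amount']
print(flat)
```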

Answer 1
2 2022-08-02 05:01:33

Answer 2
1 2022-08-02 05:05:58