
groupby and agg by multiple columns error

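I'm trying to group by monthtly_purchases and region to get the number of customers and the sum of monthly spending for each group. However, I get the error shown below.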

Main dataframe:

customer_id   monthly_spending      month             monthtly_purchases       region     
32324         342                   Feb-2019          5                        A
34345         293                   Feb-2019          5                        A
45453         212                   Feb-2019          3                        A
34343         453                   Feb-2019          3                        A
53533         112                   Feb-2019          5                        B
12334         511                   Feb-2019          5                        B
99934         123                   Feb-2019          3                        B
21213         534                   Feb-2019          3                        B
32324         143                   March-2019        5                        A
34345         453                   March-2019        5                        A
45453         234                   March-2019        3                        A
34343         432                   March-2019        3                        A
53533         124                   March-2019        5                        B
12334         453                   March-2019        5                        B
99934         224                   March-2019        3                        B
21213         634                   March-2019        3                        B

Expected output dataframe:

monthly_purchases region    monthly_spending    count_customers         month
5                 A         635               2                       Feb-2019
3                 A         665               2                       Feb-2019
5                 B         623               2                       Feb-2019
3                 B         657               2                       Feb-2019

5                 A         596               2                       March-2019
3                 A         666               2                       March-2019
5                 B         577               2                       March-2019
3                 B         858               2                       March-2019

This is what I've tried so far, but it raises the following error:

d = {'customer_id': ['count'], 'monthly_spending': ['sum']}

agg_df = df.groupby('monthtly_purchases', 'region').agg(d)
agg_df

Error msg: No numeric types to aggregate

When you group by two or more columns, remember to put the column names in a list:

import pandas as pd

df = pd.DataFrame([
[32324, 342, "Feb-2019", 5, "A"],
[34345, 293, "Feb-2019", 5, "A"],
[45453, 212, "Feb-2019", 3, "A"],
[34343, 453, "Feb-2019", 3, "A"],
[53533, 112, "Feb-2019", 5, "B"],
[12334, 511, "Feb-2019", 5, "B"],
[99934, 123, "Feb-2019", 3, "B"],
[21213, 534, "Feb-2019", 3, "B"]
],
columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"]
)

d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
# Pass all grouping columns as a single list, not as separate arguments
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)

Output:

                          customer_id monthly_spending
                                count              sum
monthtly_purchases region                             
3                  A                2              665
                   B                2              657
5                  A                2              635
                   B                2              623

As requested in the comments, to make the MultiIndex explicit (split it into columns by creating a new index):

agg_df.reset_index(inplace=True)
print(agg_df)

Output:

  monthtly_purchases region customer_id monthly_spending
                                  count              sum
0                  3      A           2              665
1                  3      B           2              657
2                  5      A           2              635
3                  5      B           2              623
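After reset_index() the row index is flat. The column labels, however, are still two-level tuples such as ("customer_id", "count"). If you want single-level names, one option is to join the levels. The sketch below reuses the eight Feb-2019 rows from above. The underscore-joined names (customer_id_count, monthly_spending_sum) are only an illustrative naming choice. pandas does not produce them on its own.

```python
import pandas as pd

# The eight Feb-2019 rows from the question (same data as the answer above).
df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

d = {"customer_id": ["count"], "monthly_spending": ["sum"]}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d).reset_index()

# Each column label is a tuple such as ("customer_id", "count") or
# ("region", ""); keep the non-empty parts and join them with "_".
agg_df.columns = ["_".join(part for part in col if part) for col in agg_df.columns]
print(agg_df)
```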

To include the month, as requested in the comments:

# This assumes df holds all 16 rows from the question (Feb-2019 and March-2019)
agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)
print(agg_df)

Output:

        month monthtly_purchases region customer_id monthly_spending
                                              count              sum
0    Feb-2019                  3      A           2              665
1    Feb-2019                  3      B           2              657
2    Feb-2019                  5      A           2              635
3    Feb-2019                  5      B           2              623
4  March-2019                  3      A           2              666
5  March-2019                  3      B           2              858
6  March-2019                  5      A           2              596
7  March-2019                  5      B           2              577
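Named aggregation, available in pandas 0.25 and later, is another way to get flat columns. It also lets you use the column names from the question's expected output (count_customers, monthly_spending). The sketch below is an alternative to the dict-of-lists agg(d) used above, not what this answer ran. It builds all 16 rows and passes sort=False so that groups appear in order of first appearance, which matches the question's layout. It then reorders the columns.

```python
import pandas as pd

# All 16 rows from the question:
# (customer_id, monthly_spending, month, monthtly_purchases, region)
rows = [
    (32324, 342, "Feb-2019", 5, "A"), (34345, 293, "Feb-2019", 5, "A"),
    (45453, 212, "Feb-2019", 3, "A"), (34343, 453, "Feb-2019", 3, "A"),
    (53533, 112, "Feb-2019", 5, "B"), (12334, 511, "Feb-2019", 5, "B"),
    (99934, 123, "Feb-2019", 3, "B"), (21213, 534, "Feb-2019", 3, "B"),
    (32324, 143, "March-2019", 5, "A"), (34345, 453, "March-2019", 5, "A"),
    (45453, 234, "March-2019", 3, "A"), (34343, 432, "March-2019", 3, "A"),
    (53533, 124, "March-2019", 5, "B"), (12334, 453, "March-2019", 5, "B"),
    (99934, 224, "March-2019", 3, "B"), (21213, 634, "March-2019", 3, "B"),
]
df = pd.DataFrame(
    rows,
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Named aggregation: each keyword becomes one flat output column,
# specified as (source column, aggregation function).
out = df.groupby(
    ["month", "monthtly_purchases", "region"], as_index=False, sort=False
).agg(
    monthly_spending=("monthly_spending", "sum"),
    count_customers=("customer_id", "count"),
)

# Reorder the columns to the layout sketched in the question.
out = out[["monthtly_purchases", "region", "monthly_spending", "count_customers", "month"]]
print(out)
```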

The column order is different, but you can get the result with the following code:

df = df.groupby(['monthtly_purchases','region','month']).agg({'customer_id': 'size', 'monthly_spending': 'sum'}).reset_index()
print(df)

Output:

   monthtly_purchases  region     month  customer_id  monthly_spending
0                   3       A  Feb-2019            2               665
1                   3       B  Feb-2019            2               657
2                   5       A  Feb-2019            2               635
3                   5       B  Feb-2019            2               623
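Note that this version uses 'size' where the question used 'count'. On the question's data both give 2 per group. They differ when the column has missing values: size counts every row in the group, while count counts only non-null values. Here is a small sketch using a hypothetical toy frame in which one customer_id is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one customer_id in region "A" is missing (NaN).
toy = pd.DataFrame({
    "region": ["A", "A", "B"],
    "customer_id": [32324, np.nan, 53533],
    "monthly_spending": [342, 293, 112],
})

res = toy.groupby("region").agg(
    rows=("customer_id", "size"),       # counts every row in the group
    non_null=("customer_id", "count"),  # counts only non-NaN customer_id values
)
print(res)
# region A: rows=2, non_null=1; region B: rows=1, non_null=1
```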
