groupby and agg by multiple columns error
I am trying to group by monthtly_purchases and region to get the number of customers and the total monthly spending, but I get the error shown below.
Main dataframe:
customer_id monthly_spending month monthtly_purchases region
32324 342 Feb-2019 5 A
34345 293 Feb-2019 5 A
45453 212 Feb-2019 3 A
34343 453 Feb-2019 3 A
53533 112 Feb-2019 5 B
12334 511 Feb-2019 5 B
99934 123 Feb-2019 3 B
21213 534 Feb-2019 3 B
32324 143 March-2019 5 A
34345 453 March-2019 5 A
45453 234 March-2019 3 A
34343 432 March-2019 3 A
53533 124 March-2019 5 B
12334 453 March-2019 5 B
99934 224 March-2019 3 B
21213 634 March-2019 3 B
Desired output dataframe:
monthly_purchases region monthly_spending count_customers month
5 A 635 2 Feb-2019
3 A 665 2 Feb-2019
5 B 623 2 Feb-2019
3 B 657 2 Feb-2019
5 A 596 2 March-2019
3 A 666 2 March-2019
5 B 577 2 March-2019
3 B 858 2 March-2019
This is what I have tried so far, but it raises the following error:
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby('monthtly_purchases', 'region').agg(d)
agg_df
Error msg: No numeric types to aggregate
When you group by two or more columns, remember to put the column names in a list:
import pandas as pd
df = pd.DataFrame([
[32324, 342, "Feb-2019", 5, "A"],
[34345, 293, "Feb-2019", 5, "A"],
[45453, 212, "Feb-2019", 3, "A"],
[34343, 453, "Feb-2019", 3, "A"],
[53533, 112, "Feb-2019", 5, "B"],
[12334, 511, "Feb-2019", 5, "B"],
[99934, 123, "Feb-2019", 3, "B"],
[21213, 534, "Feb-2019", 3, "B"]
],
columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"]
)
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)
Output:
customer_id monthly_spending
count sum
monthtly_purchases region
3 A 2 665
B 2 657
5 A 2 635
B 2 623
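For comparison, here is a minimal sketch of why the call in the question fails, using the February rows only. In `df.groupby("monthtly_purchases", "region")` the second positional argument is not read as a second grouping key; which parameter it binds to, and the exact exception raised, depend on the pandas version, so the sketch catches a generic `Exception` rather than assuming a specific one. The variable name `positional_error` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Form from the question: "region" is a second positional argument,
# so pandas does not interpret it as a grouping key.
positional_error = None
try:
    df.groupby("monthtly_purchases", "region").agg({"monthly_spending": "sum"})
except Exception as exc:  # exception type and message vary across pandas versions
    positional_error = exc

# Form from the answer: a single list holding both grouping keys.
grouped = df.groupby(["monthtly_purchases", "region"]).agg({"monthly_spending": "sum"})

print("positional call:", type(positional_error).__name__, positional_error)
print(grouped)
```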
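As requested in the comments, to make the MultiIndex explicit (reset_index moves the index levels into ordinary columns and creates a new integer index):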
agg_df.reset_index(inplace=True)
print(agg_df)
Output:
monthtly_purchases region customer_id monthly_spending
count sum
0 3 A 2 665
1 3 B 2 657
2 5 A 2 635
3 5 B 2 623
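The result above still has a two-row column header, because the aggregation dict maps each column to a list of functions. If a single header row is preferred, one possible approach (a sketch, not part of the original answer) is to join the two levels of each column label. The underscore separator and the resulting names such as `customer_id_count` are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

d = {"customer_id": ["count"], "monthly_spending": ["sum"]}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d).reset_index()

# Each column label is now a tuple, e.g. ("customer_id", "count") or ("region", "").
# Join the non-empty parts so every column has one plain string name.
agg_df.columns = ["_".join(part for part in col if part) for col in agg_df.columns]

print(agg_df)
```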
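To include the month, as requested in the comments (the output below is for the full dataset, including the March rows):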
agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)
Output:
month monthtly_purchases region customer_id monthly_spending
count sum
0 Feb-2019 3 A 2 665
1 Feb-2019 3 B 2 657
2 Feb-2019 5 A 2 635
3 Feb-2019 5 B 2 623
4 March-2019 3 A 2 666
5 March-2019 3 B 2 858
6 March-2019 5 A 2 596
7 March-2019 5 B 2 577
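To get closer to the layout asked for in the question, the same grouping can be written with named aggregation, which produces flat column names directly. This is a sketch on the full 16-row dataset; the output name `count_customers` is taken from the question's desired output, and the final column selection only reorders the columns.

```python
import pandas as pd

rows = [
    [32324, 342, "Feb-2019", 5, "A"], [34345, 293, "Feb-2019", 5, "A"],
    [45453, 212, "Feb-2019", 3, "A"], [34343, 453, "Feb-2019", 3, "A"],
    [53533, 112, "Feb-2019", 5, "B"], [12334, 511, "Feb-2019", 5, "B"],
    [99934, 123, "Feb-2019", 3, "B"], [21213, 534, "Feb-2019", 3, "B"],
    [32324, 143, "March-2019", 5, "A"], [34345, 453, "March-2019", 5, "A"],
    [45453, 234, "March-2019", 3, "A"], [34343, 432, "March-2019", 3, "A"],
    [53533, 124, "March-2019", 5, "B"], [12334, 453, "March-2019", 5, "B"],
    [99934, 224, "March-2019", 3, "B"], [21213, 634, "March-2019", 3, "B"],
]
df = pd.DataFrame(
    rows,
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Named aggregation: output_name=(input_column, function).
summary = df.groupby(
    ["month", "monthtly_purchases", "region"], as_index=False
).agg(
    monthly_spending=("monthly_spending", "sum"),
    count_customers=("customer_id", "count"),
)

# Reorder the columns to match the layout in the question.
summary = summary[
    ["monthtly_purchases", "region", "monthly_spending", "count_customers", "month"]
]
print(summary)
```

The rows come out sorted by the grouping keys (month, then purchases, then region), so the row order differs from the one in the question.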
The column order is different, but you can get the result with the following code.
df = df.groupby(['monthtly_purchases','region','month']).agg({'customer_id': 'size', 'monthly_spending': 'sum'}).reset_index()
df
monthtly_purchases region month customer_id monthly_spending
0 3 A Feb-2019 2 665
1 3 B Feb-2019 2 657
2 5 A Feb-2019 2 635
3 5 B Feb-2019 2 623
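Note that this answer uses 'size' where the earlier one uses 'count'. On the data in the question both give 2 per group, but they can differ: 'size' counts rows, 'count' counts non-null values, and 'nunique' counts distinct values. The sketch below uses a small hypothetical frame (not the question's data) with one missing customer_id and one repeated customer to show the difference; the output column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: region A has a repeated customer and a missing customer_id.
df = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B", "B"],
        "customer_id": [32324, 32324, np.nan, 53533, 12334],
        "monthly_spending": [342, 143, 293, 112, 511],
    }
)

counts = df.groupby("region").agg(
    n_rows=("customer_id", "size"),         # all rows in the group
    n_non_null=("customer_id", "count"),    # rows where customer_id is present
    n_distinct=("customer_id", "nunique"),  # distinct non-null customer_id values
)
print(counts)
```

Which of the three matches "number of customers" depends on whether the data can contain missing or repeated customer IDs within a group.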