Build df from sum of column value
I need to group the data by cust_id and get the total number of purchases for each month. My data looks like this:
cust_id months
1 1
1 1
1 2
1 4
2 1
2 1
So I need to see the total number of purchases for each month and each customer. The desired output is:
cust_id mo1 mo2 mo3 mo4
1 2 1 0 1
2 2 0 0 0
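To reproduce the question, the sample data can be built directly; only the column names cust_id and months come from the question, the rest is a sketch. Counting rows per (customer, month) pair gives the long-form counts that the desired table spreads into columns. Months with no purchases (such as month 3) do not appear yet at this stage.

```python
import pandas as pd

# Sample data from the question: one row per purchase.
df = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 2, 2],
    "months": [1, 1, 2, 4, 1, 1],
})

# Number of purchases per (customer, month) pair, in long form.
counts = df.groupby(["cust_id", "months"]).size()
print(counts)
```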
Use crosstab with DataFrame.reindex to add the missing categories:
import pandas as pd

# df holds the cust_id/months data from the question
r = range(df['months'].min(), df['months'].max() + 1)
df = (pd.crosstab(df['cust_id'], df['months'])
        .reindex(r, axis=1, fill_value=0)
        .add_prefix('mo'))
print(df)
months mo1 mo2 mo3 mo4
cust_id
1 2 1 0 1
2 2 0 0 0
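A small check of why the reindex step is needed, rebuilding the question's sample data: crosstab alone only creates columns for months that actually occur, so month 3 is absent until reindex adds it filled with zeros.

```python
import pandas as pd

df = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 2, 2],
    "months": [1, 1, 2, 4, 1, 1],
})

# crosstab alone only creates columns for months present in the data.
raw = pd.crosstab(df["cust_id"], df["months"])
print(raw.columns.tolist())  # [1, 2, 4]

# reindex inserts the absent month 3, filling its counts with 0.
r = range(df["months"].min(), df["months"].max() + 1)
full = raw.reindex(r, axis=1, fill_value=0).add_prefix("mo")
print(full)
```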
If you need all 12 months, you can use an ordered categorical (starting again from the original df):
# every month 1..12 becomes a category, so empty months still form groups
df['months'] = pd.Categorical(df['months'], ordered=True, categories=range(1, 13))
df = df.groupby(['cust_id', 'months'], observed=False).size().unstack(fill_value=0).add_prefix('mo')
print(df)
months mo1 mo2 mo3 mo4 mo5 mo6 mo7 mo8 mo9 mo10 mo11 mo12
cust_id
1 2 1 0 1 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0
Or reindex by range(1, 13) for all months (again starting from the original df):
r = range(1, 13)  # months 1..12
df = (pd.crosstab(df['cust_id'], df['months'])
        .reindex(r, axis=1, fill_value=0)
        .add_prefix('mo'))
print(df)
months mo1 mo2 mo3 mo4 mo5 mo6 mo7 mo8 mo9 mo10 mo11 mo12
cust_id
1 2 1 0 1 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0
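The title mentions a sum of a column value, but the sample data has no amount column, so the answers above count rows. If such a column existed, the same table shape could be produced by summing instead of counting. A minimal sketch, assuming a hypothetical amount column with made-up values (it is not part of the question):

```python
import pandas as pd

# Hypothetical data: "amount" is NOT in the question; it is assumed for illustration.
df = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 2, 2],
    "months": [1, 1, 2, 4, 1, 1],
    "amount": [10.0, 5.0, 7.5, 3.0, 2.0, 4.0],
})

# Sum "amount" per (customer, month), spread months into columns,
# then add every month 1..12 with 0 where a customer had no purchases.
out = (df.groupby(["cust_id", "months"])["amount"].sum()
         .unstack(fill_value=0)
         .reindex(range(1, 13), axis=1, fill_value=0)
         .add_prefix("mo"))
print(out)
```

This mirrors the reindex approach above; only the aggregation changes from counting rows to summing a column.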