python - how to sum the average of the amount per month per year

I have data records that look like this:

category    dt          userid  amt
1           4/14/2019       1   140
1           5/1/2019        1   500
2           5/5/2019        1   300 
3           5/19/2019       1   230
2           6/17/2019       1   200
4           6/18/2019       1   400
1           7/30/2019       1   400
1           8/17/2019       1   300
2           12/2/2019       1   200
2           12/23/2019      1   500
1           1/10/2019       2   470
1           2/25/2019       2   450
2           10/4/2019       2   350
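To run the answers below, the sample records can be loaded into a DataFrame. A minimal sketch, assuming the table is whitespace-separated exactly as shown (the `data` string is just the table above pasted in):

```python
from io import StringIO

import pandas as pd

# The sample records from the question, whitespace-separated.
data = """category dt userid amt
1 4/14/2019 1 140
1 5/1/2019 1 500
2 5/5/2019 1 300
3 5/19/2019 1 230
2 6/17/2019 1 200
4 6/18/2019 1 400
1 7/30/2019 1 400
1 8/17/2019 1 300
2 12/2/2019 1 200
2 12/23/2019 1 500
1 1/10/2019 2 470
1 2/25/2019 2 450
2 10/4/2019 2 350
"""

# sep=r'\s+' splits on any run of whitespace; dt stays a string here,
# so it still has to be converted with pd.to_datetime before use.
df = pd.read_csv(StringIO(data), sep=r'\s+')
print(df)
```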

Q1: How can I compute the average amount per month (per year) for each user, plus an average over all months?

user    month1  month2  month3  month4  month5  month6  month7  month8  month9  month10 month11 month12 avg_all_month
1       0       0       0       140     343.33  300     400     300     0       0       0       350      305.55
2       470     450     0       0       0       0       0       0       0       350     0       0        423.33
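Reading the expected rows, month5 for user 1 is the mean of the three May amounts, and avg_all_month appears to be the mean of only the months that have transactions (months showing 0 are skipped). This is an interpretation of the expected output, but it reproduces both rows:

```latex
\begin{aligned}
\text{month5 (user 1)} &= \frac{500 + 300 + 230}{3} = \frac{1030}{3} \approx 343.33 \\
\text{avg\_all\_month (user 1)} &= \frac{140 + 343.33 + 300 + 400 + 300 + 350}{6} = \frac{1833.33}{6} \approx 305.56 \\
\text{avg\_all\_month (user 2)} &= \frac{470 + 450 + 350}{3} = \frac{1270}{3} \approx 423.33
\end{aligned}
```

By contrast, the plain mean of user 1's ten transactions is 3170 / 10 = 317.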

Q2: How can I count transactions per category for each user?

user   pro_cat1   pro_cat2  pro_cat3  pro_cat4    total_product
1      4          4         1         1           10
2      2          1         0         0           3

If all the dates fall in the same year, you can use DataFrame.pivot_table with DataFrame.reindex and DataFrame.add_prefix, and then add each user's overall mean with DataFrame.assign:

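import pandas as pd

# Assumes df holds the sample records from the question.
df['dt'] = pd.to_datetime(df['dt'], format='%m/%d/%Y')

# Mean amount per user and calendar month; months without data become 0.
df2 = (df.pivot_table(index='userid',
                      columns=df['dt'].dt.month,
                      values='amt',
                      aggfunc='mean',
                      fill_value=0)
         # make sure all 12 months appear as columns
         .reindex(range(1, 13), axis=1, fill_value=0)
         .add_prefix('month')
         # mean of all individual transactions per user (aligned on userid)
         .assign(avg_all_month=lambda x: df.groupby('userid')['amt'].mean())
         .reset_index()
         .rename_axis(None, axis=1))
print(df2)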
   userid  month1  month2  month3  month4      month5  month6  month7  month8  \
0       1       0       0       0     140  343.333333     300     400     300   
1       2     470     450       0       0    0.000000       0       0       0   

   month9  month10  month11  month12  avg_all_month  
0       0        0        0      350     317.000000  
1       0      350        0        0     423.333333  

For the second question, use crosstab, which counts the rows for each userid/category pair, and then sum across the columns for the total:

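# Count transactions per user and category, then add a per-user total.
df3 = (pd.crosstab(df['userid'], df['category'])
         .add_prefix('pro_')   # use 'pro_cat' to get pro_cat1, pro_cat2, ...
         .assign(total_product=lambda x: x.sum(axis=1))
         .reset_index()
         .rename_axis(None, axis=1))
print(df3)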
   userid  pro_1  pro_2  pro_3  pro_4  total_product
0       1      4      4      1      1             10
1       2      2      1      0      0              3
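An equivalent sketch without crosstab uses groupby(...).size() followed by unstack. Here the 'pro_cat' prefix is chosen to match the column names in the question's expected output:

```python
from io import StringIO

import pandas as pd

data = """category dt userid amt
1 4/14/2019 1 140
1 5/1/2019 1 500
2 5/5/2019 1 300
3 5/19/2019 1 230
2 6/17/2019 1 200
4 6/18/2019 1 400
1 7/30/2019 1 400
1 8/17/2019 1 300
2 12/2/2019 1 200
2 12/23/2019 1 500
1 1/10/2019 2 470
1 2/25/2019 2 450
2 10/4/2019 2 350
"""
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Number of rows per (userid, category), spread into one column per category.
counts = (df.groupby(['userid', 'category'])
            .size()
            .unstack(fill_value=0)
            .add_prefix('pro_cat'))

# Row total over the category columns (computed before adding the new column).
counts['total_product'] = counts.sum(axis=1)
counts = counts.reset_index().rename_axis(None, axis=1)
print(counts)
```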

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.