How to calculate cumulative weighted average using pandas

Question

I would like to ask you about how to find the cumulative average of a specific column value in a pandas dataframe. First, the data looks like this:

firm	date	reviewer	rate
A	2021-01-01	a	5
A	2021-01-01	b	1
A	2021-01-01	c	2
A	2021-01-02	d	3
A	2021-01-02	e	4
A	2021-01-03	f	3
A	2021-01-04	g	5
B	2021-01-01	h	5
B	2021-01-01	i	2
B	2021-01-02	j	3
B	2021-01-02	k	4
B	2021-01-03	a	3
B	2021-01-04	b	5

What I want to find is to get the average rating of a specific company by date, and add a column to find the cumulative average rating including today's average rating.
I want to make it into a dataframe like the one below.

firm	date	reviewer	rate	cum_avg_rate
A	2021-01-01	a	5	2.667
A	2021-01-01	b	1	2.667
A	2021-01-01	c	2	2.667
A	2021-01-02	d	3	3
A	2021-01-02	e	4	3
A	2021-01-03	f	3	3
A	2021-01-04	g	5	3.286
B	2021-01-01	h	5	3.5
B	2021-01-01	i	2	3.5
B	2021-01-02	j	3	3.5
B	2021-01-02	k	4	3.5
B	2021-01-03	a	3	3.4
B	2021-01-04	b	5	3.667

The method I have tried so far is to create a new data frame that calculates the average score and the number of reviews using the 'groupby' method with firm and date, and use this to create a cumulative average for each day.
The code is below.

firm_gp=avg_mean_rate.groupby(['firm','date'])['mean']
firm_gp_count=avg_mean_rate.groupby(['firm','date'])['count']
avg_mean_rate['new_avg_grade']=( (firm_gp * firm_gp_count).cumsum())/firm_gp_count.cumsum()

However, the problem is that the following error occurs in the process of calculating the cumulative average for each day.
TypeError: unsupported operand type(s) for *: 'SeriesGroupBy' and 'method'

As the second method, I tried the following method using numpy.

def w_cum_avg(avg_mean_rate,mean,count):
    d=avg_mean_rate['mean']
    w= avg_mean_rate['count']
    return(d*w).cumsum() / w.cumsum()
avg_mean_rate.groupby(['firm','date']).apply(w_cum_avg,'mean','count')

But this doesn't work well, what I expected.

I would appreciate it if you could teach me how to get results.

Thank you in advance.

Answer 1

We could compute the daily sum and count per firm with groupby aggregate then groupby cumsum to get the daily cumulative total per firm . Compute the mean by dividing and join back to the DataFrame:

g = (
    df.groupby(['firm', 'date'])['rate']
        .agg(['sum', 'count'])
        .groupby(level='firm').cumsum()
)

df = df.join(
    g['sum'].div(g['count']).rename('cum_avg_rate'),
    on=['firm', 'date']  # align index on columns
)

df :

   firm        date reviewer  rate  cum_avg_rate
0     A  2021-01-01        a     5      2.666667
1     A  2021-01-01        b     1      2.666667
2     A  2021-01-01        c     2      2.666667
3     A  2021-01-02        d     3      3.000000
4     A  2021-01-02        e     4      3.000000
5     A  2021-01-03        f     3      3.000000
6     A  2021-01-04        g     5      3.285714
7     B  2021-01-01        h     5      3.500000
8     B  2021-01-01        i     2      3.500000
9     B  2021-01-02        j     3      3.500000
10    B  2021-01-02        k     4      3.500000
11    B  2021-01-03        a     3      3.400000
12    B  2021-01-04        b     5      3.666667

Setup:

import pandas as pd

df = pd.DataFrame({
    'firm': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02',
             '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-01',
             '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03',
             '2021-01-04'],
    'reviewer': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'a',
                 'b'],
    'rate': [5, 1, 2, 3, 4, 3, 5, 5, 2, 3, 4, 3, 5]
})

How to calculate cumulative weighted average using pandas

Question

1 answers

solution1
2 ACCPTED 2021-10-17 19:03:44

How to calculate cumulative weighted average using pandas

Question

1 answers

solution1 2 ACCPTED 2021-10-17 19:03:44

solution1
2 ACCPTED 2021-10-17 19:03:44