I have the first three columns in a pandas DataFrame. I want to calculate the 3-day moving average for each product, as shown in the fourth column.
Data
print (df)
       Date     Product  Demand  mov Avg
0  1-Jan-19  Product-01       3      NaN
1  2-Jan-19  Product-01       4      NaN
2  3-Jan-19  Product-01       5      4.0
3  4-Jan-19  Product-01       6      5.0
4  5-Jan-19  Product-01       7      6.0
5  3-Jan-19  Product-02       2      NaN
6  4-Jan-19  Product-02       3      NaN
7  5-Jan-19  Product-02       4      3.0
8  6-Jan-19  Product-02       5      4.0
9  7-Jan-19  Product-02       8      5.7
I tried using groupby with a rolling mean, but it doesn't seem to work.
df['mov_avg'] =df.set_index('Date').groupby('Product').rolling('Demand',window=7).mean().reset_index(drop=True)
First convert the Date column to datetimes:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')
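Then change your solution to select the Demand column after groupby and call rolling(3). The freq argument was removed from rolling in later pandas versions, so rolling(3, freq='d') is not accepted there; a plain 3-row window gives the expected result for this data:
# sort by Product, then Date, so the rows are in the same order groupby returns them
df = df.sort_values(['Product','Date']).reset_index(drop=True)
df['mov_avg'] = (df.set_index('Date')
                   .groupby('Product')['Demand']
                   .rolling(3)
                   .mean()
                   .reset_index(drop=True))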
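For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the sort-then-assign approach. The DataFrame construction and the intermediate name rolled are my own additions for illustration, and it uses a plain 3-row window, rolling(3), on the assumption that every product has one row per day.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1-Jan-19", "2-Jan-19", "3-Jan-19", "4-Jan-19", "5-Jan-19",
             "3-Jan-19", "4-Jan-19", "5-Jan-19", "6-Jan-19", "7-Jan-19"],
    "Product": ["Product-01"] * 5 + ["Product-02"] * 5,
    "Demand": [3, 4, 5, 6, 7, 2, 3, 4, 5, 8],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Sort by Product first, then Date: the groupby result comes back grouped
# by Product, so the positional assignment below only lines up in this order.
df = df.sort_values(["Product", "Date"]).reset_index(drop=True)

rolled = (df.set_index("Date")
            .groupby("Product")["Demand"]
            .rolling(3)   # 3-row window; the first two rows of each product are NaN
            .mean())

# Drop the (Product, Date) index so the values align with df's 0..n-1 index.
df["mov_avg"] = rolled.reset_index(drop=True)
print(df)
```

Because the assignment is purely positional, it silently gives wrong values if df is in any other order, which is the main reason to prefer a key-based approach.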
A better solution is to use DataFrame.join, which matches the rows on Product and Date instead of relying on row order:
s = df.set_index('Date').groupby('Product')['Demand'].rolling(3).mean()
df = df.join(s.rename('mov_avg'), on=['Product','Date'])
print (df)
        Date     Product  Demand  mov Avg   mov_avg
0 2019-01-01  Product-01       3      NaN       NaN
1 2019-01-02  Product-01       4      NaN       NaN
2 2019-01-03  Product-01       5      4.0  4.000000
3 2019-01-04  Product-01       6      5.0  5.000000
4 2019-01-05  Product-01       7      6.0  6.000000
5 2019-01-03  Product-02       2      NaN       NaN
6 2019-01-04  Product-02       3      NaN       NaN
7 2019-01-05  Product-02       4      3.0  3.000000
8 2019-01-06  Product-02       5      4.0  4.000000
9 2019-01-07  Product-02       8      5.7  5.666667
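As a side note, groupby(...).transform is another way to get the same column. This is not what the answer above uses; it is an alternative sketch, assuming the rows of each product are already in date order. Since transform returns a result aligned to the original index, the products can be interleaved and no join or re-sort by Product is needed. The sample frame below is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1-Jan-19", "2-Jan-19", "3-Jan-19", "4-Jan-19", "5-Jan-19",
             "3-Jan-19", "4-Jan-19", "5-Jan-19", "6-Jan-19", "7-Jan-19"],
    "Product": ["Product-01"] * 5 + ["Product-02"] * 5,
    "Demand": [3, 4, 5, 6, 7, 2, 3, 4, 5, 8],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Sort by date so the two products are interleaved; within each product
# the rows are still in chronological order, which is all rolling needs.
df = df.sort_values(["Date", "Product"]).reset_index(drop=True)

# transform applies the rolling mean per product and returns a Series
# with the same index as df, so a direct assignment is safe.
df["mov_avg"] = (df.groupby("Product")["Demand"]
                   .transform(lambda s: s.rolling(3).mean()))
print(df)
```

Note that rolling(3) counts rows, not calendar days. If some dates can be missing for a product, a time-based window such as rolling('3D', min_periods=3) on a DatetimeIndex may be closer to a true 3-day average.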