I'm trying to summarise worked hours for a group of people and need to calculate a rolling average.
I can do this with df.groupby and df.rolling, but for a rolling average of n values I expect the first n-1 values in each group to be NaN (or 0).
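For reference, on a single group this is standard rolling behaviour: a window of n needs n observations, so the first n-1 results are NaN, and fillna(0) turns those into zeros if 0 is preferred. A minimal sketch using Bob's hours from the example below (the names `bob_hours` and `n` are illustrative, not from the question):

```python
import pandas as pd

# One group's weekly hours (Bob's values from the example)
bob_hours = pd.Series([4, 2, 5])

n = 2
rolled = bob_hours.rolling(n).mean()
# rolling(n) needs n observations, so the first n-1 entries are NaN
print(rolled.tolist())            # [nan, 3.0, 3.5]
# Replace the leading NaNs with 0 if that is the preferred convention
print(rolled.fillna(0).tolist())  # [0.0, 3.0, 3.5]
```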
Example -
import pandas as pd
import numpy as np

employees = ['Alice', 'Alice', 'Bob', 'Bob', 'Bob']
weeks = [2, 3, 2, 3, 4]
hours = [5, 8, 4, 2, 5]
df = pd.DataFrame.from_dict({'employee': employees, 'week': weeks, 'hours': hours})

df.groupby(['employee', 'week']).sum().rolling(2).mean()

df
  employee  hours  week
0    Alice      5     2
1    Alice      8     3
2      Bob      4     2
3      Bob      2     3
4      Bob      5     4
Result -
               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       6.0   <-- expect this to be 0
         3       3.0
         4       3.5
Expected result
               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       NaN   <--- mean reset to 0 on new group
         3       3.0
         4       3.5
This reset (at Bob's first row) doesn't happen. How can I make it happen?
Many thanks (and apols for formatting)
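The reason Bob's first row gets 6.0 is that groupby(...).sum() only builds the MultiIndexed frame; the .rolling(2) call chained after it is not a grouped operation, so the window slides straight across the boundary between Alice and Bob. A small sketch reproducing that value (the names `weekly` and `naive` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'week': [2, 3, 2, 3, 4],
    'hours': [5, 8, 4, 2, 5],
})

weekly = df.groupby(['employee', 'week']).sum()
# rolling(2) runs over all rows of `weekly` in order and ignores the
# 'employee' index level, so the window for (Bob, 2) still contains
# Alice's week-3 value.
naive = weekly['hours'].rolling(2).mean()
print(naive.loc[('Bob', 2)])  # (8 + 4) / 2 = 6.0
```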
Are you looking for this?
s = df.groupby(['employee']).apply(lambda x: x['hours'].rolling(2).mean())
s
Out[225]:
employee
Alice     0        nan
          1    6.50000
Bob       2        nan
          3    3.00000
          4    3.50000
Name: hours, dtype: float64
# assign it back (drop the 'employee' level so the index matches df's)
df['roll_mean'] = s.reset_index(level=0, drop=True)
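A possible alternative without apply, assuming a reasonably recent pandas where SeriesGroupBy.rolling and GroupBy.transform are available. It keeps the question's per-(employee, week) sum and restarts the window for each employee. It assumes rows are already in week order within each employee, which holds here because groupby sorts its keys by default. The column names roll_mean_t and roll_mean_0 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'week': [2, 3, 2, 3, 4],
    'hours': [5, 8, 4, 2, 5],
})

# Keep the question's per-(employee, week) sum, but as ordinary columns
weekly = df.groupby(['employee', 'week'], as_index=False)['hours'].sum()

# Option 1: GroupBy.rolling; the result is indexed by (employee, original row),
# so drop the 'employee' level before assigning back
rolled = weekly.groupby('employee')['hours'].rolling(2).mean()
weekly['roll_mean'] = rolled.reset_index(level=0, drop=True)

# Option 2: transform returns values already aligned with weekly's own index
weekly['roll_mean_t'] = weekly.groupby('employee')['hours'].transform(
    lambda s: s.rolling(2).mean()
)

# If 0 is preferred over NaN at the start of each group
weekly['roll_mean_0'] = weekly['roll_mean'].fillna(0)
print(weekly)
```

Option 1 mirrors the reset_index(level=0, drop=True) step from the answer above; Option 2 avoids it because transform hands back a result with the same index as its input.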