Pandas dataframe groupby / rolling - why no reset of rolling mean on new group?

I'm trying to summarise worked hours for a group of people and need to calculate a rolling average.

I can do this with df.groupby and df.rolling, but for a rolling average over n values I expect the first n-1 values in each group to be NaN or 0.

Example -

import pandas as pd
import numpy as np

employees = ['Alice', 'Alice', 'Bob', 'Bob', 'Bob']
weeks = [2, 3, 2, 3, 4]
hours = [5, 8, 4, 2, 5]
df = pd.DataFrame.from_dict({'employee': employees, 'week': weeks, 'hours': hours})
df.groupby(['employee', 'week']).sum().rolling(2).mean()

df
  employee  hours  week
0    Alice      5     2
1    Alice      8     3
2      Bob      4     2
3      Bob      2     3
4      Bob      5     4

Result -

               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       6.0   <-- expect this to be 0
         3       3.0
         4       3.5

Expected result

               hours
employee week
Alice    2       NaN
         3       6.5
Bob      2       NaN   <--- mean reset to 0 on new group
         3       3.0
         4       3.5

This reset (1st row of Bob) doesn't happen. How can I make it happen?

Many thanks (and apols for formatting)

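Are you looking for this? In your code, .sum() returns an ordinary DataFrame, so the .rolling(2) that follows runs over all rows and ignores the groups. Apply the rolling mean inside each employee group instead: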

s=df.groupby(['employee']).apply(lambda x : x['hours'].rolling(2).mean())
s
Out[225]: 
employee   
Alice     0       nan
          1   6.50000
Bob       2       nan
          3   3.00000
          4   3.50000
Name: hours, dtype: float64

# assign it back: drop the 'employee' index level so the index lines up with df
df['roll_mean'] = s.reset_index(level=0, drop=True)
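
If you also want to keep the per-week sum from the question, one possible variant is sketched below. It assumes the same example data; the names `weekly`, `roll_mean` and `roll_mean_zero` are only illustrative. It is a sketch of the idea, not the only way to do it: sum per employee and week first, then group again on the `employee` index level so the window restarts for each person.

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'week': [2, 3, 2, 3, 4],
    'hours': [5, 8, 4, 2, 5],
})

# Total hours per (employee, week). The result is an ordinary Series with a
# MultiIndex, so a plain .rolling() on it would run across employees.
weekly = df.groupby(['employee', 'week'])['hours'].sum()

# Re-group on the 'employee' index level so each employee gets its own window.
# transform() returns a result aligned to the index of `weekly`.
roll_mean = weekly.groupby(level='employee').transform(
    lambda s: s.rolling(2).mean()
)

# Optional: the question mentions "nan or 0"; fillna(0) gives the 0 variant.
roll_mean_zero = roll_mean.fillna(0)

print(roll_mean)
```

With a window of 2, the first row of each employee comes out as NaN (or 0 in `roll_mean_zero`), which is the reset asked for in the question.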
