I am working on a time series problem where I would like to calculate a column containing the average value for the same hour across all days in the previous week, treating weekdays and weekends separately.
To do so, I run the following command:
df['prev_week_value'] = df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year])['value'].transform('mean')
To ensure no data leakage, I run the following to shift it:
df['prev_week_value_d1'] = df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year])['prev_week_value'].shift()
The problem is that I am left with many missing values. Since pandas treats each group individually, I can't fill the missing values in the most recent group with the values that were "discarded" from the group before, which is what I want to do.
In short, for the first week of 2023 I am left with missing values, where I would instead like the values from the previous week, which falls in 2022.
I have tried:
df['prev_week_value_d1'] = df.groupby([df.index.hour, df.index.dayofweek > 4])['prev_week_value_d1'].ffill()
However, this fills the missing rows with values that may be very old.
The exact expectations are unclear without an example, but you could try groupby.apply:
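# group_keys=False keeps the original index so the result aligns with df on assignment
df['prev_week_value_d1'] = (df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year], group_keys=False)
['prev_week_value'].apply(lambda s: s.shift().bfill())
)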
Or:
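# group_keys=False keeps the original index so the result aligns with df on assignment
df['prev_week_value_d1'] = (df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year], group_keys=False)
['prev_week_value'].apply(lambda s: s.shift(fill_value=s.iloc[0]))
)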
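Note that when the week is part of the grouping keys, a shift inside each group never leaves that week, so the filled values are still the current week's mean. If the goal is really "the mean from the week before", one possible alternative is to build a small per-week table first, shift it across weeks within each (hour, weekday/weekend) slot, and map it back onto the rows. The following is only a sketch under assumptions: the sample data and the names `work`, `weekly`, `prev_week_mean`, `iso_year`, `iso_week` and `is_weekend` are made up for illustration, and the ISO year is used in place of `df.index.year` so that days such as 2023-01-01 stay in the same week as 2022-12-31.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: three full ISO weeks spanning the 2022/2023 boundary.
idx = pd.date_range("2022-12-19", periods=3 * 7 * 24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Working copy with explicit key columns. The ISO year is used instead of the
# calendar year so that 2023-01-01 stays in ISO week 52 of 2022.
iso = df.index.isocalendar()
work = df[["value"]].copy()
work["iso_year"] = iso["year"].astype("int64")
work["iso_week"] = iso["week"].astype("int64")
work["hour"] = work.index.hour
work["is_weekend"] = work.index.dayofweek > 4

week = ["iso_year", "iso_week"]
slot = ["hour", "is_weekend"]

# One row per (week, hour, weekday/weekend) holding that week's mean.
# groupby sorts by its keys, so weeks come out in chronological order.
weekly = work.groupby(week + slot, as_index=False)["value"].mean()

# Shift across weeks inside each slot: every row now carries the mean of the
# preceding week present in the data for the same hour and day type.
weekly["prev_week_mean"] = weekly.groupby(slot)["value"].shift()

# Map the shifted means back onto the original rows. A left merge keeps the
# row order of the left frame, so the values can be assigned positionally.
merged = work[week + slot].merge(
    weekly.drop(columns="value"), on=week + slot, how="left"
)
df["prev_week_value_d1"] = merged["prev_week_mean"].to_numpy()
```

With this approach only the very first week in the data has missing values. One caveat: if an entire week is absent for some slot, the shift picks up the nearest earlier week that exists, so a check that consecutive rows really are one week apart may be needed for gappy data.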