Pandas: Use .shift() in GroupBy but impute with values from (discarded) previous group

Question

I am working on a time series problem where I would like to calculate a column containing the average value for the same hour across all days of the previous week (treating weekdays and weekends separately).

To do so, I run the following command:

df['prev_week_value'] = df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year])['value'].transform('mean')

To ensure no data leakage, I run the following to shift it:

df['prev_week_value_d1'] = df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year])['prev_week_value'].shift()

The problem is that I am left with many missing values. Since pandas treats each group individually, I can't fill the missing values in the most recent group with the values that were "discarded" from the group before, and that is what I wish to do.

In conclusion: for the first week of 2023 I am now left with missing values, where I would instead like the values from the previous week, in 2022.
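
The missing values described here come from how a grouped shift works: the first row of every group has no predecessor inside that group, so it becomes NaN. A minimal sketch on made-up data (the series `s` and its values are hypothetical, and the grouping is reduced to the hour only to keep it small):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly observations (a Monday and a Tuesday).
idx = pd.date_range("2023-01-02", periods=48, freq="h")
s = pd.Series(np.arange(48.0), index=idx)

# Shift within each hour-of-day group: every group loses its first row.
shifted = s.groupby(idx.hour).shift()

print(shifted.isna().sum())           # 24, one NaN per group
print(s.groupby(idx.hour).ngroups)    # 24 groups
```

With week and year added to the grouping keys, each (hour, weekend flag, week, year) combination is its own group, so the number of NaN rows grows with the number of weeks.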

I have tried:

df['prev_week_value_d1'] = df.groupby([df.index.hour, df.index.dayofweek > 4])['prev_week_value_d1'].ffill()

This fills the missing rows with possibly very old values.

Answer 1

The exact expectations are unclear without an example, but you could try groupby.apply:

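# group_keys=False keeps the original index so the result aligns with df (needed in pandas 2.x)
df['prev_week_value_d1'] = (df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year], group_keys=False)
                            ['prev_week_value'].apply(lambda s: s.shift().bfill())
                           )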

Or:

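# group_keys=False keeps the original index so the result aligns with df (needed in pandas 2.x)
df['prev_week_value_d1'] = (df.groupby([df.index.hour, df.index.dayofweek > 4, df.index.isocalendar().week, df.index.year], group_keys=False)
                            ['prev_week_value'].apply(lambda s: s.shift(fill_value=s.iloc[0]))
                           )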
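
If the goal is really "the mean for the same hour and day type in the previous calendar week", a different route may be worth trying: compute one mean per (week, hour, weekend flag), relabel it with the following week, and merge it back. This is only a sketch under assumptions, not the approach from the snippets above: the sample data, the `week_start` key (Monday 00:00 of each week, used instead of ISO week and year so that year boundaries need no special handling), and the names `keys`, `weekly` and `merged` are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: three full weeks (Mon-Sun) crossing the 2022/2023 boundary.
idx = pd.date_range("2022-12-19", "2023-01-08 23:00", freq="h")
week_no = ((idx - idx[0]).days // 7).to_numpy()           # 0, 1, 2
df = pd.DataFrame({"value": idx.hour.to_numpy() + 100.0 * week_no}, index=idx)

# Monday 00:00 of each row's week (assumed week definition).
week_start = idx.normalize() - pd.to_timedelta(idx.dayofweek, unit="D")
keys = pd.DataFrame(
    {"week_start": week_start, "hour": idx.hour, "weekend": idx.dayofweek > 4}
)

# One mean per (week, hour, weekend flag).
weekly = (
    keys.assign(value=df["value"].to_numpy())
    .groupby(["week_start", "hour", "weekend"], as_index=False)["value"]
    .mean()
    .rename(columns={"value": "prev_week_value"})
)

# Relabel each weekly mean with the week it should feed: the following one.
weekly["week_start"] += pd.Timedelta(weeks=1)

# A left merge keeps the row order of `keys`, so the result can be assigned positionally.
merged = keys.merge(weekly, on=["week_start", "hour", "weekend"], how="left")
df["prev_week_value"] = merged["prev_week_value"].to_numpy()
```

In this sketch only the very first week in the data has no predecessor and stays NaN; the first week of 2023 receives the means from the last week of 2022, and each row only ever sees values from an earlier week.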

solution1
0 2023-01-24 03:24:25