Pandas resampling with multi-index

Question

I have a DataFrame containing information about stores. It looks like the following: date | store_id | x 2019-01-01| 1 | 5 2019-01-01| 2 | 1 2019-01-05| 1 | 3... date | store_id | x 2019-01-01| 1 | 5 2019-01-01| 2 | 1 2019-01-05| 1 | 3...

The multi-index is [ date , store_id ]. Note that the dates are not unique. I want to resample the data at an hourly level, but only for the days in the date column ie I don't want to fill in every hour in between. Furthermore, I want to fill in the value of x for every hour that is created. So the desired result for the above example would be

date | store_id | x 2019-01-01 00:00:00| 1 | 5 2019-01-01 01:00:00| 1 | 5 2019-01-01 02:00:00| 1 | 5... 2019-01-01 23:00:00| 1 | 5 2019-01-01 00:00:00| 2 | 1 2019-01-01 01:00:00| 2 | 1 2019-01-01 02:00:00| 2 | 1... 2019-01-01 23:00:00| 2 | 1 2019-01-05 00:00:00| 1 | 3 2019-01-05 01:00:00| 1 | 3 2019-01-05 02:00:00| 1 | 3... 2019-01-05 23:00:00| 1 | 3

Answer 1

Define the following "replication" function:

def repl(row):
    return pd.DataFrame({'date': pd.date_range(start=row.date,
        periods=24, freq='H'),'store_id': row.store_id, 'x': row.x})

It "replicates" the source row (parameter), returning a sequence of rows with the given date , for consecutive hours.

Then:

reset the index, to have all columns as "normal" columns,
apply this function (to each row),
convert the resulting Series of DataFrames into a list (of DataFrames),
concatenate the result.

The code to do it is:

pd.concat(df.reset_index().apply(repl, axis=1).tolist(), ignore_index=True)

Pandas resampling with multi-index

Question

1 answers

solution1
1 ACCPTED 2019-11-01 18:40:28

Pandas resampling with multi-index

Question

1 answers

solution1 1 ACCPTED 2019-11-01 18:40:28

solution1
1 ACCPTED 2019-11-01 18:40:28