简体   繁体   中英

Pandas resampling with multi-index

I have a DataFrame containing information about stores. It looks like the following: date | store_id | x 2019-01-01| 1 | 5 2019-01-01| 2 | 1 2019-01-05| 1 | 3... date | store_id | x 2019-01-01| 1 | 5 2019-01-01| 2 | 1 2019-01-05| 1 | 3...

The multi-index is [ date , store_id ]. Note that the dates are not unique. I want to resample the data at an hourly level, but only for the days in the date column ie I don't want to fill in every hour in between. Furthermore, I want to fill in the value of x for every hour that is created. So the desired result for the above example would be

date | store_id | x 2019-01-01 00:00:00| 1 | 5 2019-01-01 01:00:00| 1 | 5 2019-01-01 02:00:00| 1 | 5... 2019-01-01 23:00:00| 1 | 5 2019-01-01 00:00:00| 2 | 1 2019-01-01 01:00:00| 2 | 1 2019-01-01 02:00:00| 2 | 1... 2019-01-01 23:00:00| 2 | 1 2019-01-05 00:00:00| 1 | 3 2019-01-05 01:00:00| 1 | 3 2019-01-05 02:00:00| 1 | 3... 2019-01-05 23:00:00| 1 | 3

Define the following "replication" function:

def repl(row):
    return pd.DataFrame({'date': pd.date_range(start=row.date,
        periods=24, freq='H'),'store_id': row.store_id, 'x': row.x})

It "replicates" the source row (parameter), returning a sequence of rows with the given date , for consecutive hours.

Then:

  • reset the index, to have all columns as "normal" columns,
  • apply this function (to each row),
  • convert the resulting Series of DataFrames into a list (of DataFrames),
  • concatenate the result.

The code to do it is:

pd.concat(df.reset_index().apply(repl, axis=1).tolist(), ignore_index=True)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM