I have a DataFrame containing information about stores. It looks like the following:

date       | store_id | x
2019-01-01 | 1        | 5
2019-01-01 | 2        | 1
2019-01-05 | 1        | 3
...
The multi-index is [date, store_id]. Note that the dates are not unique. I want to resample the data at an hourly level, but only for the days in the date column, i.e. I don't want to fill in every hour in between. Furthermore, I want to fill in the value of x for every hour that is created. So the desired result for the above example would be:
date                | store_id | x
2019-01-01 00:00:00 | 1        | 5
2019-01-01 01:00:00 | 1        | 5
2019-01-01 02:00:00 | 1        | 5
...
2019-01-01 23:00:00 | 1        | 5
2019-01-01 00:00:00 | 2        | 1
2019-01-01 01:00:00 | 2        | 1
2019-01-01 02:00:00 | 2        | 1
...
2019-01-01 23:00:00 | 2        | 1
2019-01-05 00:00:00 | 1        | 3
2019-01-05 01:00:00 | 1        | 3
2019-01-05 02:00:00 | 1        | 3
...
2019-01-05 23:00:00 | 1        | 3
Define the following "replication" function:
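    import pandas as pd

    def repl(row):
        # Build 24 hourly timestamps starting at the row's date; store_id and x
        # are broadcast to every row. (pandas 2.2+ spells the 'H' alias as 'h'.)
        return pd.DataFrame({'date': pd.date_range(start=row.date, periods=24, freq='H'),
                             'store_id': row.store_id, 'x': row.x})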
It "replicates" the source row passed as its parameter, returning a DataFrame of 24 rows that start at the given date and cover consecutive hours, each carrying the row's store_id and x.
Then apply it to every row and concatenate the results:
pd.concat(df.reset_index().apply(repl, axis=1).tolist(), ignore_index=True)
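For reference, here is a minimal end-to-end sketch of the same idea on the sample data from the question. It is a variation rather than the exact code above: it iterates with itertuples instead of apply, and it passes a pd.Timedelta as the frequency so that it does not depend on which hourly alias the installed pandas accepts. The names hourly and result are only illustrative, and the final set_index step assumes you want the [date, store_id] multi-index back.

```python
import pandas as pd

# Sample data from the question, indexed by [date, store_id].
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-05"]),
        "store_id": [1, 2, 1],
        "x": [5, 1, 3],
    }
).set_index(["date", "store_id"])


def repl(row):
    # Expand one source row into 24 hourly rows that keep its store_id and x.
    # A Timedelta is used as the frequency instead of the 'H'/'h' string alias.
    hours = pd.date_range(start=row.date, periods=24, freq=pd.Timedelta(hours=1))
    return pd.DataFrame({"date": hours, "store_id": row.store_id, "x": row.x})


# Variation on the answer: itertuples yields one namedtuple per row, so the
# attribute access inside repl (row.date, row.store_id, row.x) stays the same.
frames = [repl(row) for row in df.reset_index().itertuples(index=False)]
hourly = pd.concat(frames, ignore_index=True)

# Assumption: restore the original [date, store_id] multi-index.
result = hourly.set_index(["date", "store_id"])
print(result.head())
```

With the three sample rows this yields 72 rows (3 source rows times 24 hours), and no hours are created for the days between 2019-01-01 and 2019-01-05.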