Resample time series data

I have some random hourly time series data (let's make some up). How do I resample for the daily max value and also create a separate df column holding the hour at which that daily max was recorded?

import pandas as pd 
import numpy as np 
from numpy.random import randint
import os

np.random.seed(10)  # added for reproducibility

rng = pd.date_range('10/9/2018 00:00', periods=1000, freq='1H') 
df = pd.DataFrame({'Random_Number':randint(1, 100, 1000)}, index=rng)

df.index.name = 'Date'

Resample random value:

daily_summary = pd.DataFrame()

daily_summary['Random_Number_Resamp'] = df['Random_Number'].resample('D').max()


daily_summary.head()

And then an attempt at recording the hour in which the daily max value occurred:

daily_summary['Hour_Map'] = daily_summary.Random_Number_Resamp.index.strftime('%H').astype('int')

daily_summary

The code above doesn't throw an attribute error, but Hour_Map is always zero. How can I populate Hour_Map in the same step that creates the daily_summary df?

You could use groupby:

df.groupby(df.index.normalize())['Random_Number'].agg(['idxmax', 'max']) 

Output (head):

                         idxmax     max
Date        
2018-10-09  2018-10-09 05:00:00     94
2018-10-10  2018-10-10 20:00:00     95
2018-10-11  2018-10-11 15:00:00     97
2018-10-12  2018-10-12 18:00:00     98
2018-10-13  2018-10-13 22:00:00     91
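Hour_Map comes out as zero in the question because the index of a daily resample holds midnight timestamps, so strftime('%H') on it can only ever return 00. The idxmax column above keeps the original timestamp, so the hour can be read from it. A minimal sketch of that step, assuming the column names from the question (Random_Number_Resamp, Hour_Map) and using the lowercase 'h' frequency alias, since recent pandas versions deprecate 'H':

```python
import numpy as np
import pandas as pd

np.random.seed(10)  # same seed as in the question
rng = pd.date_range('2018-10-09 00:00', periods=1000, freq='h')
df = pd.DataFrame({'Random_Number': np.random.randint(1, 100, 1000)}, index=rng)
df.index.name = 'Date'

# One row per calendar day: timestamp of the first daily max, and the max itself
daily_summary = df.groupby(df.index.normalize())['Random_Number'].agg(['idxmax', 'max'])

# Use the column name from the question and pull the hour out of the timestamp
daily_summary = daily_summary.rename(columns={'max': 'Random_Number_Resamp'})
daily_summary['Hour_Map'] = pd.to_datetime(daily_summary['idxmax']).dt.hour

print(daily_summary.head())
```

If a day has several hours tied for the max, idxmax reports the first one.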

I think I understand what you are looking for...

Just create an hour column in the original df then resample:

np.random.seed(10)  # added for reproducibility

rng = pd.date_range('10/9/2018 00:00', periods=1000, freq='1H') 
df = pd.DataFrame({'Random_Number':randint(1, 100, 1000)}, index=rng)

df.index.name = 'Date'

# create hour column
df['hour'] = df.index.hour

# resample df
daily_summary = df.resample('D').max()

            Random_Number  hour
Date                           
2018-10-09             94    23
2018-10-10             95    23
2018-10-11             97    23
2018-10-12             98    23
2018-10-13             91    23
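Note that hour is 23 on every row of the output above: resample('D').max() takes the max of each column independently, so it returns the latest hour of the day, not the hour at which Random_Number peaked. One possible way to keep the hour column idea but get the matching hour is to select the whole row where the daily max occurs. This is a sketch, not the answerer's method; max_rows is a name introduced here for illustration, and the lowercase 'h' alias is used because recent pandas versions deprecate 'H':

```python
import numpy as np
import pandas as pd

np.random.seed(10)  # same seed as in the question
rng = pd.date_range('2018-10-09 00:00', periods=1000, freq='h')
df = pd.DataFrame({'Random_Number': np.random.randint(1, 100, 1000)}, index=rng)
df.index.name = 'Date'

# create hour column
df['hour'] = df.index.hour

# Timestamp of the first occurrence of each day's max (hypothetical helper name)
max_rows = df.groupby(df.index.normalize())['Random_Number'].idxmax()

# Select those full rows, so 'hour' belongs to the same row as the max value
daily_summary = df.loc[max_rows.to_numpy()].copy()
daily_summary.index = daily_summary.index.normalize()  # label rows by day

print(daily_summary.head())
```

Because each row is taken whole from df, Random_Number and hour always describe the same observation.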
