
Create multiple subsets of a time series based on Period

I have a DataFrame containing a time series such as the following:

[image: enter image description here]

I would like to split that DataFrame into multiple subsets, each containing one week's worth of data, spanning from Sunday 0am to Saturday 0am.

I can think of a way to do that with rrule from dateutil, but it seems there might be a more intuitive/direct method using Pandas Periods.

However, I am quite new to it, so I am not sure where to start looking. Ideally it would be something like:

Period= Sun 0am to Sat 0am
Subsets=[]
for Period in DataFrame:
    Subsets.append(DataFrame[Period])

Something like that.....

data:

pd.DataFrame(dict, columns=['timestamp','open','high','low','close','volume'])

dict={'volume': {Timestamp('2005-03-06 19:00:00'): 521.0, Timestamp('2005-03-06 20:00:00'): 234.0, Timestamp('2005-03-06 20:30:00'): 164.0, Timestamp('2005-03-06 21:00:00'): 99.0, Timestamp('2005-03-06 17:30:00'): 1603.0, Timestamp('2005-03-06 21:30:00'): 389.0, Timestamp('2005-03-06 18:00:00'): 590.0, Timestamp('2005-03-06 17:00:00'): 1668.0, Timestamp('2005-03-06 19:30:00'): 79.0, Timestamp('2005-03-06 18:30:00'): 213.0}, 'low': {Timestamp('2005-03-06 19:00:00'): 1226.25, Timestamp('2005-03-06 20:00:00'): 1226.0, Timestamp('2005-03-06 20:30:00'): 1226.0, Timestamp('2005-03-06 21:00:00'): 1226.0, Timestamp('2005-03-06 17:30:00'): 1225.75, Timestamp('2005-03-06 21:30:00'): 1225.5, Timestamp('2005-03-06 18:00:00'): 1226.75, Timestamp('2005-03-06 17:00:00'): 1225.0, Timestamp('2005-03-06 19:30:00'): 1226.25, Timestamp('2005-03-06 18:30:00'): 1226.75}, 'timestamp': {Timestamp('2005-03-06 19:00:00'): 732011.79166666663, Timestamp('2005-03-06 20:00:00'): 732011.83333333337, Timestamp('2005-03-06 20:30:00'): 732011.85416666663, Timestamp('2005-03-06 21:00:00'): 732011.875, Timestamp('2005-03-06 17:30:00'): 732011.72916666663, Timestamp('2005-03-06 21:30:00'): 732011.89583333337, Timestamp('2005-03-06 18:00:00'): 732011.75, Timestamp('2005-03-06 17:00:00'): 732011.70833333337, Timestamp('2005-03-06 19:30:00'): 732011.8125, Timestamp('2005-03-06 18:30:00'): 732011.77083333337}, 'open': {Timestamp('2005-03-06 19:00:00'): 1227.0, Timestamp('2005-03-06 20:00:00'): 1226.25, Timestamp('2005-03-06 20:30:00'): 1226.5, Timestamp('2005-03-06 21:00:00'): 1226.0, Timestamp('2005-03-06 17:30:00'): 1225.75, Timestamp('2005-03-06 21:30:00'): 1225.75, Timestamp('2005-03-06 18:00:00'): 1227.0, Timestamp('2005-03-06 17:00:00'): 1225.75, Timestamp('2005-03-06 19:30:00'): 1226.25, Timestamp('2005-03-06 18:30:00'): 1227.25}, 'high': {Timestamp('2005-03-06 19:00:00'): 1227.0, Timestamp('2005-03-06 20:00:00'): 1226.5, Timestamp('2005-03-06 20:30:00'): 1226.5, Timestamp('2005-03-06 21:00:00'): 
1226.25, Timestamp('2005-03-06 17:30:00'): 1227.5, Timestamp('2005-03-06 21:30:00'): 1226.0, Timestamp('2005-03-06 18:00:00'): 1227.5, Timestamp('2005-03-06 17:00:00'): 1226.25, Timestamp('2005-03-06 19:30:00'): 1226.75, Timestamp('2005-03-06 18:30:00'): 1227.5}, 'close': {Timestamp('2005-03-06 19:00:00'): 1226.25, Timestamp('2005-03-06 20:00:00'): 1226.25, Timestamp('2005-03-06 20:30:00'): 1226.0, Timestamp('2005-03-06 21:00:00'): 1226.0, Timestamp('2005-03-06 17:30:00'): 1227.0, Timestamp('2005-03-06 21:30:00'): 1225.5, Timestamp('2005-03-06 18:00:00'): 1227.25, Timestamp('2005-03-06 17:00:00'): 1225.5, Timestamp('2005-03-06 19:30:00'): 1226.5, Timestamp('2005-03-06 18:30:00'): 1226.75}}

You can use:

import pandas as pd

# sample dataframe: 100 rows spaced 100 minutes apart
start = pd.to_datetime('2016-12-28')
rng = pd.date_range(start, periods=100, freq='100min')
df = pd.DataFrame({'timestamp': rng, 'X': range(100),
                   'id': ['a'] * 30 + ['b'] * 30 + ['c'] * 40})
# use the timestamps as a DatetimeIndex
df = df.set_index(['timestamp'])
#print (df)

First, if necessary, filter out weekends using dayofweek with boolean indexing:

#df = df[df.index.dayofweek < 5]
#print (df)
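As a rough illustration of the optional weekend filter above, here is a minimal sketch on a hypothetical daily index (the names demo and weekdays_only are made up for this example). Note that the data in the question starts on a Sunday evening, so this filter may not be wanted there.

```python
import pandas as pd

# Hypothetical sample: one row per day for two weeks, starting on Monday 2017-01-02
idx = pd.date_range('2017-01-02', periods=14, freq='D')
demo = pd.DataFrame({'X': range(14)}, index=idx)

# dayofweek is 0 for Monday through 6 for Sunday, so "< 5" keeps Monday to Friday
weekdays_only = demo[demo.index.dayofweek < 5]
print(len(weekdays_only))
```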

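Then use period_range with a weekly frequency. Note that 'W' is an alias for 'W-SUN', so each period runs from Monday to Sunday; for Sunday-to-Saturday weeks, pass freq='W-SAT' instead: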

# first date in index
first_date = df.index[0]
# last date in index
last_date = df.index[-1]
per = pd.period_range(first_date, last_date, freq='W')
print(per)
PeriodIndex(['2016-12-26/2017-01-01', 
             '2017-01-02/2017-01-08'], dtype='period[W-SUN]', freq='W-SUN')
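The printed dtype, period[W-SUN], shows that these weeks end on Sunday. For the Sunday-to-Saturday weeks asked about in the question, a weekly frequency anchored on Saturday should be closer to what is wanted. The sketch below checks this on the date of the question's sample data; the variable names p and per_sat are illustrative only, and each period here runs through the end of Saturday rather than stopping at Saturday 0am.

```python
import pandas as pd

# 2005-03-06 is the (Sunday) date of the sample data in the question
p = pd.Period('2005-03-06', freq='W-SAT')
print(p.start_time.day_name(), p.end_time.day_name())

# Same date range as the sample dataframe, but with Saturday-anchored weeks
per_sat = pd.period_range('2016-12-28', '2017-01-03', freq='W-SAT')
print(per_sat)
```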

Finally, create Subsets with a list comprehension, converting the start and end of each period with to_timestamp and selecting the rows in between with loc:

Subsets = [ df.loc[x.to_timestamp('D', how='s'): x.to_timestamp('D', how='e')] for x in per]
#print (Subsets)
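A possible variation, not part of the original answer: each Period also exposes start_time and end_time attributes, which can be used as the slice bounds. The sketch below rebuilds a reduced version of the sample dataframe (only the X column) and checks that the weekly slices together cover every row; it assumes a sorted DatetimeIndex.

```python
import pandas as pd

# Reduced version of the sample dataframe: 100 rows, 100 minutes apart
rng = pd.date_range('2016-12-28', periods=100, freq='100min')
df = pd.DataFrame({'X': range(100)}, index=rng)
per = pd.period_range(df.index[0], df.index[-1], freq='W')

# start_time / end_time are the first and last instants of each period;
# label-based slicing with loc includes both end points
subsets = [df.loc[p.start_time:p.end_time] for p in per]
print([len(s) for s in subsets])
```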

If loc cannot be used because the end points are not in the DatetimeIndex, use boolean indexing (with >= and <= so that rows falling exactly on a boundary are kept, as they are with loc):

Subsets = [ df[(df.index >= x.to_timestamp('D', how='s')) & 
               (df.index <= x.to_timestamp('D', how='e'))] for x in per]
#print (Subsets)
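As an alternative to slicing period by period, the rows could also be labelled with their week and grouped. This is a different technique from the one in the answer, sketched here under the assumption that Sunday-to-Saturday weeks ('W-SAT') are wanted; week_labels and subsets are illustrative names.

```python
import pandas as pd

# Reduced version of the sample dataframe: 100 rows, 100 minutes apart
rng = pd.date_range('2016-12-28', periods=100, freq='100min')
df = pd.DataFrame({'X': range(100)}, index=rng)

# Label every row with the Sunday-to-Saturday week it falls in,
# then collect one sub-DataFrame per label
week_labels = df.index.to_period('W-SAT')
subsets = [group for _, group in df.groupby(week_labels)]
print([len(s) for s in subsets])
```

Weeks that contain no rows simply produce no group, so no empty subsets need to be filtered out afterwards.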


 