
Pandas groupby and rolling 4-week window

I have a series of data at half-hour intervals. I need to take a rolling 4-week average (over whole weeks) of the Tuesday, Wednesday, and Thursday values for each half-hour interval in the dataset. So the first window would hold averages for times 00:00:00, 00:30:00, ..., 23:00:00, 23:30:00 over weeks 1-4, the next window would hold averages over weeks 2-5, and so on.

I have the following dataset, which contains daily data for Tuesdays, Wednesdays, and Thursdays only (for whatever reason, the other days are not used in calculating the averages). Within those days, the data are in half-hour intervals, although the sample below includes only the 00:00:00, 00:30:00, 01:00:00, and 01:30:00 intervals.

datetime    timeblock   speed
1/3/2017 0:00   0:00:00 81.186885
1/3/2017 0:30   0:30:00 NaN
1/3/2017 1:00   1:00:00 85.277724
1/3/2017 1:30   1:30:00 85.077176
1/4/2017 0:00   0:00:00 80.691608
1/4/2017 0:30   0:30:00 79.223225
1/4/2017 1:00   1:00:00 82.330169
1/4/2017 1:30   1:30:00 79.495578
1/5/2017 0:00   0:00:00 74.162426
1/5/2017 0:30   0:30:00 75.206492
1/5/2017 1:00   1:00:00 77.6484
1/5/2017 1:30   1:30:00 72.61875
1/10/2017 0:00  0:00:00 77.785555
1/10/2017 0:30  0:30:00 80.617395
1/10/2017 1:00  1:00:00 80.094947
1/10/2017 1:30  1:30:00 77.697473
1/11/2017 0:00  0:00:00 74.7104
1/11/2017 0:30  0:30:00 75.691326
1/11/2017 1:00  1:00:00 74.639803
1/11/2017 1:30  1:30:00 81.797268
1/12/2017 0:00  0:00:00 79.571042
1/12/2017 0:30  0:30:00 78.083612
1/12/2017 1:00  1:00:00 78.747287
1/12/2017 1:30  1:30:00 78.128129
1/17/2017 0:00  0:00:00 76.509323
1/17/2017 0:30  0:30:00 77.256
1/17/2017 1:00  1:00:00 78.627085
1/17/2017 1:30  1:30:00 81.588
1/18/2017 0:00  0:00:00 77.82543
1/18/2017 0:30  0:30:00 80.231272
1/18/2017 1:00  1:00:00 NaN
1/18/2017 1:30  1:30:00 74.656384
1/19/2017 0:00  0:00:00 77.37165
1/19/2017 0:30  0:30:00 80.328705
1/19/2017 1:00  1:00:00 80.011531
1/19/2017 1:30  1:30:00 79.643781
1/24/2017 0:00  0:00:00 81.167016
1/24/2017 0:30  0:30:00 NaN
1/24/2017 1:00  1:00:00 83.128695
1/24/2017 1:30  1:30:00 77.799428
1/25/2017 0:00  0:00:00 73.106437
1/25/2017 0:30  0:30:00 71.316
1/25/2017 1:00  1:00:00 75.966
1/25/2017 1:30  1:30:00 74.345225
1/26/2017 0:00  0:00:00 78.768
1/26/2017 0:30  0:30:00 80.436508
1/26/2017 1:00  1:00:00 76.782222
1/26/2017 1:30  1:30:00 76.168687
1/31/2017 0:00  0:00:00 73.780363
1/31/2017 0:30  0:30:00 72.32356
1/31/2017 1:00  1:00:00 74.119404
1/31/2017 1:30  1:30:00 72.412363
2/1/2017 0:00   0:00:00 75.572408
2/1/2017 0:30   0:30:00 72.486593
2/1/2017 1:00   1:00:00 77.357
2/1/2017 1:30   1:30:00 74.134188
2/2/2017 0:00   0:00:00 72.209382
2/2/2017 0:30   0:30:00 75.792807
2/2/2017 1:00   1:00:00 74.167605
2/2/2017 1:30   1:30:00 78.053373
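The sample keeps only Tuesday-Thursday rows and carries the half-hour label as a string `timeblock` column (object dtype, per the `dtypes` output further down). If the raw series covered every day, both could be derived from the DatetimeIndex. A minimal sketch on a hypothetical one-week, half-hourly index (the `full` frame and its `speed` values are placeholders, not the real data); note that `strftime` zero-pads the hour, giving `00:30:00` rather than `0:30:00`:

```python
import numpy as np
import pandas as pd

# Hypothetical full half-hourly series for one week (Mon 2017-01-02 .. Sun 2017-01-08);
# the speed values are placeholders
idx = pd.date_range("2017-01-02", periods=7 * 48, freq="30min", name="datetime")
full = pd.DataFrame({"speed": np.arange(len(idx), dtype=float)}, index=idx)

# Keep Tuesday (1), Wednesday (2) and Thursday (3); pandas numbers Monday as 0
sample = full[full.index.dayofweek.isin([1, 2, 3])].copy()

# Half-hour label as a string, like the object-dtype 'timeblock' column
sample["timeblock"] = sample.index.strftime("%H:%M:%S")
print(sample.head())
```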

I've tried the following code, but it does not give the desired results:

roll_mean = sample.groupby('timeblock')['speed'].rolling('30D').mean()  # rolling() has no min_value argument; min_periods takes an integer count
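One reason a `'30D'` window does not match the desired output: an offset window is a trailing span of 30 calendar days ending at each row, so where it starts depends on the row's position in the week rather than on whole weeks. A minimal sketch on the 00:00:00 readings from the sample shows this, along with one possible alternative: a `'28D'` window read only at Thursday rows spans Friday four weeks back through that Thursday, i.e. exactly four whole Tuesday-Thursday weeks. This assumes every week has its Thursday row; the same window could presumably be applied per block via `groupby('timeblock')`, but that variant is not shown here.

```python
import pandas as pd

# 00:00:00 readings from the sample: Tue/Wed/Thu of five consecutive weeks
days = pd.to_datetime([
    "2017-01-03", "2017-01-04", "2017-01-05",
    "2017-01-10", "2017-01-11", "2017-01-12",
    "2017-01-17", "2017-01-18", "2017-01-19",
    "2017-01-24", "2017-01-25", "2017-01-26",
    "2017-01-31", "2017-02-01", "2017-02-02",
])
s = pd.Series([81.186885, 80.691608, 74.162426,
               77.785555, 74.7104, 79.571042,
               76.509323, 77.82543, 77.37165,
               81.167016, 73.106437, 78.768,
               73.780363, 75.572408, 72.209382], index=days, name="speed")

# An offset window covers (t - 30 days, t]: ending on Thu 2017-02-02 it starts
# just after Tue 2017-01-03, so it holds 14 readings and splits week 1.
n_30d = s.rolling("30D").count()

# A 28-day window ending on a Thursday covers four whole Tue-Thu weeks
# (12 readings); min_periods=12 yields NaN unless all 12 readings are present.
mean_28d = s.rolling("28D", min_periods=12).mean()
window_means = mean_28d[mean_28d.index.dayofweek == 3].dropna()  # Thursday rows only
print(window_means.round(2))
```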

The desired result is:

Window      00:00:00    00:30:00    01:00:00    01:30:00
1 (wks 1-4) 77.74       NaN         NaN         78.25
2 (wks 2-5) 76.53       NaN         NaN         77.20
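The numbers in this table match a plain average of all 12 readings (4 weeks x 3 days) per half-hour block, and the NaN cells line up with windows that contain a missing reading (00:30:00 is missing on 1/3 and 1/24, 01:00:00 on 1/18). A minimal sketch that reproduces the table under that reading: pivot to one row per day, sum and count per calendar week (`'W'`, bins ending on Sunday), then mask any 4-week window that contains a missing value. The sum/count/mask design is one choice among several, not the method used in the question.

```python
import numpy as np
import pandas as pd

nan = np.nan
days = pd.to_datetime([
    "2017-01-03", "2017-01-04", "2017-01-05",
    "2017-01-10", "2017-01-11", "2017-01-12",
    "2017-01-17", "2017-01-18", "2017-01-19",
    "2017-01-24", "2017-01-25", "2017-01-26",
    "2017-01-31", "2017-02-01", "2017-02-02",
])
blocks = ["00:00:00", "00:30:00", "01:00:00", "01:30:00"]
# Speed per day (rows) and half-hour block (columns), copied from the sample above
values = [
    [81.186885, nan, 85.277724, 85.077176],
    [80.691608, 79.223225, 82.330169, 79.495578],
    [74.162426, 75.206492, 77.6484, 72.61875],
    [77.785555, 80.617395, 80.094947, 77.697473],
    [74.7104, 75.691326, 74.639803, 81.797268],
    [79.571042, 78.083612, 78.747287, 78.128129],
    [76.509323, 77.256, 78.627085, 81.588],
    [77.82543, 80.231272, nan, 74.656384],
    [77.37165, 80.328705, 80.011531, 79.643781],
    [81.167016, nan, 83.128695, 77.799428],
    [73.106437, 71.316, 75.966, 74.345225],
    [78.768, 80.436508, 76.782222, 76.168687],
    [73.780363, 72.32356, 74.119404, 72.412363],
    [75.572408, 72.486593, 77.357, 74.134188],
    [72.209382, 75.792807, 74.167605, 78.053373],
]

# Rebuild the long half-hourly frame: DatetimeIndex plus 'timeblock' and 'speed'
idx = pd.DatetimeIndex([d + pd.Timedelta(b) for d in days for b in blocks], name="datetime")
sample = pd.DataFrame({"timeblock": blocks * len(days), "speed": np.ravel(values)}, index=idx)

# One row per calendar day, one column per half-hour block
daily = sample.assign(date=sample.index.normalize()).pivot(
    index="date", columns="timeblock", values="speed")

# Per calendar week: sum and count of readings, plus the number of missing readings
weekly_sum = daily.resample("W").sum()
weekly_n = daily.resample("W").count()
weekly_nan = daily.isna().astype(int).resample("W").sum()

# 4-week window: total / number of readings, set to NaN if any reading is missing
roll_mean = weekly_sum.rolling(4).sum() / weekly_n.rolling(4).sum()
result = roll_mean.where(weekly_nan.rolling(4).sum() == 0).dropna(how="all")
print(result.round(2))
```

Each row of `result` is labelled with the Sunday that ends the last week of its window, so 2017-01-29 corresponds to weeks 1-4 and 2017-02-05 to weeks 2-5.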

Thank you in advance

Edit: Grammar/clarification

In[1]: sample.index
Out[1]: 
DatetimeIndex(['2017-01-03 00:00:00', '2017-01-03 00:30:00',
               '2017-01-03 01:00:00', '2017-01-03 01:30:00',
               '2017-01-03 02:00:00', '2017-01-03 02:30:00',
               '2017-01-03 03:00:00', '2017-01-03 03:30:00',
               '2017-01-03 04:00:00', '2017-01-03 04:30:00',
               ...
               '2017-12-28 19:00:00', '2017-12-28 19:30:00',
               '2017-12-28 20:00:00', '2017-12-28 20:30:00',
               '2017-12-28 21:00:00', '2017-12-28 21:30:00',
               '2017-12-28 22:00:00', '2017-12-28 22:30:00',
               '2017-12-28 23:00:00', '2017-12-28 23:30:00'],
              dtype='datetime64[ns]', name='datetime', length=7488, freq=None)
In[2]: sample.dtypes
Out[2]: 
timeblock     object
speed        float64
dtype: object

I was able to get the results I needed with the following:

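import pandas as pd
toll['date'] = toll.index.normalize()  # calendar day from the DatetimeIndex (assumed; the original 'date' column is not shown)
toll = pd.pivot_table(toll, columns='timeblock', index='date', values='speed')  # one row per day, one column per half hour
toll = toll.resample('W').mean().rolling(4).mean()  # weekly means (weeks end on Sunday), then a 4-week rolling mean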
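Two properties of this weekly-means approach are worth keeping in mind. `'W'` is shorthand for `'W-SUN'`, so each weekly bin ends on a Sunday and collects that week's Tuesday through Thursday rows. And `resample(...).mean()` skips NaN, so a week with a missing reading still produces a number; the mean of four weekly means also equals the plain 12-reading average only when every week has the same number of non-missing values. A small check on the first two weeks of the 00:30:00 column from the sample, with one possible way (an assumption, not part of the code above) to turn such weeks back into NaN:

```python
import numpy as np
import pandas as pd

# 00:30:00 readings for the first two weeks of the sample (missing on 2017-01-03)
days = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05",
                       "2017-01-10", "2017-01-11", "2017-01-12"])
daily = pd.DataFrame({"00:30:00": [np.nan, 79.223225, 75.206492,
                                   80.617395, 75.691326, 78.083612]}, index=days)

# Bins end on Sunday: labels are 2017-01-08 and 2017-01-15.
# Week 1 averages only its two non-missing days, so it is not NaN.
weekly = daily.resample("W").mean()

# Optional: mark any week that contains a missing reading as NaN
has_nan = daily.isna().astype(int).resample("W").sum() > 0
masked = weekly.mask(has_nan)
print(weekly, masked, sep="\n\n")
```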
