I have a Pandas Dataframe with a DateTime index. It has closing prices of some stocks sampled at the 1-minute interval. I want to resample this dataframe and get it at the 5-minute interval, as if it had been collected in that way. For example:
SPY AAPL
DateTime
2014-01-02 09:30:00 183.91 555.890
2014-01-02 09:31:00 183.89 556.060
2014-01-02 09:32:00 183.90 556.180
2014-01-02 09:33:00 184.00 556.550
2014-01-02 09:34:00 183.98 556.325
2014-01-02 09:35:00 183.89 554.620
2014-01-02 09:36:00 183.83 554.210
I need to get something like
SPY AAPL
DateTime
2014-01-02 09:30:00 183.91 555.890
2014-01-02 09:35:00 183.89 554.620
The natural method would be resample()
or asfreq()
with Pandas. They indeed produce what I need, however with some undesired output as well. My sample has no observations from 4pm of a given weekday until 9:30am of the following day because trading halts during these hours. These mentioned methods end up completing the dataframe with NaN during these periods when there is actually no data to resample from. Is there any option I can use to avoid this behavior? From 4:05pm until 9:25am of the following day I get lots of NaN and just that!
My quick and dirty solution was the following:
Prices_5min = Prices[np.remainder(Prices.index.minute, 5) == 0]
Although I believe that this a quick and elegant solution, I'd would assume that resample()
has some option to perform this task. Any ideas? Thanks a lot!
EDIT: Following the comment regarding the undesired output, I add the following code to showcase the problem:
New_Prices = Prices.asfreq('5min')
New_Prices.loc['2014-01-02 15:50:00':'2014-01-03 9:05:00']
Out:
SPY AAPL
DateTime
2014-01-02 15:50:00 183.12 552.83
2014-01-02 15:55:00 183.08 552.89
2014-01-02 16:00:00 182.92 553.18
2014-01-02 16:05:00 NaN NaN
2014-01-02 16:10:00 NaN NaN
... ... ...
2014-01-03 08:45:00 NaN NaN
2014-01-03 08:50:00 NaN NaN
2014-01-03 08:55:00 NaN NaN
2014-01-03 09:00:00 NaN NaN
2014-01-03 09:05:00 NaN NaN
All these NaN should be part of the final result. They are only there because there were no trading hours. I want to avoid that.
You could simply discard the rows containing NaN values with dropna()
.
Demo with a slightly modified version of your input data:
SPY AAPL
DateTime
2014-01-02 09:30:00 183.91 555.890
2014-01-02 09:31:00 183.89 556.060
2014-01-02 09:32:00 183.90 556.180
2014-01-02 09:33:00 184.00 556.550
2014-01-02 09:34:00 183.98 556.325
2014-01-02 09:45:00 183.89 554.620
2014-01-02 09:46:00 183.83 554.210
Straight resampling gives rows with NaN values:
df.asfreq('5min')
SPY AAPL
DateTime
2014-01-02 09:30:00 183.91 555.89
2014-01-02 09:35:00 NaN NaN
2014-01-02 09:40:00 NaN NaN
2014-01-02 09:45:00 183.89 554.62
which go avay with dropna()
:
df.asfreq('5min').dropna()
SPY AAPL
DateTime
2014-01-02 09:30:00 183.91 555.89
2014-01-02 09:45:00 183.89 554.62
Overview: Create an interval index to describe trading times (0930 to 1400 on business days). Then find the time stamps (from resample) that are in the trading window.
import pandas as pd
bdate_range = pd.bdate_range(start='2014-01-02', periods=5)
bdate_range
trading_windows = [
(d + pd.Timedelta('9.5h'), d + pd.Timedelta('16h'))
for d in bdate_range
]
trading_windows
trading_windows = pd.IntervalIndex.from_tuples(trading_windows)
for t in trading_windows: print(t)
(2014-01-02 09:30:00, 2014-01-02 16:00:00]
(2014-01-03 09:30:00, 2014-01-03 16:00:00]
(2014-01-06 09:30:00, 2014-01-06 16:00:00]
(2014-01-07 09:30:00, 2014-01-07 16:00:00]
(2014-01-08 09:30:00, 2014-01-08 16:00:00]
...and created a list of the 5-minute intervals from your example (some during trading hours, other time stamps when trading is halted)
stamps = [
'2014-01-02 15:50:00',
'2014-01-02 15:55:00',
'2014-01-02 16:00:00',
'2014-01-02 16:05:00',
'2014-01-02 16:10:00',
]
stamps = pd.to_datetime(stamps)
Then, I used the .contains()
method of Interval Index to determine whether a timestamp (from resample) is during the trading window:
mask = [trading_windows.contains(stamp).any() for stamp in stamps]
stamps[mask]
[3]:
DatetimeIndex(['2014-01-02 15:50:00', '2014-01-02 15:55:00',
'2014-01-02 16:00:00'],
dtype='datetime64[ns]', freq=None)
This keeps all time stamps during the trading window (whether there are actual trades or not). And you can include holidays in the creation of 'trading_windows'.
Possibly resampling at 5 min freq along with 'last' statistic must work in ur case U can specify the labels as the right and include the right end in the resampling
Finally, u can apply ffill in the to avoid time leakage
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.