Group DataFrame in 5-minute intervals

Question

How do I get just the 5 minute data using Python/pandas out of this csv? For every 5 minute interval I'm trying to get the DATE, TIME,OPEN, HIGH, LOW, CLOSE, VOLUME for that 5 minute interval.

DATE       TIME     OPEN    HIGH    LOW     CLOSE   VOLUME
02/03/1997 09:04:00 3046.00 3048.50 3046.00 3047.50 505          
02/03/1997 09:05:00 3047.00 3048.00 3046.00 3047.00 162          
02/03/1997 09:06:00 3047.50 3048.00 3047.00 3047.50 98           
02/03/1997 09:07:00 3047.50 3047.50 3047.00 3047.50 228          
02/03/1997 09:08:00 3048.00 3048.00 3047.50 3048.00 136          
02/03/1997 09:09:00 3048.00 3048.00 3046.50 3046.50 174          
02/03/1997 09:10:00 3046.50 3046.50 3045.00 3045.00 134          
02/03/1997 09:11:00 3045.50 3046.00 3044.00 3045.00 43           
02/03/1997 09:12:00 3045.00 3045.50 3045.00 3045.00 214          
02/03/1997 09:13:00 3045.50 3045.50 3045.50 3045.50 8            
02/03/1997 09:14:00 3045.50 3046.00 3044.50 3044.50 152

Answer 1

You can use df.resample to do aggregation based on a date/time variable. You'll need a datetime index and you can specify that while reading the csv file:

df = pd.read_csv("filename.csv", parse_dates = [["DATE", "TIME"]], index_col=0)

This will result in a dataframe with an index where date and time are combined ( source ):

df.head()
Out[7]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5      505
1997-02-03 09:05:00  3047.0  3048.0  3046.0  3047.0      162
1997-02-03 09:06:00  3047.5  3048.0  3047.0  3047.5       98
1997-02-03 09:07:00  3047.5  3047.5  3047.0  3047.5      228
1997-02-03 09:08:00  3048.0  3048.0  3047.5  3048.0      136

After that you can use resample to get the sum, mean, etc. of those five minute intervals.

df.resample("5T").mean()
Out[8]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:00:00  3046.0  3048.5  3046.0  3047.5    505.0
1997-02-03 09:05:00  3047.6  3047.9  3046.8  3047.3    159.6
1997-02-03 09:10:00  3045.6  3045.9  3044.8  3045.0    110.2
1997-02-03 09:15:00  3043.6  3044.0  3042.8  3043.2     69.2
1997-02-03 09:20:00  3044.7  3045.2  3044.5  3045.0     65.8
1997-02-03 09:25:00  3043.8  3044.0  3043.5  3043.7     59.0
1997-02-03 09:30:00  3044.6  3045.0  3044.3  3044.6     56.0
1997-02-03 09:35:00  3044.5  3044.5  3043.5  3044.5     44.0

( T is used for minute frequency. Here is a list of other units.)

Answer 2

Another way using pandas is to use its TimeGrouper -function. Its purpose is just meant for use cases like yours.

import pandas as pd

df = pd.DataFrame("Your data provided above")
df["DATE"] = pd.to_datetime(df["DATE"])
df.set_index("DATE", inplace=True)

df = df.groupby(pd.TimeGrouper('5Min')).agg({
                                        "OPEN":  "first",
                                        "HIGH":  "max",
                                        "LOW":   "min",
                                        "CLOSE": "last",
                                        "VOLUME": "sum"
                                    })

The provided script uses an aggregation you might have in mind since you're dealing with stock-data. It aggregates in a way that you will end up with the 5-min candles resulting from your 1-min candles.

Answer 3

slight modification to Markus answer. It groups and assign it to last index

df_close_left = data_set.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
                                        "open":  "first",
                                        "high":  "max",
                                        "low":   "min",
                                        "close": "last",
                                        "volume": "sum"

                                    })

Group DataFrame in 5-minute intervals

Question

3 answers

solution1
12 ACCPTED 2016-04-18 01:34:50

solution2
2 2020-06-20 00:22:42

solution3
2 2020-07-01 18:12:59

Group DataFrame in 5-minute intervals

Question

3 answers

solution1 12 ACCPTED 2016-04-18 01:34:50

solution2 2 2020-06-20 00:22:42

solution3 2 2020-07-01 18:12:59

solution1
12 ACCPTED 2016-04-18 01:34:50

solution2
2 2020-06-20 00:22:42

solution3
2 2020-07-01 18:12:59