pandas DataFrame: create 15-minute overlapping intervals

Question

I have a pandas DataFrame (df), with the following columns: ts_unix, val1, val2

I want to add a new column called "15min_interval", where each interval is a 15-minute window and a new window starts at every minute. All the rows within an interval should have the same value in the interval column (i.e. the first 15 rows have the same interval value).

I have tried the brute-force method of looping through the 15min_interval values, slicing df for ts_unix between each interval's bounds, and concatenating all the resulting DataFrames to create df_15min. This takes too long to process.

I also tried creating a date_time column and using floor('15min'), but this creates non-overlapping 15-minute windows and rounds each start down to a quarter-hour boundary (:00, :15, :30, :45), so it doesn't work.
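
For reference, a minimal sketch of why floor behaves that way. The sample timestamps are made up for illustration; only `Series.dt.floor` is the call described above.

```python
import pandas as pd

# Hypothetical sample: 20 one-minute timestamps starting at 07:10
ts = pd.Series(pd.date_range("2019-06-29 07:10", periods=20, freq="min"))

# floor() snaps every timestamp down to the previous quarter-hour boundary,
# so 07:10..07:14 map to 07:00 and 07:15..07:29 map to 07:15.
floored = ts.dt.floor("15min")

print(floored.unique())
```

Every row lands in exactly one bin, and the bins are tied to the clock rather than to the first timestamp, which is why this cannot give a new window every minute.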

I want a faster method of creating overlapping 15-minute intervals (a new interval starting at every minute).

Answer 1

Probably not the cleanest solution, but:

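In [1]: import numpy as np
        import pandas as pd

        # One row per minute, with timestamps stored as Unix seconds
        mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
        unix_list = [int(ts.timestamp()) for ts in mins]
        df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)),
               'val2': np.random.random(len(unix_list))})
        df['ts_unix'] = pd.to_datetime(df['ts_unix'], unit='s')

        # 15-minute bin labels, shifted by 10 minutes to line up with the 07:10 start.
        # (resample's loffset argument was removed in pandas 2.0, so the labels are shifted directly.)
        bins = df.set_index('ts_unix').resample('15min').sum().index
        series_15mins = (bins + pd.Timedelta(minutes=10)).to_series().reset_index(drop=True)

        # Repeat each (start, end) pair 15 times, once per one-minute row in the bin
        intervals = list()
        for j in series_15mins.index:
            if j > 0:
                intervals.append(15*[(int(series_15mins.loc[j-1].timestamp()), int(series_15mins.loc[j].timestamp()))])

        intervals = np.array(intervals).reshape(15*len(intervals), 2)
        intervals = intervals[:df.shape[0], :]  # trim the padding left over from the last bin
        df['15min_interval'] = list(intervals)
        df['ts_unix'] = df['ts_unix'].astype(np.int64)//10**9  # back to Unix seconds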

Which results in:

In [2]: df.head(20)
Out[2]:     ts_unix     val1        val2        15min_interval
        0   1561792200  0.497049    0.296606    [1561792200, 1561793100]
        1   1561792260  0.789830    0.132583    [1561792200, 1561793100]
        2   1561792320  0.152093    0.869951    [1561792200, 1561793100]
        3   1561792380  0.631848    0.012687    [1561792200, 1561793100]
        4   1561792440  0.363599    0.685802    [1561792200, 1561793100]
        5   1561792500  0.678252    0.988140    [1561792200, 1561793100]
        6   1561792560  0.627432    0.502722    [1561792200, 1561793100]
        7   1561792620  0.860156    0.414428    [1561792200, 1561793100]
        8   1561792680  0.342857    0.686593    [1561792200, 1561793100]
        9   1561792740  0.004300    0.345949    [1561792200, 1561793100]
        10  1561792800  0.359219    0.178324    [1561792200, 1561793100]
        11  1561792860  0.818282    0.673142    [1561792200, 1561793100]
        12  1561792920  0.396736    0.642892    [1561792200, 1561793100]
        13  1561792980  0.022025    0.901829    [1561792200, 1561793100]
        14  1561793040  0.185680    0.158434    [1561792200, 1561793100]
        15  1561793100  0.813750    0.941224    [1561793100, 1561794000]
        16  1561793160  0.706645    0.504383    [1561793100, 1561794000]
        17  1561793220  0.844269    0.644725    [1561793100, 1561794000]
        18  1561793280  0.604586    0.043472    [1561793100, 1561794000]
        19  1561793340  0.174518    0.577738    [1561793100, 1561794000]
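
The non-overlapping labels shown above can probably also be computed without the loop, using integer arithmetic on the Unix timestamps. This is only a sketch: it assumes the windows are anchored at the first timestamp, and the column names `interval_start` and `interval_end` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 20 one-minute rows starting at 2019-06-29 07:10 UTC
first = 1561792200
df = pd.DataFrame({"ts_unix": first + 60 * np.arange(20, dtype=np.int64)})

WINDOW = 900  # 15 minutes in seconds

# Whole windows elapsed since the first timestamp, scaled back to seconds
interval_start = first + (df["ts_unix"] - first) // WINDOW * WINDOW
df["interval_start"] = interval_start
df["interval_end"] = interval_start + WINDOW

print(df.iloc[13:17])
```

Rows 0 to 14 share the window that begins at the first timestamp, and row 15 opens the next one, the same pattern as the table above.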

Edit: Fifteen-minute intervals starting every minute:

In [1]: mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
        unix_list = [int(ts.timestamp()) for ts in mins]
        df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)), 'val2': np.random.random(len(unix_list))})
        # Each row gets its own window: (row timestamp, row timestamp + 900 seconds)
        df['15min_interval'] = [*zip(df.ts_unix, df.ts_unix+900)]
        df.head()
Out[1]:        ts_unix      val1        val2              15min_interval
        0   1561792200  0.945755    0.334230    (1561792200, 1561793100)
        1   1561792260  0.044156    0.851238    (1561792260, 1561793160)
        2   1561792320  0.924516    0.276829    (1561792320, 1561793220)
        3   1561792380  0.383580    0.237742    (1561792380, 1561793280)
        4   1561792440  0.782808    0.808183    (1561792440, 1561793340)
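
If the goal is to group rows by every overlapping window they belong to (which is what the brute-force slicing in the question builds), one possible vectorized sketch is to repeat each row once per window that contains it. This assumes the timestamps are Unix seconds on whole minutes; the names `long`, `interval_start`, `interval_end` and `per_window` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 30 one-minute rows starting at 2019-06-29 07:10 UTC
start = 1561792200
df = pd.DataFrame({
    "ts_unix": start + 60 * np.arange(30, dtype=np.int64),
    "val1": np.arange(30, dtype=float),
})

WINDOW = 15 * 60  # window length in seconds
STEP = 60         # a new window starts every minute

# A row at time t belongs to the windows starting at t, t-60, ..., t-840
offsets = STEP * np.arange(WINDOW // STEP, dtype=np.int64)

# Repeat every row 15 times and attach one candidate window start to each copy
long = df.loc[df.index.repeat(len(offsets))].reset_index(drop=True)
long["interval_start"] = long["ts_unix"] - np.tile(offsets, len(df))
long["interval_end"] = long["interval_start"] + WINDOW

# Drop windows that would begin before the first observation
long = long[long["interval_start"] >= df["ts_unix"].min()]

# Example aggregation over the overlapping windows
per_window = long.groupby("interval_start")["val1"].agg(["count", "sum"])
print(per_window.head())
```

The long frame has up to 15 times as many rows as the original, so this trades memory for speed; whether that is acceptable depends on the size of the data.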
