
pandas DataFrame: create overlapping 15-minute intervals

I have a pandas DataFrame (df) with one row per minute and the following columns: ts_unix, val1, val2.

I want to add a new column called "15min_interval" in which each interval is a 15-minute window starting at every minute. All the rows within an interval should have the same value in their interval column (i.e., the first 15 rows share the same interval values).


I have tried the brute-force method: looping through the 15min_interval values, slicing df for the ts_unix values inside each interval, and concatenating all the slices to create df_15min. This takes too long to process.
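For comparison, the brute-force approach described above might look roughly like this. It is a hypothetical reconstruction (the question does not show its code): `brute_force_windows` is an illustrative name, and the 20-row frame starting at 1561792200 stands in for the real data. Each window builds a boolean mask over the whole frame, so the total work grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

def brute_force_windows(df, width_s=900, step_s=60):
    """Hypothetical loop-slice-concat version: one slice per window start.

    Every iteration masks the full frame, so the cost is O(n^2) overall.
    """
    pieces = []
    first, last = int(df['ts_unix'].iloc[0]), int(df['ts_unix'].iloc[-1])
    for start in range(first, last + 1, step_s):
        # rows inside the half-open window [start, start + width_s)
        mask = (df['ts_unix'] >= start) & (df['ts_unix'] < start + width_s)
        piece = df.loc[mask].copy()
        piece['15min_interval'] = [(start, start + width_s)] * len(piece)
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)

ts0 = 1561792200
n = 20
df = pd.DataFrame({'ts_unix': ts0 + 60 * np.arange(n, dtype=np.int64),
                   'val1': np.random.random(n),
                   'val2': np.random.random(n)})
df_15min = brute_force_windows(df)
print(len(df_15min))   # 195
```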

I also tried creating a date_time column and using floor('15min'), but this creates non-overlapping 15-minute windows and rounds each start down to a quarter-hour boundary of the clock (:00, :15, :30, :45). That doesn't work!
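A quick check of both complaints, assuming a frame that starts at 07:10 like the data in the answer below: `floor('15min')` snaps timestamps to clock quarter-hours, while `resample(..., origin='start')` (available since pandas 1.1) anchors the bins at the first timestamp instead. The bins are still non-overlapping, so this fixes only the alignment.

```python
import numpy as np
import pandas as pd

# One row per minute from 07:10 to 08:09 (60 rows).
dt = pd.date_range('2019-06-29 07:10', periods=60, freq='min')
s = pd.Series(np.arange(60), index=dt)

# floor() rounds down to clock quarter-hours: 07:10 -> 07:00.
print(dt[0].floor('15min'))        # 2019-06-29 07:00:00

# Default resample bins follow the clock too, so the first and last bins are partial.
clock = s.resample('15min').size()
print(clock.tolist())              # [5, 15, 15, 15, 10]

# origin='start' anchors the bins at the first timestamp: 07:10, 07:25, ...
counts = s.resample('15min', origin='start').size()
print(counts.index[0], counts.tolist())   # 2019-06-29 07:10:00 [15, 15, 15, 15]
```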

I want a faster way to create overlapping 15-minute intervals (a new one starting every minute).
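One vectorized way to get the same long-format result as slicing once per window is to pair every row with the 15 window starts that cover it, using NumPy broadcasting instead of one mask per window. This is a sketch under assumptions: `ts_unix` holds Unix seconds on a one-minute grid, a window starts at every existing timestamp and spans 15 minutes, and `overlapping_windows`, `win_start` and `win_end` are hypothetical names.

```python
import numpy as np
import pandas as pd

def overlapping_windows(df, width=15, step_s=60):
    """Return one row per (window, member row) pair, windows starting at every row.

    Assumes 'ts_unix' holds Unix seconds on a regular step_s grid; a window
    starting at t covers [t, t + width * step_s).
    """
    ts = df['ts_unix'].to_numpy()
    k = np.arange(width)
    # Row i belongs to the windows that started 0, 1, ..., width-1 steps earlier.
    starts = (ts[:, None] - k[None, :] * step_s).ravel()
    rows = np.repeat(np.arange(len(df)), width)
    keep = np.isin(starts, ts)          # keep only windows starting at an existing timestamp
    out = df.iloc[rows[keep]].reset_index(drop=True)
    out['win_start'] = starts[keep]
    out['win_end'] = starts[keep] + width * step_s
    return out.sort_values(['win_start', 'ts_unix'], ignore_index=True)

ts0 = 1561792200
n = 20
df = pd.DataFrame({'ts_unix': ts0 + 60 * np.arange(n, dtype=np.int64),
                   'val1': np.random.random(n),
                   'val2': np.random.random(n)})

long = overlapping_windows(df)
# One aggregate per overlapping window, e.g. the mean of val1:
val1_mean = long.groupby('win_start')['val1'].mean()
```

The output has up to 15 times as many rows as the input, so for long series memory rather than speed becomes the limiting factor.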

Probably not the cleanest solution, but:

In [1]: import numpy as np
        import pandas as pd

        mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
        unix_list = [int(ts.timestamp()) for ts in mins]
        df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)),
               'val2': np.random.random(len(unix_list))})

        # 15-minute (900 s) window boundaries anchored at the first timestamp, not at
        # the clock quarter-hour; the range stop makes the last window cover the final row.
        bounds = list(range(unix_list[0], unix_list[-1] + 901, 900))
        intervals = list()
        for j in range(1, len(bounds)):
            # the 15 one-minute rows of window j-1 share the same (start, end) pair
            intervals.append(15*[(bounds[j-1], bounds[j])])

        intervals = np.array(intervals).reshape(15*len(intervals), 2)
        intervals = intervals[:df.shape[0], :]   # drop padding past the last row
        df['15min_interval'] = list(intervals)
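The Python loop and the reshape can also be replaced by integer arithmetic on the timestamps. A minimal sketch, assuming `ts_unix` holds integer Unix seconds and that windows are anchored at the first row as in the answer; the 20-row frame is illustrative. Because each row is placed by its timestamp rather than by counting rows, it does not rely on exactly 15 rows per window.

```python
import numpy as np
import pandas as pd

ts0 = 1561792200                       # 2019-06-29 07:10 UTC, the first timestamp above
n = 20
df = pd.DataFrame({'ts_unix': ts0 + 60 * np.arange(n, dtype=np.int64),
                   'val1': np.random.random(n),
                   'val2': np.random.random(n)})

width = 900                            # 15 minutes in seconds
origin = df['ts_unix'].iloc[0]
# Start of the non-overlapping window each row falls into, counted from the first row.
start = origin + (df['ts_unix'] - origin) // width * width
df['15min_interval'] = list(zip(start, start + width))
print(df['15min_interval'].iloc[14])   # (1561792200, 1561793100)
print(df['15min_interval'].iloc[15])   # (1561793100, 1561794000)
```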

Which results in:

In [2]: df.head(20)
Out[2]:     ts_unix     val1        val2        15min_interval
        0   1561792200  0.497049    0.296606    [1561792200, 1561793100]
        1   1561792260  0.789830    0.132583    [1561792200, 1561793100]
        2   1561792320  0.152093    0.869951    [1561792200, 1561793100]
        3   1561792380  0.631848    0.012687    [1561792200, 1561793100]
        4   1561792440  0.363599    0.685802    [1561792200, 1561793100]
        5   1561792500  0.678252    0.988140    [1561792200, 1561793100]
        6   1561792560  0.627432    0.502722    [1561792200, 1561793100]
        7   1561792620  0.860156    0.414428    [1561792200, 1561793100]
        8   1561792680  0.342857    0.686593    [1561792200, 1561793100]
        9   1561792740  0.004300    0.345949    [1561792200, 1561793100]
        10  1561792800  0.359219    0.178324    [1561792200, 1561793100]
        11  1561792860  0.818282    0.673142    [1561792200, 1561793100]
        12  1561792920  0.396736    0.642892    [1561792200, 1561793100]
        13  1561792980  0.022025    0.901829    [1561792200, 1561793100]
        14  1561793040  0.185680    0.158434    [1561792200, 1561793100]
        15  1561793100  0.813750    0.941224    [1561793100, 1561794000]
        16  1561793160  0.706645    0.504383    [1561793100, 1561794000]
        17  1561793220  0.844269    0.644725    [1561793100, 1561794000]
        18  1561793280  0.604586    0.043472    [1561793100, 1561794000]
        19  1561793340  0.174518    0.577738    [1561793100, 1561794000]

Edit: overlapping fifteen-minute intervals, one starting at every minute:

In [1]: import numpy as np
        import pandas as pd
        mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
        unix_list = [int(ts.timestamp()) for ts in mins]
        df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)), 'val2': np.random.random(len(unix_list))})
        df['15min_interval'] = [*zip(df.ts_unix, df.ts_unix+900)]  # [t, t + 900 s) for every row
        df.head()
Out[1]:        ts_unix      val1        val2              15min_interval
        0   1561792200  0.945755    0.334230    (1561792200, 1561793100)
        1   1561792260  0.044156    0.851238    (1561792260, 1561793160)
        2   1561792320  0.924516    0.276829    (1561792320, 1561793220)
        3   1561792380  0.383580    0.237742    (1561792380, 1561793280)
        4   1561792440  0.782808    0.808183    (1561792440, 1561793340)
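If the overlapping intervals are meant for aggregating val1/val2 over each window (an assumption; the question does not say), pandas can compute per-window statistics directly with a forward-looking rolling window, without materializing the interval column. A sketch using `pandas.api.indexers.FixedForwardWindowIndexer` (pandas 1.1 or later); it assumes exactly one row per minute with no gaps, so that 15 rows equal 15 minutes. The column names `val1_sum_15min` and `val1_mean_15min` are hypothetical, and val1 is set to 0..19 only to make the result easy to check.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

ts0 = 1561792200
n = 20
df = pd.DataFrame({'ts_unix': ts0 + 60 * np.arange(n, dtype=np.int64),
                   'val1': np.arange(n, dtype=float),   # deterministic values for the check
                   'val2': np.random.random(n)})

# The window for row i covers rows i .. i+14, i.e. [ts_unix, ts_unix + 900 s)
# only if there is exactly one row per minute. min_periods=1 keeps the
# shorter windows at the end of the data.
indexer = FixedForwardWindowIndexer(window_size=15)
roll = df['val1'].rolling(window=indexer, min_periods=1)
df['val1_sum_15min'] = roll.sum()
df['val1_mean_15min'] = roll.mean()
print(df.loc[0, 'val1_sum_15min'], df.loc[0, 'val1_mean_15min'])   # 105.0 7.0
```

If the minute grid can have gaps, a fixed row count no longer corresponds to 15 minutes of time, so it is worth checking the spacing of ts_unix before relying on this.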
