I have a pandas DataFrame (df), with the following columns: ts_unix, val1, val2
I want to add a new column called "15min_interval". Each interval is a 15-minute window, and a new window starts at every minute. All rows within an interval should carry the same value in the interval column (i.e. the first 15 rows have the same interval value).
I have tried the brute-force method: looping over the 15min_interval values, slicing df for the rows whose ts_unix falls within each interval, and concatenating the slices into df_15min. This takes too long to process.
I also tried creating a date_time column and using floor('15min'), but that creates non-overlapping 15-minute windows and rounds each start down to a quarter-hour boundary (:00, :15, :30, :45), so it doesn't work.
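To illustrate the flooring behaviour described above, here is a minimal sketch. The three timestamps are made-up sample values (07:10, 07:11 and 07:15 UTC), not data from the real df:

```python
import pandas as pd

# Hypothetical sample timestamps in Unix seconds: 07:10, 07:11 and 07:15 UTC.
ts = pd.Series(pd.to_datetime([1561792200, 1561792260, 1561792500], unit='s'))

# floor() snaps every value down to the previous quarter-hour boundary,
# so 07:10 and 07:11 both land in the 07:00 bucket rather than in a window starting at 07:10.
floored = ts.dt.floor('15min')
print(floored)
```

Every row ends up in exactly one bucket, and the buckets are tied to the clock rather than to the first timestamp, which is why this cannot produce windows that start at every minute.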
I am looking for a faster way to create overlapping 15-minute intervals, with a new interval starting at every minute.
Probably not the cleanest solution, but:
In [1]: import numpy as np
import pandas as pd
mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
unix_list = [int(ts.timestamp()) for ts in mins]
df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)),
                   'val2': np.random.random(len(unix_list))})
df['ts_unix'] = pd.to_datetime(df['ts_unix'], unit='s')
# resample() labels its bins on the quarter hour; shift the labels by 10 minutes so they
# line up with the 07:10 start (the loffset argument of resample() was removed in pandas 2.0)
bin_labels = df.set_index('ts_unix').resample('15min').sum().index + pd.Timedelta(minutes=10)
series_15mins = pd.Series(bin_labels)
intervals = list()
for j in series_15mins.index:
    if j > 0:
        # repeat each (start, end) pair for the 15 one-minute rows it covers
        intervals.append(15*[(int(series_15mins.loc[j-1].timestamp()), int(series_15mins.loc[j].timestamp()))])
intervals = np.array(intervals).reshape(15*len(intervals), 2)
intervals = intervals[:df.shape[0], :]
df['15min_interval'] = list(intervals)
# convert back to Unix seconds, independent of the datetime resolution
df['ts_unix'] = (df['ts_unix'] - pd.Timestamp(0)) // pd.Timedelta(seconds=1)
Which results in:
In [2]: df.head(20)
Out[2]: ts_unix val1 val2 15min_interval
0 1561792200 0.497049 0.296606 [1561792200, 1561793100]
1 1561792260 0.789830 0.132583 [1561792200, 1561793100]
2 1561792320 0.152093 0.869951 [1561792200, 1561793100]
3 1561792380 0.631848 0.012687 [1561792200, 1561793100]
4 1561792440 0.363599 0.685802 [1561792200, 1561793100]
5 1561792500 0.678252 0.988140 [1561792200, 1561793100]
6 1561792560 0.627432 0.502722 [1561792200, 1561793100]
7 1561792620 0.860156 0.414428 [1561792200, 1561793100]
8 1561792680 0.342857 0.686593 [1561792200, 1561793100]
9 1561792740 0.004300 0.345949 [1561792200, 1561793100]
10 1561792800 0.359219 0.178324 [1561792200, 1561793100]
11 1561792860 0.818282 0.673142 [1561792200, 1561793100]
12 1561792920 0.396736 0.642892 [1561792200, 1561793100]
13 1561792980 0.022025 0.901829 [1561792200, 1561793100]
14 1561793040 0.185680 0.158434 [1561792200, 1561793100]
15 1561793100 0.813750 0.941224 [1561793100, 1561794000]
16 1561793160 0.706645 0.504383 [1561793100, 1561794000]
17 1561793220 0.844269 0.644725 [1561793100, 1561794000]
18 1561793280 0.604586 0.043472 [1561793100, 1561794000]
19 1561793340 0.174518 0.577738 [1561793100, 1561794000]
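The same non-overlapping intervals can probably be computed without the loop, using integer arithmetic on the Unix timestamps. This is only a sketch: it uses a small made-up sample, the names width, first and start are introduced here for illustration, and it stores each interval as a tuple rather than an array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per minute, Unix seconds in ts_unix.
mins = pd.date_range(start='2019-06-29 07:10', periods=40, freq='min')
df = pd.DataFrame({'ts_unix': [int(ts.timestamp()) for ts in mins]})

width = 15 * 60                 # window length in seconds
first = df['ts_unix'].iloc[0]   # windows are anchored at the first timestamp

# Integer-divide each row's offset from the first timestamp to find the start of its block.
start = first + (df['ts_unix'] - first) // width * width
df['15min_interval'] = list(zip(start, start + width))
print(df.head(20))
```

Because the blocks are derived from each row's own timestamp, this version does not depend on the rows being exactly one minute apart.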
Edit: Overlapping fifteen-minute intervals, with a new one starting every minute:
In [1]: import numpy as np
import pandas as pd
mins = pd.date_range(start='2019-06-29 07:10', end='2019-10-26 00:00', freq='min')
unix_list = [int(ts.timestamp()) for ts in mins]
df = pd.DataFrame({'ts_unix': unix_list, 'val1': np.random.random(len(unix_list)), 'val2': np.random.random(len(unix_list))})
# each row gets the window that starts at its own timestamp and ends 900 s (15 min) later
df['15min_interval'] = list(zip(df.ts_unix, df.ts_unix + 900))
df.head()
Out[1]: ts_unix val1 val2 15min_interval
0 1561792200 0.945755 0.334230 (1561792200, 1561793100)
1 1561792260 0.044156 0.851238 (1561792260, 1561793160)
2 1561792320 0.924516 0.276829 (1561792320, 1561793220)
3 1561792380 0.383580 0.237742 (1561792380, 1561793280)
4 1561792440 0.782808 0.808183 (1561792440, 1561793340)
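If the goal is the df_15min from the question, where every row appears once for each overlapping window that contains it, one possible vectorized sketch is to repeat each row 15 times and shift the window start back by one minute per copy. This assumes regularly spaced one-minute rows. The sample data and the helper names step, n_steps, width, shift and interval_start are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 20 one-minute rows with Unix-second timestamps.
mins = pd.date_range(start='2019-06-29 07:10', periods=20, freq='min')
df = pd.DataFrame({'ts_unix': [int(ts.timestamp()) for ts in mins],
                   'val1': np.arange(20.0)})

step, n_steps = 60, 15      # a new window every 60 s, each window 15 steps long
width = step * n_steps      # 900 s

# Repeat every row 15 times; copy k belongs to the window that began k minutes earlier.
df_15min = df.loc[df.index.repeat(n_steps)].reset_index(drop=True)
shift = np.tile(np.arange(n_steps), len(df)) * step
df_15min['interval_start'] = df_15min['ts_unix'] - shift
df_15min['15min_interval'] = list(zip(df_15min['interval_start'],
                                      df_15min['interval_start'] + width))

# Discard windows that would begin before the first observation, then order rows by window.
df_15min = df_15min[df_15min['interval_start'] >= df['ts_unix'].min()]
df_15min = df_15min.sort_values(['interval_start', 'ts_unix']).reset_index(drop=True)
print(df_15min.head(16))
```

The expanded frame has roughly 15 times as many rows as df, so memory use grows accordingly. Per-window statistics can then be taken with something like df_15min.groupby('interval_start')['val1'].mean().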