How to create sliding window of 2 seconds width from irregular timeseries data
I have a set of time series data spanning a few different days; a sample is shown below.

I would like to split the data into 2-second intervals, build a sliding window over them, and label each window as "stay" or "leave". I tried the built-in pandas windows, but they only seem to let me choose an integer window size (a number of records in the DataFrame), not a duration such as 2 seconds.

Is a sliding window necessary for this task? When I looked up how to do machine learning on time series data, windowing was mentioned as one of the basics of working with such data.

Currently, I am thinking of generating all the 2-second intervals and replicating each record onto them according to its timestamp in the original DataFrame (each record lasts from its own timestamp to the timestamp of the next record). This would give a new DataFrame with one record per 2-second interval.
   leave  timeframe                                          confidence  restaurant  timestamp                  event
0  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2           Bistro NTT  2021-12-17 12:05:19+08:00  walking
1  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2           Bistro NTT  2021-12-17 12:06:07+08:00  Previous activity ended. Recalculating activit...
2  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2           Bistro NTT  2021-12-17 12:07:04+08:00  stationary
3  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2           Bistro NTT  2021-12-17 12:08:35+08:00  Previous activity ended. Recalculating activit...
4  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2           Bistro NTT  2021-12-17 12:08:47+08:00  stationary
So far I have managed to create a dummy DataFrame; all I need to do now is fit my old DataFrame onto the new one. Basically, I want to fit the original series onto this new 2-second-interval DataFrame.
import numpy as np
import pandas as pd

rs = pd.date_range(start=timeseries.index[0], end=timeseries.index[-1], freq='2s')
# index=timeseries.resample('2s').interpolate().iloc[1:].index
dummy_frame = pd.DataFrame(np.nan, index=rs, columns=timeseries.columns)
dummy_frame.head()
Output:
leave timeframe confidence restaurant event
2021-12-17 12:05:19+08:00 NaN NaN NaN NaN NaN
2021-12-17 12:05:21+08:00 NaN NaN NaN NaN NaN
2021-12-17 12:05:23+08:00 NaN NaN NaN NaN NaN
2021-12-17 12:05:25+08:00 NaN NaN NaN NaN NaN
2021-12-17 12:05:27+08:00 NaN NaN NaN NaN NaN
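On the point about integer-only windows: when the index is a DatetimeIndex, pandas `rolling` also accepts a time offset such as `'2s'`, so a 2-second sliding window can be taken directly over irregular timestamps. The sketch below is only an illustration under assumptions: the timestamps and values are made up, `leave` is encoded as 0/1 because rolling aggregations need numeric data, and the rule "label the window leave if any record in it is a leave" is a guess at the intended labelling.

```python
import pandas as pd

# Made-up irregular timestamps (assumption, not the real data)
ts = pd.to_datetime([
    "2021-12-17 12:05:19",
    "2021-12-17 12:05:20",
    "2021-12-17 12:05:23",
    "2021-12-17 12:05:24",
    "2021-12-17 12:05:30",
])

# 'leave' encoded as 1 (true) / 0 (false) so it can be aggregated
leave = pd.Series([0, 1, 0, 0, 1], index=ts)

# Time-based window: for each record at time t, look at records in (t - 2s, t]
window_max = leave.rolling("2s").max()

# Assumed labelling rule: any 'leave' inside the window marks the window as 'leave'
labels = window_max.gt(0).map({True: "leave", False: "stay"})
print(labels)
```

This produces one window per original record, ending at that record's timestamp. If one window per fixed 2-second step is needed instead, the data has to be put on a regular grid first.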
What you expect is not clear. Can you test this:
out = df.groupby(pd.Grouper(freq='2s', key='timestamp'))['leave'] \
.apply(lambda x: 'leave' if any(x == 'true') else 'stay')
print(out.head(10))
# Output:
timestamp
2021-12-22 11:35:00 leave
2021-12-22 11:35:02 leave
2021-12-22 11:35:04 leave
2021-12-22 11:35:06 stay
2021-12-22 11:35:08 leave
2021-12-22 11:35:10 leave
2021-12-22 11:35:12 leave
2021-12-22 11:35:14 stay
2021-12-22 11:35:16 stay
2021-12-22 11:35:18 stay
Freq: 2S, Name: leave, dtype: object
Setup:
import numpy as np
import pandas as pd

np.random.seed(2021)
dti = pd.date_range('2021-12-22 11:35', freq='1s', periods=60)
df = pd.DataFrame({'leave': np.random.choice(['false', 'true'], len(dti)),
'timestamp': dti, 'event': [2]*len(dti)})
print(df.head(10))
# Output:
leave timestamp event
0 false 2021-12-22 11:35:00 2
1 true 2021-12-22 11:35:01 2
2 true 2021-12-22 11:35:02 2
3 false 2021-12-22 11:35:03 2
4 true 2021-12-22 11:35:04 2
5 false 2021-12-22 11:35:05 2
6 false 2021-12-22 11:35:06 2
7 false 2021-12-22 11:35:07 2
8 true 2021-12-22 11:35:08 2
9 false 2021-12-22 11:35:09 2
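The setup above has one record per second, so every 2-second bin contains data. With irregular timestamps some bins will contain no records at all, and it is worth deciding explicitly how those should be labelled. The following is a minimal sketch, assuming (as the question states) that each record stays valid until the next one arrives; the sample values are made up, and `resample` is used here in place of `pd.Grouper` purely for brevity.

```python
import pandas as pd

# Made-up irregular data with a gap between 11:35:01 and 11:35:07
df_gap = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-12-22 11:35:00",
        "2021-12-22 11:35:01",
        "2021-12-22 11:35:07",
    ]),
    "leave": ["true", "true", "false"],
})

# Encode 'true'/'false' as 1.0/0.0 so empty bins show up as NaN
is_leave = df_gap.set_index("timestamp")["leave"].eq("true").astype(float)

# One value per 2-second bin; bins without records are NaN
per_bin = is_leave.resample("2s").max()

# Assumption: an empty bin inherits the state of the last bin that had data
filled = per_bin.ffill()

labels = filled.eq(1.0).map({True: "leave", False: "stay"})
print(labels)
```

Whether carrying the last state forward is appropriate depends on what a gap in the recording means for this data set.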
Here's what I have done so far:
def normalize_ts(df, limit=3):
    unique_val = df.timeframe.unique()
    index = 0
    df_list = list()
    for val in unique_val:
        if index > limit:
            break
        df_filtered = df[df['timeframe']==val]
        df_list.append(normalize(df_filtered))
        index += 1
    return df_list

def normalize(df):
    #generate dummy dataframes to store values of 2s intervals
    rs = pd.date_range(start=df.index[0], end=df.index[-1], freq='2s')
    #index=timeseries.resample('2s').interpolate().iloc[1:].index
    dummy_frame = pd.DataFrame(np.nan, index=rs, columns=df.columns)
    idx = 0
    #fill in values to dummy dataframe
    for val in range(len(df.index)):
        if df.index[val] == df.index[-1]:
            if idx == len(dummy_frame.index):
                break
            dummy_index = dummy_frame.index[idx]
            df_index = df.index[val]
            dummy_frame.loc[dummy_index, 'leave'] = df.loc[df_index, 'leave']
            dummy_frame.loc[dummy_index, 'timeframe'] = df.loc[df_index, 'timeframe']
            dummy_frame.loc[dummy_index, 'confidence'] = df.loc[df_index, 'confidence']
            dummy_frame.loc[dummy_index, 'restaurant'] = df.loc[df_index, 'restaurant']
            dummy_frame.loc[dummy_index, 'event'] = df.loc[df_index, 'event']
            break
        while(True):
            if idx == len(dummy_frame.index):
                break
            dummy_index = dummy_frame.index[idx]
            df_index = df.index[val]
            df_index2 = df.index[val+1]
            if dummy_index >= df_index and df_index2 > dummy_index:
                dummy_frame.loc[dummy_index, 'leave'] = df.loc[df_index, 'leave']
                dummy_frame.loc[dummy_index, 'timeframe'] = df.loc[df_index, 'timeframe']
                dummy_frame.loc[dummy_index, 'confidence'] = df.loc[df_index, 'confidence']
                dummy_frame.loc[dummy_index, 'restaurant'] = df.loc[df_index, 'restaurant']
                dummy_frame.loc[dummy_index, 'event'] = df.loc[df_index, 'event']
            else:
                break
            idx += 1
    return dummy_frame

timeseries = timeseries.set_index('timestamp')
dataframes = normalize_ts(timeseries, 10)
merged = pd.concat(dataframes).reset_index()
merged.rename(columns={'index': 'timestamp'}, inplace=True)
This gives me output like the following:
   timestamp                  leave  timeframe                                          confidence  restaurant  event
0  2021-12-17 12:05:19+08:00  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2.0         Bistro NTT  walking
1  2021-12-17 12:05:21+08:00  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2.0         Bistro NTT  walking
2  2021-12-17 12:05:23+08:00  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2.0         Bistro NTT  walking
3  2021-12-17 12:05:25+08:00  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2.0         Bistro NTT  walking
4  2021-12-17 12:05:27+08:00  false  2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...  2.0         Bistro NTT  walking
This may not be the most efficient approach, but now that I can get the values of the irregular time series onto a regular interval, I think I should be able to start working on the windowing function for time series machine learning.
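On the efficiency point: copying each record onto every grid timestamp up to the next record is a forward fill, which pandas can do without an explicit loop. Below is a minimal sketch, assuming the index is a DatetimeIndex with unique timestamps; `normalize_ffill` is a hypothetical name and the sample rows are made up. It should behave like `normalize` on such input, but that has not been checked against the real data.

```python
import pandas as pd

def normalize_ffill(df, freq="2s"):
    """Put an irregular, timestamp-indexed frame onto a regular grid.

    Each grid timestamp takes the values of the most recent record at or
    before it (forward fill). Hypothetical helper, not from the original post.
    """
    df = df.sort_index()  # reindex with method= needs a monotonic index
    grid = pd.date_range(start=df.index[0], end=df.index[-1], freq=freq)
    return df.reindex(grid, method="ffill")

# Made-up sample with irregular gaps (3 s, then 2 s)
sample = pd.DataFrame(
    {
        "leave": ["false", "false", "true"],
        "event": ["walking", "stationary", "walking"],
    },
    index=pd.to_datetime([
        "2021-12-17 12:05:19",
        "2021-12-17 12:05:22",
        "2021-12-17 12:05:24",
    ]),
)

regular = normalize_ffill(sample)
print(regular)
```

To keep the per-timeframe split used in `normalize_ts`, the same helper could be applied to each group, for example `pd.concat([normalize_ffill(g) for _, g in timeseries.groupby('timeframe', sort=False)])`.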