
How to create a sliding window of 2 seconds width from irregular timeseries data

I have a set of time series data spanning a few different days. A sample of the data is shown further down. I would like to split all the data into 2-second intervals, create a sliding window, and then label each window as "stay" or "leave". I tried the pandas built-in windows, but they only seemed to let me choose an integer window size (a number of records in the DataFrame), not a duration such as 2 seconds.
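On the window-size point: `rolling` in pandas also accepts a time offset such as "2s" when the DataFrame has a sorted DatetimeIndex, so a 2-second window is possible without regularizing the data first. The following is a minimal sketch under that assumption. The `moving` column and the timestamps are made up for illustration and are not taken from the data in this question.

```python
import pandas as pd

# Hypothetical irregular samples; `moving` is an illustrative 0/1 column.
ts = pd.DataFrame(
    {"moving": [1, 0, 0, 1, 0]},
    index=pd.to_datetime([
        "2021-12-17 12:05:19",
        "2021-12-17 12:05:20",
        "2021-12-17 12:05:23",
        "2021-12-17 12:05:24",
        "2021-12-17 12:05:30",
    ]),
)

# A string offset makes the window time-based: for the row at time t,
# the window covers the rows with timestamps in (t - 2s, t].
window_max = ts["moving"].rolling("2s").max()      # 1.0 if any row in the window is 1
window_count = ts["moving"].rolling("2s").count()  # number of rows in each window
```

Each row gets its own trailing window, so the windows overlap wherever records are less than 2 seconds apart. The index has to be monotonic for this to work.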

Is making a sliding window necessary for this task? I looked online for how to do machine learning on time series data, and windowing was mentioned as one of the basics of working with time series data.

Currently, I am thinking of generating all the 2-second intervals and replicating each record onto them according to its timestamp in the original DataFrame (each record lasts from its own timestamp until the timestamp of the next record). This would give a new DataFrame with one record per 2-second interval of the time series.

    leave   timeframe   confidence  restaurant  timestamp   event
0   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2   Bistro NTT  2021-12-17 12:05:19+08:00   walking
1   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2   Bistro NTT  2021-12-17 12:06:07+08:00   Previous activity ended. Recalculating activit...
2   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2   Bistro NTT  2021-12-17 12:07:04+08:00   stationary
3   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2   Bistro NTT  2021-12-17 12:08:35+08:00   Previous activity ended. Recalculating activit...
4   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2   Bistro NTT  2021-12-17 12:08:47+08:00   stationary

So far I have managed to create a dummy DataFrame; all I need to do now is fit my old DataFrame onto the new one. Basically, fit the graph onto this new 2-second-interval DataFrame.

import numpy as np
import pandas as pd
rs = pd.date_range(start=timeseries.index[0], end=timeseries.index[-1], freq='2s')
#index=timeseries.resample('2s').interpolate().iloc[1:].index
dummy_frame = pd.DataFrame(np.nan, index=rs, columns=timeseries.columns)
dummy_frame.head()

Output:

    leave   timeframe   confidence  restaurant  event
2021-12-17 12:05:19+08:00   NaN NaN NaN NaN NaN
2021-12-17 12:05:21+08:00   NaN NaN NaN NaN NaN
2021-12-17 12:05:23+08:00   NaN NaN NaN NaN NaN
2021-12-17 12:05:25+08:00   NaN NaN NaN NaN NaN
2021-12-17 12:05:27+08:00   NaN NaN NaN NaN NaN
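One way to fit the original records onto a 2-second grid like this is `DataFrame.reindex` with `method="ffill"`, which gives each grid point the most recent original record at or before it. This is a minimal sketch, assuming the original index is sorted and has no duplicate timestamps. The small `timeseries` frame below is a made-up stand-in, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for `timeseries`, with irregular timestamps as the index.
timeseries = pd.DataFrame(
    {"event": ["walking", "stationary", "walking"]},
    index=pd.to_datetime([
        "2021-12-17 12:05:19",
        "2021-12-17 12:05:22",
        "2021-12-17 12:05:27",
    ]),
)

# Regular 2-second grid spanning the original index.
rs = pd.date_range(start=timeseries.index[0], end=timeseries.index[-1], freq="2s")

# Forward-fill: every grid timestamp takes the values of the last
# original record whose timestamp is <= the grid timestamp.
regular = timeseries.reindex(rs, method="ffill")
```

This matches the idea that each record lasts from its own timestamp until the timestamp of the next record. If the data contains several separate sessions, the same call could be applied per session so that values are not carried across gaps between them.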

[Figure: plot of the sample data]

What you expect is not clear. Can you test this:

out = df.groupby(pd.Grouper(freq='2s', key='timestamp'))['leave'] \
        .apply(lambda x: 'leave' if any(x == 'true') else 'stay')
print(out.head(10))

# Output:
timestamp
2021-12-22 11:35:00    leave
2021-12-22 11:35:02    leave
2021-12-22 11:35:04    leave
2021-12-22 11:35:06     stay
2021-12-22 11:35:08    leave
2021-12-22 11:35:10    leave
2021-12-22 11:35:12    leave
2021-12-22 11:35:14     stay
2021-12-22 11:35:16     stay
2021-12-22 11:35:18     stay
Freq: 2S, Name: leave, dtype: object

Setup:

import numpy as np
import pandas as pd
np.random.seed(2021)
dti = pd.date_range('2021-12-22 11:35', freq='1s', periods=60)
df = pd.DataFrame({'leave': np.random.choice(['false', 'true'], len(dti)),
                   'timestamp': dti, 'event': [2]*len(dti)})
print(df.head(10))

# Output:
   leave           timestamp  event
0  false 2021-12-22 11:35:00      2
1   true 2021-12-22 11:35:01      2
2   true 2021-12-22 11:35:02      2
3  false 2021-12-22 11:35:03      2
4   true 2021-12-22 11:35:04      2
5  false 2021-12-22 11:35:05      2
6  false 2021-12-22 11:35:06      2
7  false 2021-12-22 11:35:07      2
8   true 2021-12-22 11:35:08      2
9  false 2021-12-22 11:35:09      2

Here's what I have done so far:

def normalize_ts(df, limit=3):
    unique_val = df.timeframe.unique()
    index = 0
    df_list = list()
    for val in unique_val:
        if index > limit:
            break
        df_filtered = df[df['timeframe']==val]
        df_list.append(normalize(df_filtered))
        index += 1
        
    return df_list
        
def normalize(df):
    # Build an empty frame on a regular 2-second grid to hold the values
    rs = pd.date_range(start=df.index[0], end=df.index[-1], freq='2s')
    #index=timeseries.resample('2s').interpolate().iloc[1:].index
    dummy_frame = pd.DataFrame(np.nan, index=rs, columns=df.columns)
    
    idx = 0
    
    #fill in values to dummy dataframe
    for val in range(len(df.index)):
        if df.index[val] == df.index[-1]:
            if idx == len(dummy_frame.index):
                break
            dummy_index = dummy_frame.index[idx]
            df_index = df.index[val]

            dummy_frame.loc[dummy_index, 'leave'] = df.loc[df_index, 'leave']
            dummy_frame.loc[dummy_index, 'timeframe'] = df.loc[df_index, 'timeframe']
            dummy_frame.loc[dummy_index, 'confidence'] = df.loc[df_index, 'confidence']
            dummy_frame.loc[dummy_index, 'restaurant'] = df.loc[df_index, 'restaurant']
            dummy_frame.loc[dummy_index, 'event'] = df.loc[df_index, 'event']
            break
        while(True):
            if idx == len(dummy_frame.index):
                break
            dummy_index = dummy_frame.index[idx]
            df_index = df.index[val]
            df_index2 = df.index[val+1]

            if dummy_index >= df_index and df_index2 > dummy_index:
                dummy_frame.loc[dummy_index, 'leave'] = df.loc[df_index, 'leave']
                dummy_frame.loc[dummy_index, 'timeframe'] = df.loc[df_index, 'timeframe']
                dummy_frame.loc[dummy_index, 'confidence'] = df.loc[df_index, 'confidence']
                dummy_frame.loc[dummy_index, 'restaurant'] = df.loc[df_index, 'restaurant']
                dummy_frame.loc[dummy_index, 'event'] = df.loc[df_index, 'event']
            else:
                break
            idx += 1
            
    return dummy_frame

timeseries = timeseries.set_index('timestamp')
dataframes = normalize_ts(timeseries, 10)
merged = pd.concat(dataframes).reset_index()
merged.rename(columns={'index': 'timestamp'}, inplace=True)

This gives me output like the following:

    timestamp   leave   timeframe   confidence  restaurant  event
0   2021-12-17 12:05:19+08:00   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2.0 Bistro NTT  walking
1   2021-12-17 12:05:21+08:00   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2.0 Bistro NTT  walking
2   2021-12-17 12:05:23+08:00   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2.0 Bistro NTT  walking
3   2021-12-17 12:05:25+08:00   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2.0 Bistro NTT  walking
4   2021-12-17 12:05:27+08:00   false   2021-12-17T12:06:19+0800 to 2021-12-17T12:30:2...   2.0 Bistro NTT  walking

This may not be the most efficient approach, but now that I can get the values of the irregular time series onto a regular interval, I think I can start working on the windowing function for the time series machine learning part.
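Once the data sits on a regular 2-second grid, fixed-length sliding windows can be taken with NumPy's `sliding_window_view` (available from NumPy 1.20). This is only a sketch: the `leave` array, the window size of 3 samples, and the rule "label a window leave if any sample in it is flagged" are illustrative assumptions rather than part of the original data or task definition.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical 0/1 leave flags sampled on a regular 2-second grid.
leave = np.array([0, 0, 0, 1, 1, 0])

# Illustrative choice: 3 samples per window, i.e. 6 seconds of data.
window_size = 3

# Overlapping windows with a stride of one sample; the result is a
# read-only view of shape (len(leave) - window_size + 1, window_size).
windows = sliding_window_view(leave, window_size)

# Label each window "leave" if any sample in it is flagged, else "stay".
labels = np.where(windows.any(axis=1), "leave", "stay")
```

The rows of `windows` can then serve as model inputs with `labels` as targets. A different labeling rule, such as using only the last sample of each window, would need only the final line changed.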
