Using a sliding window to generate a single dataframe using pandas
I want to apply a sliding window of size 3 to the dataframe below and return a new dataframe containing the windowed data.
datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
My desired output is as follows:
datetime val
2008-11-01 00:00:00 14
2008-11-01 00:00:01 11
2008-11-01 00:00:02 22
2008-11-01 00:00:01 11
2008-11-01 00:00:02 22
2008-11-01 00:00:09 56
2008-11-01 00:00:02 22
2008-11-01 00:00:09 56
2008-11-01 00:00:10 32
2008-11-01 00:00:09 56
2008-11-01 00:00:10 32
2008-11-01 00:00:12 11
2008-11-01 00:00:10 32
2008-11-01 00:00:12 11
2008-11-01 00:00:13 95
2008-11-01 00:00:12 11
2008-11-01 00:00:13 95
2008-11-01 00:00:15 77
2008-11-01 00:00:13 95
2008-11-01 00:00:15 77
2008-11-01 00:00:16 49
2008-11-01 00:00:15 77
2008-11-01 00:00:16 49
2008-11-01 00:00:17 66
I have tried the code below, which generates the desired windows (shown after the code), but the result is not in the desired format because each window is printed with its own column header. How can I convert the current output into the desired single dataframe (i.e., one with only a single header row at the top)?
import pandas as pd
import numpy as np

def df_sliding_windows(data, window=0):
    # Yield each consecutive block of `window` rows from `data`
    for i in range(0, len(data) - window + 1):
        yield data.iloc[i : i + window]

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])
    df_slide_windows = df_sliding_windows(df, 3)
    for j in df_slide_windows:
        print(j)
datetime val
0 2008-11-01 00:00:00 14
1 2008-11-01 00:00:01 11
2 2008-11-01 00:00:02 22
datetime val
1 2008-11-01 00:00:01 11
2 2008-11-01 00:00:02 22
3 2008-11-01 00:00:09 56
datetime val
2 2008-11-01 00:00:02 22
3 2008-11-01 00:00:09 56
4 2008-11-01 00:00:10 32
datetime val
3 2008-11-01 00:00:09 56
4 2008-11-01 00:00:10 32
5 2008-11-01 00:00:12 11
datetime val
4 2008-11-01 00:00:10 32
5 2008-11-01 00:00:12 11
6 2008-11-01 00:00:13 95
datetime val
5 2008-11-01 00:00:12 11
6 2008-11-01 00:00:13 95
7 2008-11-01 00:00:15 77
datetime val
6 2008-11-01 00:00:13 95
7 2008-11-01 00:00:15 77
8 2008-11-01 00:00:16 49
datetime val
7 2008-11-01 00:00:15 77
8 2008-11-01 00:00:16 49
9 2008-11-01 00:00:17 66
# Collect the row positions of every window in order, then select them in one call
w = 3
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]
df.iloc[inds]
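For anyone who wants to try the index-list approach without a `sample.csv` on disk, here is a minimal self-contained sketch. It assumes the sample data from the question is pasted inline (the `CSV_TEXT` name is only for illustration). It also adds `reset_index(drop=True)`, which the snippet above does not do, so that the repeated row labels are replaced by a fresh 0..n-1 index.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example runs on its own.
CSV_TEXT = """datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["datetime"])

w = 3  # window size
# Row positions for window 0, then window 1, and so on: 0,1,2, 1,2,3, 2,3,4, ...
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]

# iloc accepts repeated positions, so overlapping windows are simply duplicated rows.
result = df.iloc[inds].reset_index(drop=True)
print(result)
```

With 10 input rows and a window of 3 there are 8 windows, so `result` should hold 24 rows under a single header.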
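Change your code to concatenate the windows with pd.concat:
if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])
    # Stack all windows into one dataframe with a single header row
    df = pd.concat(df_sliding_windows(df, 3))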
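The `pd.concat` approach can be sketched end to end as below. This is an illustrative variant rather than the answerer's exact code: the generator is renamed `sliding_windows`, the data is inlined instead of read from `sample.csv`, and two optional extras are shown. `ignore_index=True` gives a flat 0..n-1 index, and `keys=` tags each row with its window number in case you need to tell the windows apart later.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example runs on its own.
CSV_TEXT = """datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
"""


def sliding_windows(data, window):
    """Yield each consecutive block of `window` rows from `data`."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    for start in range(len(data) - window + 1):
        yield data.iloc[start : start + window]


df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["datetime"])
windows = list(sliding_windows(df, 3))

# Variant 1: one flat dataframe with a fresh 0..n-1 index.
flat = pd.concat(windows, ignore_index=True)

# Variant 2: keep track of which window each row came from via a MultiIndex.
labelled = pd.concat(
    windows,
    keys=list(range(len(windows))),
    names=["window", "row"],
)

print(flat)
print(labelled.loc[1])  # rows of the second window only
```

The flat variant matches the desired output in the question. The labelled variant holds the same rows but lets you select a single window with `labelled.loc[k]`.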