
Using a sliding window to generate a single dataframe using pandas

I want to apply a sliding window of size 3 to the dataframe below and return a new dataframe containing the windowed data.

datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66

My desired output is as follows:

           datetime  val
2008-11-01 00:00:00   14
2008-11-01 00:00:01   11
2008-11-01 00:00:02   22
2008-11-01 00:00:01   11
2008-11-01 00:00:02   22
2008-11-01 00:00:09   56
2008-11-01 00:00:02   22
2008-11-01 00:00:09   56
2008-11-01 00:00:10   32
2008-11-01 00:00:09   56
2008-11-01 00:00:10   32
2008-11-01 00:00:12   11
2008-11-01 00:00:10   32
2008-11-01 00:00:12   11
2008-11-01 00:00:13   95
2008-11-01 00:00:12   11
2008-11-01 00:00:13   95
2008-11-01 00:00:15   77
2008-11-01 00:00:13   95
2008-11-01 00:00:15   77
2008-11-01 00:00:16   49
2008-11-01 00:00:15   77
2008-11-01 00:00:16   49
2008-11-01 00:00:17   66

I have tried the code below, which generates the desired windows (shown after the code), but the result is not in the desired format because each window is printed with its own column header. How can I convert the current output into the desired single dataframe (i.e., one with a single header row at the top)?

import pandas as pd 
import numpy as np 

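def df_sliding_windows(data, window=0):
    # Yield every run of `window` consecutive rows of the frame passed in
    # (the parameter `data`, not the global `df`).
    for i in range(0, len(data) - window + 1):
        yield data.iloc[i : i + window]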

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])

    df_slide_windows = df_sliding_windows(df, 3)
    for j in df_slide_windows:
        print(j)

             datetime  val
0 2008-11-01 00:00:00   14
1 2008-11-01 00:00:01   11
2 2008-11-01 00:00:02   22
             datetime  val
1 2008-11-01 00:00:01   11
2 2008-11-01 00:00:02   22
3 2008-11-01 00:00:09   56
             datetime  val
2 2008-11-01 00:00:02   22
3 2008-11-01 00:00:09   56
4 2008-11-01 00:00:10   32
             datetime  val
3 2008-11-01 00:00:09   56
4 2008-11-01 00:00:10   32
5 2008-11-01 00:00:12   11
             datetime  val
4 2008-11-01 00:00:10   32
5 2008-11-01 00:00:12   11
6 2008-11-01 00:00:13   95
             datetime  val
5 2008-11-01 00:00:12   11
6 2008-11-01 00:00:13   95
7 2008-11-01 00:00:15   77
             datetime  val
6 2008-11-01 00:00:13   95
7 2008-11-01 00:00:15   77
8 2008-11-01 00:00:16   49
             datetime  val
7 2008-11-01 00:00:15   77
8 2008-11-01 00:00:16   49
9 2008-11-01 00:00:17   66

# Build the row positions of every window (0,1,2, 1,2,3, ...) and select them all at once.
w = 3
inds = [r + i for r in range(len(df) - w + 1) for i in range(w)]
df.iloc[inds]
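
For reference, the index-list approach can be run end to end as in the sketch below. The inline CSV text stands in for sample.csv, the helper name `flatten_windows` is hypothetical (it is not part of pandas), and the `reset_index(drop=True)` call is an optional addition that replaces the repeated original row labels with 0..n-1.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (stands in for sample.csv).
CSV_TEXT = """datetime,val
2008-11-01 00:00:00,14
2008-11-01 00:00:01,11
2008-11-01 00:00:02,22
2008-11-01 00:00:09,56
2008-11-01 00:00:10,32
2008-11-01 00:00:12,11
2008-11-01 00:00:13,95
2008-11-01 00:00:15,77
2008-11-01 00:00:16,49
2008-11-01 00:00:17,66
"""


def flatten_windows(data: pd.DataFrame, w: int) -> pd.DataFrame:
    """Hypothetical helper: stack all length-w sliding windows into one frame."""
    # Row positions 0,1,2, 1,2,3, 2,3,4, ... when w = 3.
    inds = [r + i for r in range(len(data) - w + 1) for i in range(w)]
    # Positional selection repeats rows; drop the duplicated original labels.
    return data.iloc[inds].reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["datetime"])
result = flatten_windows(df, 3)
print(result.to_string(index=False))
```

With 10 input rows and a window of 3 there are 8 windows, so the result has 24 rows under a single header.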

Change your code so that the windows yielded by the generator are concatenated into a single dataframe:

if __name__ == '__main__':
    df = pd.read_csv('sample.csv')
    df['datetime'] = pd.to_datetime(df['datetime'])

    df = pd.concat(df_sliding_windows(df, 3))
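
A minimal, self-contained sketch of the `pd.concat` approach follows. It uses only the first five rows of the sample data as a stand-in for sample.csv. The `ignore_index` and `keys` variants are optional extras that the answer does not use; they are shown only as assumptions about what may be convenient afterwards.

```python
import pandas as pd


def df_sliding_windows(data, window):
    """Yield each run of `window` consecutive rows of `data`."""
    for i in range(len(data) - window + 1):
        yield data.iloc[i : i + window]


# Small stand-in for the question's data (first five rows only).
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2008-11-01 00:00:00",
                "2008-11-01 00:00:01",
                "2008-11-01 00:00:02",
                "2008-11-01 00:00:09",
                "2008-11-01 00:00:10",
            ]
        ),
        "val": [14, 11, 22, 56, 32],
    }
)

# Materialise the generator once so it can be reused below.
windows = list(df_sliding_windows(df, 3))

# ignore_index=True renumbers the rows 0..n-1 instead of repeating labels.
flat = pd.concat(windows, ignore_index=True)

# keys=... adds an outer index level recording which window each row came from.
labelled = pd.concat(windows, keys=list(range(len(windows))))

print(flat)
print(labelled.loc[1])  # rows belonging to the second window
```

Keeping the window number in the index, as in `labelled`, makes it possible to recover an individual window later with `.loc`.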
