
How to produce chunks of sliding windows while iterating over a time series?

EDITED:

I have a time series, let's say ts = [[0 0][1 1][2 2][3 3][4 4][5 5][6 6][7 7][8 8]], and I want to split it into the following two sequences:

 X = [[[[0][1]][[1][2]][[2][3]]] [[[1][2]][[2][3]][[3][4]]] [[[2][3]][[3][4]][[4][5]]] [[[3][4]][[4][5]][[5][6]]] [[[4][5]][[5][6]][[6][7]]] [[[5][6]][[6][7]][[7][8]]]] 
y = [[3][4][5][6][7][8]]

X is the sequence of chunks of three two-step sliding windows, while y holds its labels (the target value at the end of each chunk). My strategy was to employ the following methods. First:

import numpy as np

def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        prev_end_ix = end_ix - 1
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # inputs: all columns but the last; output: last column of the window's final row
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[prev_end_ix:end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

Which returns:

X =[[[0][1]] [[1][2]] [[2][3]] [[3][4]] [[4][5]] [[5][6]] [[6][7]] [[7][8]]] 
y = [[1][2][3][4][5][6][7][8]]

And then I apply the following two methods to obtain the desired output:

def separar_uni_X(sequencia, n_passos):
    X = list()
    for i in range(len(sequencia)):
        # find the end of this pattern
        end_ix = i + n_passos
        # check if we are beyond the sequence
        if end_ix > len(sequencia):
            break
        # gather input and output parts of the pattern
        seq_x = sequencia[i:end_ix, :]
        X.append(seq_x)
    return np.array(X)

def separar_uni_y(sequencia, n_passos):
    y = list()
    for i in range(len(sequencia)):
        # find the end of this pattern
        end_ix = i + n_passos
        # check if we are beyond the sequence
        if end_ix > len(sequencia):
            break
        # gather input and output parts of the pattern
        seq_y = sequencia[i:end_ix, :]
        y.append(seq_y[-1])
    return np.array(y)
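
The two stages can be checked end to end on the example series. The sketch below is only an illustration: `windows` is a hypothetical helper standing in for the three functions above, and it assumes the last column of ts is the target. It also makes the memory cost visible, since the intermediate arrays X1 and y1 must exist in full before the second stage runs.

```python
import numpy as np

def windows(a, n):
    """Hypothetical helper: stack every run of n consecutive rows of a."""
    return np.array([a[i:i + n] for i in range(len(a) - n + 1)])

ts = np.array([[i, i] for i in range(9)])   # the series from the question

# Stage 1: two-step windows over the feature column,
# label = target value on the last row of each window
X1 = windows(ts[:, :-1], 2)                 # shape (8, 2, 1)
y1 = ts[1:, -1:]                            # shape (8, 1)

# Stage 2: chunks of three consecutive windows,
# label = last label inside each chunk
X = windows(X1, 3)                          # shape (6, 3, 2, 1)
y = windows(y1, 3)[:, -1]                   # shape (6, 1)

print(X.shape, y.ravel())                   # (6, 3, 2, 1) [3 4 5 6 7 8]
```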

PROBLEM: To obtain the desired output, the result of the first method has to be stored and then passed to the other two, and when the sequence is too long this exceeds the memory capacity. To work around this by breaking the process down into sub-steps, I tried this method:

def split_sequence_3D(sequences, n_steps, batch_size):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        prev_end_ix = end_ix - 1
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[prev_end_ix:end_ix, -1]
        sub_X, sub_y = [], []
        for j in range(batch_size):
            sub_X.append(seq_x)
            sub_y.append(seq_y)
        X.append(sub_X)
        y.append(sub_y[-1])
    return np.array(X), np.array(y)

Which gives me the wrong output, for obvious reasons (the inner loop appends the same window batch_size times, so every chunk just repeats one window):

X = [[[[0][1]][[0][1]][[0][1]]] [[[1][2]][[1][2]][[1][2]]] [[[2][3]][[2][3]][[2][3]]] [[[3][4]][[3][4]][[3][4]]] [[[4][5]][[4][5]][[4][5]]] [[[5][6]][[5][6]][[5][6]]] [[[6][7]][[6][7]][[6][7]]] [[[7][8]][[7][8]][[7][8]]]]
y = [[1][2][3][4][5][6][7][8]]

I have looked extensively for an alternative and haven't found one.
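
One memory-friendly alternative, not taken from the original post, is NumPy's `sliding_window_view` (available since NumPy 1.20). It returns a read-only view on the original buffer, so the chunks are not copied until something downstream forces a copy. The sketch below assumes the last column of ts is the target; `chunked_windows` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20

def chunked_windows(ts, n_steps, batch_size):
    """Hypothetical helper: chunks of sliding windows as a read-only view."""
    feats = ts[:, :-1]
    # (N - n_steps + 1, n_features, n_steps); the window axis is appended last
    w = sliding_window_view(feats, n_steps, axis=0)
    # slide again over the windows: (n_chunks, n_features, n_steps, batch_size)
    w = sliding_window_view(w, batch_size, axis=0)
    # reorder to (n_chunks, batch_size, n_steps, n_features)
    X = w.transpose(0, 3, 2, 1)
    # target = last column of the last row that each chunk covers
    y = ts[n_steps + batch_size - 2:, -1:]
    return X, y

ts = np.array([[i, i] for i in range(9)])
X, y = chunked_windows(ts, n_steps=2, batch_size=3)
print(X.shape, y.ravel())   # (6, 3, 2, 1) [3 4 5 6 7 8]
```

Because X is a view, it cannot be written to, and any operation that needs a contiguous array (for example `np.ascontiguousarray(X)`) will materialise the full copy again, so whether this helps depends on how the chunks are consumed.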

Well, I went to real pains to solve your problem, which was mine as well. But the solution, at last, turned out to be rather simple: make the sliding-window iterator slide as well.

def input_3D(sequencia, lote, janela):
    # lote = number of windows per chunk, janela = window length
    if lote > len(sequencia):
        raise ValueError('Tamanho do lote maior que o conjunto dos dados')
    if janela > len(sequencia):
        raise ValueError('Tamanho da janela maior que o conjunto dos dados')
    X_, y_ = [], []
    for j in range(len(sequencia)):
        if j + lote + janela > len(sequencia):
            break
        X, y = [], []
        for i in range(j, j + lote):
            end_ix = i + janela
            prev_end_ix = end_ix - 1
            seq_x, seq_y = sequencia[i:end_ix, :-1], sequencia[prev_end_ix:end_ix, -1]
            X.append(np.array(seq_x))
            y.append(np.array(seq_y[-1]))
        X_.append(np.array(X))
        y_.append(np.array(y[-1]))
    return np.array(X_), np.array(y_)

Suppose your input is:

arr_x = list(range(0,100))
arr_y = list(range(0,100))
arr = np.stack([arr_x,arr_y])
arr = arr.T

Then your output is going to be:

[[[[ 0]
   [ 1]
   [ 2]
   ...
   [ 7]
   [ 8]
   [ 9]]

  [[ 1]
   [ 2]
   [ 3]
   ...
   [ 8]
   [ 9]
   [10]]

  [[ 2]
   [ 3]
   [ 4]
   ...
   [ 9]
   [10]
   [11]]

  [[ 3]
   [ 4]
   [ 5]
   ...
   [10]
   [11]
   [12]]]

...

 [[[86]
   [87]
   [88]
   ...
   [93]
   [94]
   [95]]

  [[87]
   [88]
   [89]
   ...
   [94]
   [95]
   [96]]

  [[88]
   [89]
   [90]
   ...
   [95]
   [96]
   [97]]

  [[89]
   [90]
   [91]
   ...
   [96]
   [97]
   [98]]]] [12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 
 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98]
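
Two things are worth noting about input_3D. First, it still builds the full X_ array in memory. Second, its stop condition `j+lote+janela > len(sequencia)` ends one chunk early: on the nine-row series from the question with lote=3 and janela=2 it produces five chunks rather than the six asked for, which is also why the y values above stop at 98 instead of 99. A possible variation, sketched here as an assumption rather than as the answerer's method, is a generator that yields one chunk at a time and includes the final chunk. `iter_chunks` is a hypothetical name.

```python
import numpy as np

def iter_chunks(sequencia, lote, janela):
    """Yield one (X_chunk, y) pair at a time instead of building full arrays."""
    # number of chunks whose last window still fits inside the data
    n_chunks = len(sequencia) - lote - janela + 2
    for j in range(max(n_chunks, 0)):
        # lote consecutive windows of length janela, starting at row j
        X = np.stack([sequencia[i:i + janela, :-1] for i in range(j, j + lote)])
        # target = last column of the last row covered by the chunk
        y = sequencia[j + lote + janela - 2, -1]
        yield X, y

ts = np.array([[i, i] for i in range(9)])
pairs = list(iter_chunks(ts, lote=3, janela=2))
print(len(pairs), [int(y) for _, y in pairs])   # 6 [3, 4, 5, 6, 7, 8]
```

In real use you would consume the generator lazily (for example, feed it to the training loop batch by batch) rather than wrapping it in `list`, which is done here only to show the result.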
