
Multivariate input LSTM in PyTorch

I would like to implement an LSTM for multivariate input in PyTorch.

Following this article, https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/, which uses Keras, the input data have the shape (number of samples, number of timesteps, number of parallel features):

in_seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
. . . 
Input     Output
[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185

n_timesteps = 3
n_features = 2

In Keras it seems to be easy:

model.add(LSTM(50, activation='relu', input_shape=(n_timesteps, n_features)))

Can it be done some other way than creating n_features LSTMs as the first layer, feeding each one separately (imagine multiple streams of sequences), and then flattening their outputs into a linear layer?

I'm not 100% sure, but by the nature of LSTM the input cannot be flattened and passed as a 1D array, because each sequence "plays by different rules" which the LSTM is supposed to learn.

So how does such an implementation in Keras map to the PyTorch input of shape (seq_len, batch, input_size)? (Source: https://pytorch.org/docs/stable/nn.html#lstm)


Edit:

Can it be done some other way than creating n_features LSTMs as the first layer, feeding each one separately (imagine multiple streams of sequences), and then flattening their outputs into a linear layer?

According to the PyTorch docs, the input_size parameter actually means the number of features (if that means the number of parallel sequences).
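To make the shape correspondence concrete, here is a minimal sketch. The tensor names and the hidden size of 50 (borrowed from the Keras snippet) are only illustrative assumptions. It suggests that a single `torch.nn.LSTM` with `input_size = n_features` consumes all parallel series at each time step, so one LSTM per feature should not be needed.

```python
import torch

n_samples, n_timesteps, n_features = 7, 3, 2
hidden_size = 50  # illustrative, mirrors LSTM(50, ...) in the Keras snippet

# Keras-style layout: (samples, timesteps, features)
x = torch.randn(n_samples, n_timesteps, n_features)

# batch_first=True lets the LSTM accept that layout directly
lstm = torch.nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
out, (h_n, c_n) = lstm(x)

# The input-to-hidden weight matrix has one column per feature,
# so every gate sees all features of a time step at once
w_shape = tuple(lstm.weight_ih_l0.shape)

print(out.shape)  # torch.Size([7, 3, 50])
print(h_n.shape)  # torch.Size([1, 7, 50])
print(w_shape)    # (200, 2), i.e. (4 * hidden_size, input_size)
```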

I hope the problematic parts are commented well enough to make sense:

Data preparation

import random
import numpy as np
import torch

# multivariate data preparation
from numpy import array
from numpy import hstack

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
in_seq1 = array([x for x in range(0,100,10)])
in_seq2 = array([x for x in range(5,105,10)])
out_seq = array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])
# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))
# horizontally stack columns
dataset = hstack((in_seq1, in_seq2, out_seq))
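As a cross-check on the windowing above, the same samples can probably be built without an explicit loop. This is only a sketch and not the method used in this answer: it assumes NumPy 1.20 or newer for `sliding_window_view`, and the names `features` and `targets` are introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_steps = 3
in_seq1 = np.arange(0, 100, 10)
in_seq2 = np.arange(5, 105, 10)
features = np.stack([in_seq1, in_seq2], axis=1)  # (10, 2)
targets = in_seq1 + in_seq2                      # (10,)

# Windows over the time axis come out as (n_windows, n_features, n_steps)
windows = sliding_window_view(features, n_steps, axis=0)
X = windows.transpose(0, 2, 1)  # (n_windows, n_steps, n_features)
y = targets[n_steps - 1:]       # target aligned with the last step of each window

print(X.shape, y.shape)  # (8, 3, 2) (8,)
print(X[0].tolist(), y[0])
```

The first window holds the rows [0, 5], [10, 15], [20, 25] and its target is 45, which matches what `split_sequences` returns for the same data.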

Multivariate LSTM Network

class MV_LSTM(torch.nn.Module):
    def __init__(self,n_features,seq_length):
        super(MV_LSTM, self).__init__()
        self.n_features = n_features
        self.seq_len = seq_length
        self.n_hidden = 20 # number of hidden states
        self.n_layers = 1 # number of LSTM layers (stacked)

        self.l_lstm = torch.nn.LSTM(input_size = n_features, 
                                 hidden_size = self.n_hidden,
                                 num_layers = self.n_layers, 
                                 batch_first = True)
        # according to pytorch docs LSTM output is 
        # (batch_size,seq_len, num_directions * hidden_size)
        # when considering batch_first = True
        self.l_linear = torch.nn.Linear(self.n_hidden*self.seq_len, 1)


    def init_hidden(self, batch_size):
        # even with batch_first = True this remains same as docs
        hidden_state = torch.zeros(self.n_layers,batch_size,self.n_hidden)
        cell_state = torch.zeros(self.n_layers,batch_size,self.n_hidden)
        self.hidden = (hidden_state, cell_state)


    def forward(self, x):        
        batch_size, seq_len, _ = x.size()

        lstm_out, self.hidden = self.l_lstm(x,self.hidden)
        # lstm_out(with batch_first = True) is 
        # (batch_size,seq_len,num_directions * hidden_size)
        # for following linear layer we want to keep batch_size dimension and merge rest       
        # .contiguous() -> solves tensor compatibility error
        x = lstm_out.contiguous().view(batch_size,-1)
        return self.l_linear(x)
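The network above flattens every time step into the linear layer, which ties it to a fixed sequence length. A common alternative, sketched here under the assumption that only the last time step matters for the prediction, feeds just the final output into the linear layer. The class name `LastStepLSTM` is hypothetical and this is not the model that produced the results in this answer.

```python
import torch


class LastStepLSTM(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_features, n_hidden=20, n_layers=1):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=n_features, hidden_size=n_hidden,
                                  num_layers=n_layers, batch_first=True)
        self.linear = torch.nn.Linear(n_hidden, 1)

    def forward(self, x):
        # Omitting (h_0, c_0) makes PyTorch start from zero states
        lstm_out, _ = self.lstm(x)
        last_step = lstm_out[:, -1, :]  # (batch, n_hidden)
        return self.linear(last_step)   # (batch, 1)


model = LastStepLSTM(n_features=2)
out_short = model(torch.randn(4, 3, 2))  # 3 time steps
out_long = model(torch.randn(4, 7, 2))   # 7 time steps, same model
print(out_short.shape, out_long.shape)
```

Because the linear layer no longer depends on `seq_len`, the same instance accepts sequences of different lengths.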

Initialization

n_features = 2 # this is number of parallel inputs
n_timesteps = 3 # this is number of timesteps

# convert dataset into input/output
X, y = split_sequences(dataset, n_timesteps)
print(X.shape, y.shape)

# create NN
mv_net = MV_LSTM(n_features,n_timesteps)
criterion = torch.nn.MSELoss() # reduction='sum' created huge loss value
optimizer = torch.optim.Adam(mv_net.parameters(), lr=1e-1)

train_episodes = 500
batch_size = 16

Training

mv_net.train()
for t in range(train_episodes):
    for b in range(0,len(X),batch_size):
        inpt = X[b:b+batch_size,:,:]
        target = y[b:b+batch_size]    

        x_batch = torch.tensor(inpt,dtype=torch.float32)    
        y_batch = torch.tensor(target,dtype=torch.float32)

        mv_net.init_hidden(x_batch.size(0))
    #    lstm_out, _ = mv_net.l_lstm(x_batch,nnet.hidden)    
    #    lstm_out.contiguous().view(x_batch.size(0),-1)
        output = mv_net(x_batch) 
        loss = criterion(output.view(-1), y_batch)  

        loss.backward()
        optimizer.step()        
        optimizer.zero_grad() 
    print('step : ' , t , 'loss : ' , loss.item())
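The manual slicing in the loop above could also be handed to `torch.utils.data`. The sketch below uses stand-in tensors with the same shapes as `X` and `y`, and the name `train_set` is an assumption for illustration. It only shows how the batches come out, not a full training run.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same shapes as X and y above: (8, 3, 2) and (8,)
X = torch.arange(48, dtype=torch.float32).reshape(8, 3, 2)
y = X[:, -1, :].sum(dim=1)

train_set = TensorDataset(X, y)
loader = DataLoader(train_set, batch_size=3, shuffle=True)

batch_shapes = []
for x_batch, y_batch in loader:
    # Each x_batch is (batch, seq_len, n_features); the last batch may be smaller
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))
print(batch_shapes)
```

Shuffling changes which samples land in each batch on every epoch, while the shapes stay the same.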

Results

step :  499 loss :  0.0010267728939652443 # probably overfitted due to 500 training episodes

The input to any RNN cell in PyTorch is 3D, formatted as (seq_len, batch, input_size) or (batch, seq_len, input_size). If you prefer the second (as I do, lol), initialize the LSTM layer (or other RNN layer) with the argument

batch_first = True

https://discuss.pytorch.org/t/could-someone-explain-batch-first-true-in-lstm/15402
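A small sketch of the two layouts, with sizes chosen only for illustration. It shows that `batch_first` changes the layout of the input and the output, while the returned hidden state keeps the (num_layers, batch, hidden_size) shape either way.

```python
import torch

seq_len, batch, input_size, hidden_size = 3, 4, 2, 20

# Default layout expects (seq_len, batch, input_size)
rnn_default = torch.nn.LSTM(input_size, hidden_size)
# batch_first=True expects (batch, seq_len, input_size)
rnn_batch_first = torch.nn.LSTM(input_size, hidden_size, batch_first=True)

x_bf = torch.randn(batch, seq_len, input_size)
x_sf = x_bf.permute(1, 0, 2).contiguous()  # same data in sequence-first layout

out_sf, (h_sf, _) = rnn_default(x_sf)
out_bf, (h_bf, _) = rnn_batch_first(x_bf)

print(out_sf.shape)            # torch.Size([3, 4, 20])
print(out_bf.shape)            # torch.Size([4, 3, 20])
print(h_sf.shape, h_bf.shape)  # both torch.Size([1, 4, 20])
```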

Also, you don't have any recurrent relation in the setup. If you want to create a many-to-one counter, create input of size (-1, n, 1), where -1 is whatever batch size you want and n is the number of values, one value per tick. For example, input [[10][20][30]] gives output 60, and input [[30][70]] gives output 100. The inputs must have different lengths, from 1 to some maximum, so that the network learns the RNN relation.

import random

import numpy as np
import torch


def rnd_io():
    # random sequence of 1 to 10 integers in [0, 100), shape (seq_len, 1)
    return np.random.randint(100, size=(random.randint(1, 10), 1))


class CountRNN(torch.nn.Module):

    def __init__(self):
        super(CountRNN, self).__init__()
        self.rnn = torch.nn.RNN(1, 20, num_layers=1, batch_first=True)
        self.fc = torch.nn.Linear(20, 1)

    def forward(self, x):
        # last_out is the final hidden state: (num_layers, batch, hidden_size)
        full_out, last_out = self.rnn(x)
        return self.fc(last_out)


nnet = CountRNN()
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(nnet.parameters(), lr=0.0005)

batch_size = 100
batches = 10000 * 1000
printout = max(batches // (20 * 1000), 1)

for t in range(batches):
    optimizer.zero_grad()

    # one sequence per step: x_batch is (1, seq_len, 1), y_batch is (1,)
    x_batch = torch.unsqueeze(torch.from_numpy(rnd_io()).float(), 0)
    y_batch = torch.unsqueeze(torch.sum(x_batch), 0)

    output = nnet(x_batch)
    # flatten the (1, 1, 1) output so it matches the target shape
    loss = criterion(output.view(-1), y_batch)

    if t % printout == 0:
        print('step : ', t, 'loss : ', loss.item())
        torch.save(nnet.state_dict(), './rnn_summ.pth')

    loss.backward()
    optimizer.step()
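The loop above feeds one sequence per step, so `batch_size` is never used. If several sequences of different lengths should go through the RNN together, one option is padding plus packing. This is a sketch under that assumption and not part of the answer's code; the three example sequences are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)

# Three sequences of different lengths, one value per time step
seqs = [torch.tensor([[10.], [20.], [30.]]),
        torch.tensor([[30.], [70.]]),
        torch.tensor([[5.]])]
lengths = torch.tensor([len(s) for s in seqs])
targets = torch.stack([s.sum() for s in seqs])  # sums 60, 100 and 5

padded = pad_sequence(seqs, batch_first=True)   # (3, 3, 1), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

rnn = torch.nn.RNN(1, 20, num_layers=1, batch_first=True)
fc = torch.nn.Linear(20, 1)

# h_n holds the final state of each sequence at its own true length
_, h_n = rnn(packed)         # (num_layers, batch, hidden_size)
pred = fc(h_n[-1]).view(-1)  # (batch,)
loss = torch.nn.functional.mse_loss(pred, targets)
print(padded.shape, pred.shape, loss.item())
```

Packing keeps the padded zeros from being treated as extra time steps, so each final state corresponds to the last real value of its sequence.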

I would just like to slightly update the Training section with a basic early-stopping mechanism and the code to save the model.

#Early stopping
the_last_loss = float('inf')  # so the first epoch never counts as a worsening
patience = 4
trigger_times = 0

mv_net.train()
for t in range(train_episodes):
    for b in range(0,len(X),batch_size):
        inpt = X[b:b+batch_size,:,:]
        target = y[b:b+batch_size]    
        
        x_batch = torch.tensor(inpt,dtype=torch.float32)    
        y_batch = torch.tensor(target,dtype=torch.float32)
    
        mv_net.init_hidden(x_batch.size(0))
    #    lstm_out, _ = mv_net.l_lstm(x_batch,nnet.hidden)    
    #    lstm_out.contiguous().view(x_batch.size(0),-1)
        output = mv_net(x_batch) 
        loss = criterion(output.view(-1), y_batch)  
        
        loss.backward()
        optimizer.step()        
        optimizer.zero_grad() 
    the_current_loss= loss.item()
    print('step : ' , t , 'loss : ' , the_current_loss)    
    if the_current_loss > the_last_loss:
        trigger_times += 1        
        if trigger_times >= patience:
            print('Early stopping!\nStart to test process.')
            break
    else:
        #print('trigger times: 0')
        trigger_times = 0
    
    the_last_loss = the_current_loss
   
# Lets assume we are happy with this 
#Save the model
torch.save(mv_net.state_dict(),'pytorch_dev')
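Since only the `state_dict` is saved, loading it back requires building the model first. The sketch below uses a hypothetical stand-in class (`TinyLSTM`) and a temporary file path so that it runs on its own; with the code above, the class would be `MV_LSTM` and the path `'pytorch_dev'`.

```python
import os
import tempfile

import torch


class TinyLSTM(torch.nn.Module):  # hypothetical stand-in for MV_LSTM
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=2, hidden_size=20, batch_first=True)
        self.linear = torch.nn.Linear(20, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.linear(out[:, -1, :])


trained = TinyLSTM()
path = os.path.join(tempfile.mkdtemp(), 'model_state.pt')  # illustrative path
torch.save(trained.state_dict(), path)

# Rebuild the architecture first, then load the saved weights into it
restored = TinyLSTM()
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to inference mode

x = torch.randn(1, 3, 2)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)  # True
```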


 