LSTM for time series prediction fails to learn (PyTorch)

Question

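I am building an LSTM network to predict time series data with PyTorch. I have tried to share every code snippet I think could help, but please let me know if I can provide any further information. I have added some notes on potential problems at the end of the post.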

From univariate time series data indexed by date, I created 3 date features and split the data into training and validation sets, as shown below.

# X_train
                     weekday  monthday  hour
timestamp
2015-01-08 17:00:00        3         8    17
2015-01-12 19:30:00        0        12    19
2014-12-01 15:30:00        0         1    15
2014-07-26 09:00:00        5        26     9
2014-10-17 20:30:00        4        17    20
...                      ...       ...   ...
2014-08-29 06:30:00        4        29     6
2014-10-13 14:30:00        0        13    14
2015-01-03 02:00:00        5         3     2
2014-12-06 16:00:00        5         6    16
2015-01-06 20:30:00        1         6    20
8256 rows × 3 columns

# y_train
                     value
timestamp
2015-01-08 17:00:00  17871
2015-01-12 19:30:00  20321
2014-12-01 15:30:00  16870
2014-07-26 09:00:00  11209
2014-10-17 20:30:00  26144
...                    ...
2014-08-29 06:30:00   9008
2014-10-13 14:30:00  17698
2015-01-03 02:00:00  12850
2014-12-06 16:00:00  18277
2015-01-06 20:30:00  19640
8256 rows × 1 columns

# X_val
                     weekday  monthday  hour
timestamp
2015-01-08 07:00:00        3         8     7
2014-10-13 22:00:00        0        13    22
2014-12-07 01:30:00        6         7     1
2014-10-14 17:30:00        1        14    17
2014-10-25 09:30:00        5        25     9
...                      ...       ...   ...
2014-09-26 12:30:00        4        26    12
2014-10-08 16:00:00        2         8    16
2014-12-03 01:30:00        2         3     1
2014-09-11 08:00:00        3        11     8
2015-01-15 10:00:00        3        15    10
2064 rows × 3 columns

# y_val
                     value
timestamp
2014-09-13 13:00:00  21345
2014-10-28 20:30:00  23210
2015-01-21 17:00:00  17001
2014-07-20 10:30:00  13936
2015-01-29 02:00:00   3604
...                    ...
2014-11-17 11:00:00  15247
2015-01-14 00:00:00  10584
2014-09-02 13:00:00  17698
2014-08-31 13:00:00  16652
2014-08-30 12:30:00  15775
2064 rows × 1 columns
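
For reference, the feature construction and split described above could look roughly like the sketch below. The post does not show this step, so everything here is an assumption: the synthetic half-hourly index, the random values, and the use of sklearn's train_test_split with a 20% validation share (chosen only because 2064 is 20% of 8256 + 2064).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic half-hourly series standing in for the real data (values are random).
index = pd.date_range("2014-07-01", periods=1000, freq="30min", name="timestamp")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(5000, 30000, size=len(index))}, index=index)

# Three calendar features derived from the DatetimeIndex.
X = pd.DataFrame(index=df.index)
X["weekday"] = df.index.weekday    # Monday = 0 ... Sunday = 6
X["monthday"] = df.index.day
X["hour"] = df.index.hour
y = df[["value"]]

# Splitting X and y in a single call keeps their rows aligned.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.index.equals(y_train.index), X_val.index.equals(y_val.index))  # True True
```

The alignment check at the end may be worth running on the real data too: the rows displayed for X_val and y_val above carry different timestamps.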

I then transformed the values in the datasets using MinMaxScaler from the sklearn library.

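from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_arr = scaler.fit_transform(X_train)
X_val_arr = scaler.transform(X_val)
# The same scaler object is refit on the targets here
y_train_arr = scaler.fit_transform(y_train)
y_val_arr = scaler.transform(y_val)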

After converting these NumPy arrays into PyTorch Tensors, I created iterable datasets using TensorDataset and DataLoader classes provided by PyTorch.

import torch
from torch.utils.data import TensorDataset, DataLoader

train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)

val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)

train = TensorDataset(train_features, train_targets)
train_loader = DataLoader(train, batch_size=64, shuffle=False)

val = TensorDataset(val_features, val_targets)
val_loader = DataLoader(train, batch_size=64, shuffle=False)
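
To see what these loaders hand to the training loop, here is a small sketch with random tensors that only share the row counts of the real sets. Note that it builds val_loader from the validation dataset, whereas the snippet above passes train to both loaders, which may not be intended.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random tensors with the same row counts as the sets above (values are placeholders).
train = TensorDataset(torch.rand(8256, 3), torch.rand(8256, 1))
val = TensorDataset(torch.rand(2064, 3), torch.rand(2064, 1))

train_loader = DataLoader(train, batch_size=64, shuffle=False)
val_loader = DataLoader(val, batch_size=64, shuffle=False)

# Each feature batch is 2-D: (batch_size, n_features).
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape, y_batch.shape)   # torch.Size([64, 3]) torch.Size([64, 1])

# 8256 is a multiple of 64, 2064 is not, so the last validation batch is short.
train_sizes = [len(x) for x, _ in train_loader]
val_sizes = [len(x) for x, _ in val_loader]
print(len(train_sizes), train_sizes[-1])   # 129 64
print(len(val_sizes), val_sizes[-1])       # 33 16
```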

I then defined my LSTM model and the train_step function as follows:

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros, on the same device as the input
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        
        # Initialize cell state the same way
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        
        # h0 and c0 are fresh zero tensors on every call, so no state (and no
        # gradient) carries over from one batch to the next
        out, (hn, cn) = self.lstm(x, (h0, c0))
        
        # out has shape (batch_dim, seq_dim, hidden_dim); keep the last time step
        out = self.fc(out[:, -1, :]) 
        return out

def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(y, yhat)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()
    
    # Returns the function that will be called inside the train loop
    return train_step

Finally, I trained my LSTM model in mini-batches with the Adam optimizer for 20 epochs, which is already long enough to see that the model is not learning.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'

n_features = X_train_arr.shape[1]  # the 3 date features
input_dim = n_features
hidden_dim = 64
layer_dim = 3
output_dim = 1

# Keep the model on the same device as the batches
model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim).to(device)

criterion = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters(), lr=1e-2)

train_losses = []
val_losses = []
train_step = make_train_step(model, criterion, optimizer)
n_epochs = 20

for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.unsqueeze(dim=0).to(device)
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)    
    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.unsqueeze(dim=0).to(device)
            y_val = y_val.to(device)        
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)
    
    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")

This is the output:

C:\Users\VS32XI\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:446: UserWarning: Using a target size (torch.Size([1, 1])) that is different to the input size (torch.Size([64, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
[1] Training loss: 0.0505    Validation loss: 0.0315
[2] Training loss: 0.0317    Validation loss: 0.0315
[3] Training loss: 0.0317    Validation loss: 0.0315
[4] Training loss: 0.0317    Validation loss: 0.0315
[5] Training loss: 0.0317    Validation loss: 0.0315
[6] Training loss: 0.0317    Validation loss: 0.0315
[7] Training loss: 0.0317    Validation loss: 0.0315
[8] Training loss: 0.0317    Validation loss: 0.0315
[9] Training loss: 0.0317    Validation loss: 0.0315
[10] Training loss: 0.0317   Validation loss: 0.0315
[11] Training loss: 0.0317   Validation loss: 0.0315
[12] Training loss: 0.0317   Validation loss: 0.0315
[13] Training loss: 0.0317   Validation loss: 0.0315
[14] Training loss: 0.0317   Validation loss: 0.0315
[15] Training loss: 0.0317   Validation loss: 0.0315
[16] Training loss: 0.0317   Validation loss: 0.0315
[17] Training loss: 0.0317   Validation loss: 0.0315
[18] Training loss: 0.0317   Validation loss: 0.0315
[19] Training loss: 0.0317   Validation loss: 0.0315
[20] Training loss: 0.0317   Validation loss: 0.0315

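Note 1: Looking at the warning, I am not sure whether it is the real reason the model is not learning well. After all, I am trying to predict future values in the time series data, so 1 seems like a reasonable output dimension.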

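Note 2: To train the model in mini-batches, I use the DataLoader class. When iterating over the X and Y batches in the training and validation DataLoaders, each x_batch has 2 dimensions, while the model expects 3. I therefore used PyTorch's unsqueeze function to match the expected number of dimensions, as in x_batch.unsqueeze(dim=0). I am not sure whether I should have done this, and it may also be the problem.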
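
The warning from Note 1 can be reproduced in isolation. The sketch below uses made-up tensors with the two shapes named in the warning, (64, 1) and (1, 1), and shows what the broadcast does to the loss: a single prediction is compared against all 64 targets, so the best the model can do is output their mean. That would be consistent with a loss that flattens out at a constant value, although this is an inference and not something the post confirms.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.MSELoss(reduction="mean")

y = torch.rand(64, 1)        # 64 targets, as in one batch
yhat = torch.rand(1, 1)      # a single prediction, as the warning reports

# Broadcasting stretches the single prediction across all 64 targets
# (PyTorch emits the same UserWarning here).
loss = criterion(y, yhat)
manual = ((y - yhat.expand_as(y)) ** 2).mean()

# The constant that minimises this loss is the mean of the targets, and the
# loss it reaches is the (biased) variance of the targets.
best = criterion(y, y.mean().reshape(1, 1))
print(loss.item(), best.item(), y.var(unbiased=False).item())
```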

Answer 1

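The problem was solved once I used Tensor View to reshape the feature mini-batches in the training and validation sets. As a side note, view() enables fast and memory-efficient reshaping, slicing, and element-wise operations by avoiding explicit data copies.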

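It turns out that, in the earlier implementation, torch.unsqueeze() did not reshape the batches into tensors with dimensions (batch size, time steps, number of features). Instead, unsqueeze(dim=0) returns a new tensor with a singleton dimension inserted at index 0, so a (64, 3) batch became (1, 64, 3), which the model reads as a single sample with 64 time steps.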

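The feature mini-batches are therefore reshaped as follows, where batch_size is 64, the value passed to the DataLoaders: x_batch = x_batch.view([batch_size, -1, n_features]).to(device)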
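
The difference between the two reshapes can be checked on a dummy batch. This is only a sketch: it uses a bare nn.LSTM and nn.Linear with a small, arbitrary hidden_dim in place of the full LSTMModel, and predict is a hypothetical helper that mirrors the model's readout.

```python
import torch
import torch.nn as nn

batch_size, n_features, hidden_dim = 64, 3, 8   # hidden_dim kept small for the demo

lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, 1)

def predict(x):
    # Same readout as the model: last time step -> linear layer.
    out, _ = lstm(x)
    return fc(out[:, -1, :])

x_batch = torch.rand(batch_size, n_features)            # what the DataLoader yields

as_one_sequence = x_batch.unsqueeze(dim=0)              # 1 sample, 64 time steps
as_batch = x_batch.view([batch_size, -1, n_features])   # 64 samples, 1 time step

print(as_one_sequence.shape, predict(as_one_sequence).shape)
# torch.Size([1, 64, 3]) torch.Size([1, 1])
print(as_batch.shape, predict(as_batch).shape)
# torch.Size([64, 1, 3]) torch.Size([64, 1])
```

With unsqueeze(dim=0) the model returns one prediction per batch, which is where the (1, 1) versus (64, 1) size mismatch in the warning comes from. For a 2-D batch, x_batch.unsqueeze(1) gives the same result as the view call.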

The new training loop then becomes:

for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.view([batch_size, -1, n_features]).to(device) # <---
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)    
    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.view([batch_size, -1, n_features]).to(device) # <---
            y_val = y_val.to(device)        
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)
    
    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")

This is the output:

[1] Training loss: 0.0235    Validation loss: 0.0173
[2] Training loss: 0.0149    Validation loss: 0.0086
[3] Training loss: 0.0083    Validation loss: 0.0074
[4] Training loss: 0.0079    Validation loss: 0.0069
[5] Training loss: 0.0076    Validation loss: 0.0069

                          ...

[96] Training loss: 0.0025   Validation loss: 0.0028
[97] Training loss: 0.0024   Validation loss: 0.0027
[98] Training loss: 0.0027   Validation loss: 0.0033
[99] Training loss: 0.0027   Validation loss: 0.0030
[100] Training loss: 0.0023  Validation loss: 0.0028
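
One caveat with the loop above: view([batch_size, -1, n_features]) only works when every batch holds exactly batch_size rows. The 8256 training rows divide evenly by 64, but a loader over the 2064 validation rows would end on a batch of 16. A possible way around this, sketched with a hypothetical helper named to_sequences, is to take the row count from the batch itself.

```python
import torch

n_features = 3

def to_sequences(x):
    # (rows, n_features) -> (rows, 1, n_features), whatever the number of rows.
    return x.view(x.size(0), -1, n_features)

full = torch.rand(64, n_features)
short = torch.rand(16, n_features)     # e.g. a final, incomplete batch

print(to_sequences(full).shape)    # torch.Size([64, 1, 3])
print(to_sequences(short).shape)   # torch.Size([16, 1, 3])

# With a hard-coded batch size the short batch cannot be reshaped:
try:
    short.view([64, -1, n_features])
    failed = False
except RuntimeError as err:
    failed = True
    print("view failed:", err)
```

Passing drop_last=True to the DataLoader is another option, at the cost of skipping the leftover rows.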

Solution 1
0 Accepted 2021-01-07 17:00:49
