Many-to-Many LSTM PyTorch

I want to build an LSTM model for the FashionMNIST dataset in PyTorch. Later I will need to extend it to a different dataset that contains videos.

The model should take a sequence of FashionMNIST images as input (say, 20 images), and the output should tell me how many sneakers (class 6) are in the sequence and where in the sequence they are located.

I was wondering whether this is even possible with a simple LSTM or a simple CNN, or whether I need a CNN_LSTM. I tried to implement a CNN_LSTM in PyTorch. Below is my current model, which throws an error right now. The last line throws the following error: "input must have 3 dimensions, got 4" (I also added the first part of the error message as a picture).

Could someone please help? Is the way I am doing this correct? I couldn't fix the error, and I am not sure whether the rest of my code is correct. I am quite new to LSTMs. Also, how can I transform the FashionMNIST dataset so that it always comes as a sequence of 20 images?

Many thanks in advance!

import numpy as np
import torch
import torch.nn as nn
from datetime import datetime

class CNN(nn.Module):
  def __init__(self, K):
    super(CNN, self).__init__()
    self.layer1 = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2))


    self.layer2 = nn.Sequential(
        nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2))

    
    # three fully connected layers
    self.fc1 = nn.Linear(in_features=64*6*6, out_features=600)
    self.drop = nn.Dropout(0.25)  # plain Dropout: the input here is a flat (N, 600) tensor
    self.fc2 = nn.Linear(in_features=600, out_features=120)
    self.fc3 = nn.Linear(in_features=120, out_features=10)
  
  def forward(self, x):
      out = self.layer1(x)
      out = self.layer2(out)
      out = out.view(out.size(0), -1)
      out = self.fc1(out)
      out = self.drop(out)
      out = self.fc2(out)
      out = self.fc3(out) 
      
      return out

class Combine(nn.Module):
  def __init__(self, K):
    super(Combine, self).__init__()
    self.cnn = CNN(K)
    self.D = 10  # n_inputs
    self.M = 128 # n_hidden
    self.K = 2 # n_outputs
    self.L = 10 # n_rnnlayers

    self.rnn = nn.LSTM(
        input_size=self.D,
        hidden_size=self.M,
        num_layers=self.L,
        batch_first=True)
    self.fc =nn.Linear(self.M, self.K)
  
  def forward(self, X):
    # initial hidden states
    h0 = torch.zeros(self.L, X.size(0), self.M).to(device)
    c0 = torch.zeros(self.L, X.size(0), self.M).to(device)

    # get RNN unit output
    out, _ = self.rnn(X, (h0, c0)) 
    out = self.fc(out)
    
    return out


model = Combine(K)

# use GPU in colab if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
model.to(device)

# Loss and optimizer
learning_rate = 0.001
criterion = nn.CrossEntropyLoss() # multi-class classification (the loss already applies log-softmax internally)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training and testing the model
def batch_gd(model, criterion, optimizer, train_loader, test_loader, epochs):
  train_losses = np.zeros(epochs)
  test_losses = np.zeros(epochs)

  # iterate over epochs
  for it in range(epochs):
    model.train()
    t0 = datetime.now()
    train_loss = []
    for inputs, targets in train_loader:
      # move data to GPU
      #inputs = inputs.reshape(-1,28,28)
      inputs, targets = inputs.to(device), targets.to(device)

      # zero the parameter gradients (empty gradients) for backward pass
      # Initializing a gradient as 0 so there is no mixing of gradient among the batches
      optimizer.zero_grad()

      # Forward pass
      outputs = model(inputs)
      loss = criterion(outputs, targets)
        
      # Backward and optimize
      loss.backward() # propagating the error backward 
      optimizer.step() # optimizing the parameters

      train_loss.append(loss.item())

    # Get train loss and test loss
    train_loss = np.mean(train_loss) # a little misleading
    
    # evaluate model
    model.eval()
    test_loss = []
    for inputs, targets in test_loader: # test samples and targets
      # move data to GPU
      inputs, targets = inputs.to(device), targets.to(device)
      outputs = model(inputs)
      loss = criterion(outputs, targets)
      test_loss.append(loss.item())
    test_loss = np.mean(test_loss)

    # Save losses
    train_losses[it] = train_loss
    test_losses[it] = test_loss
    
    dt = datetime.now() - t0
    print(f'Epoch {it+1}/{epochs}, Train Loss: {train_loss:.4f}, \
      Test Loss: {test_loss:.4f}, Duration: {dt}')
  
  return train_losses, test_losses

train_losses, test_losses = batch_gd(
    model, criterion, optimizer, train_loader, test_loader, epochs=15)

[Image: Part of the Error Message]
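
The message "input must have 3 dimensions, got 4" is consistent with Combine.forward above passing X straight to self.rnn: self.cnn is never called, so a 4-D image batch reaches nn.LSTM, which with batch_first=True expects input shaped (batch, seq_len, features). Below is a minimal sketch of one way to wire a CNN and an LSTM together, assuming each input is a stack of T images shaped (B, T, 1, 28, 28). The class name FrameEncoderLSTM and the small stand-in CNN are illustrative only and are not the code from the question.

```python
import torch
import torch.nn as nn


class FrameEncoderLSTM(nn.Module):
    """Hypothetical CNN + LSTM wrapper that emits one output vector per frame."""

    def __init__(self, feat_dim=10, hidden=128, n_out=2, n_layers=1):
        super().__init__()
        # Small stand-in for the CNN in the question: (N, 1, 28, 28) -> (N, feat_dim).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=n_layers, batch_first=True)
        self.fc = nn.Linear(hidden, n_out)

    def forward(self, x):
        b, t, c, h, w = x.shape
        # Fold time into the batch so the CNN sees an ordinary 4-D image batch.
        feats = self.cnn(x.reshape(b * t, c, h, w))
        # Unfold to (batch, seq_len, features), the 3-D input nn.LSTM expects.
        feats = feats.reshape(b, t, -1)
        out, _ = self.rnn(feats)  # initial hidden/cell states default to zeros
        return self.fc(out)       # (batch, seq_len, n_out)


logits = FrameEncoderLSTM()(torch.randn(4, 20, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 20, 2])
```

If per-frame labels of shape (B, T) are used with nn.CrossEntropyLoss, the logits would need rearranging first, for example with logits.permute(0, 2, 1), because that loss expects the class dimension in second position.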

Here's a useful thought experiment: you've framed the problem as a sequential decision process, but does the order in which the sneakers are presented matter?

Imagine that this is your sequence, where each x is a non-sneaker and each S is a sneaker, and you're about to classify the image at position 7:

xxSxxx?

Does the fact that there is a sneaker at position 3 in this sequence affect your decision about the current image?

It shouldn't, which means that you shouldn't frame this as a sequential problem and shouldn't use RNNs, which are designed to model sequential dependencies. Rather, you can think of this as simply training a single model to make predictions for each input, independently of the other inputs. You can then run this model on a "sequence" of images and record which ones are sneakers, but of course the order won't matter :)
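
As a rough illustration of that idea, the sketch below applies a per-image classifier to every image in a sequence and reads off the count and positions of the target class. The helper find_class_in_sequence, the untrained stand-in classifier, and the random tensor standing in for 20 FashionMNIST images are all assumptions made for the example. The target index is left as a parameter and set to 6 as in the question; torchvision's FashionMNIST lists Sneaker at label index 7, so it is worth checking which index your labels use.

```python
import torch
import torch.nn as nn


def find_class_in_sequence(classifier, images, target_class):
    """Classify each image independently; return (count, positions) of target_class.

    images: tensor of shape (T, 1, 28, 28), i.e. one sequence of T images.
    """
    classifier.eval()
    with torch.no_grad():
        preds = classifier(images).argmax(dim=1)  # (T,) predicted label per image
    positions = torch.nonzero(preds == target_class).flatten().tolist()
    return len(positions), positions


# Untrained stand-in classifier (assumption); in practice, use a trained CNN.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Random tensor standing in for a sequence of 20 FashionMNIST images.
sequence = torch.randn(20, 1, 28, 28)

count, positions = find_class_in_sequence(classifier, sequence, target_class=6)
print(count, positions)  # positions are 0-based indices into the sequence
```

Because each image is classified on its own, the same trained classifier works for sequences of any length, and shuffling the sequence only permutes the reported positions.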
