
Many-to-Many LSTM PyTorch

I want to build an LSTM model for the FashionMNIST dataset in PyTorch. I will later need to extend this to a different dataset that contains videos.

It should take a sequence of FashionMNIST images as input (let's say 20 images), and the output should tell me how many sneakers (class 6) are in the sequence and where in the sequence they are located.

I was wondering whether this is even possible with a simple LSTM or a simple CNN, or whether I need a CNN-LSTM. I tried to implement a CNN-LSTM in PyTorch; you can find my current model below. The last line throws the following error: "input must have 3 dimensions, got 4" (I also attached the first part of the error message as a picture). Could someone please provide some help? Is the way I am doing this correct? I couldn't fix the error, and I am not sure whether the rest of my code is correct, as I am quite new to LSTMs. Also, how can I transform the FashionMNIST dataset so that it always comes in sequences of 20 images?

Many thanks in advance!

import numpy as np
import torch
import torch.nn as nn
from datetime import datetime

class CNN(nn.Module):
  def __init__(self, K):
    super(CNN, self).__init__()
    self.layer1 = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2))


    self.layer2 = nn.Sequential(
        nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2))

    
    # three fully connected layers
    self.fc1 = nn.Linear(in_features=64*6*6, out_features=600)
    self.drop = nn.Dropout2d(0.25)
    self.fc2 = nn.Linear(in_features=600, out_features=120)
    self.fc3 = nn.Linear(in_features=120, out_features=10)
  
  def forward(self, x):
      out = self.layer1(x)
      out = self.layer2(out)
      out = out.view(out.size(0), -1)
      out = self.fc1(out)
      out = self.drop(out)
      out = self.fc2(out)
      out = self.fc3(out) 
      
      return out

class Combine(nn.Module):
  def __init__(self, K):
    super(Combine, self).__init__()
    self.cnn = CNN(K)
    self.D = 10  # n_inputs
    self.M = 128 # n_hidden
    self.K = 2 # n_outputs
    self.L = 10 # n_rnnlayers

    self.rnn = nn.LSTM(
        input_size=self.D,
        hidden_size=self.M,
        num_layers=self.L,
        batch_first=True)
    self.fc = nn.Linear(self.M, self.K)
  
  def forward(self, X):
    # initial hidden states
    h0 = torch.zeros(self.L, X.size(0), self.M).to(device)
    c0 = torch.zeros(self.L, X.size(0), self.M).to(device)

    # get RNN unit output
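    # NOTE: X is still a 4-D image batch (N, 1, 28, 28) here -- self.cnn is
    # never applied -- but nn.LSTM expects a 3-D (N, T, D) input, which is
    # what raises "input must have 3 dimensions, got 4" on the next line.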
    out, _ = self.rnn(X, (h0, c0)) 
    out = self.fc(out)
    
    return out


K = 10  # number of classes (passed through to CNN, which currently ignores it)
model = Combine(K)

# use GPU in colab if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
model.to(device)

# Loss and optimizer
learning_rate = 0.001
criterion = nn.CrossEntropyLoss() # multi-class classification (applies softmax internally, so no softmax layer is needed in the model)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training and testing the model
def batch_gd(model, criterion, optimizer, train_loader, test_loader, epochs):
  train_losses = np.zeros(epochs)
  test_losses = np.zeros(epochs)

  # iterate over epochs
  for it in range(epochs):
    model.train()
    t0 = datetime.now()
    train_loss = []
    for inputs, targets in train_loader:
      # move data to GPU
      #inputs = inputs.reshape(-1,28,28)
      inputs, targets = inputs.to(device), targets.to(device)

      # zero the parameter gradients (empty gradients) for backward pass
      # Initializing a gradient as 0 so there is no mixing of gradient among the batches
      optimizer.zero_grad()

      # Forward pass
      outputs = model(inputs)
      loss = criterion(outputs, targets)
        
      # Backward and optimize
      loss.backward() # propagating the error backward 
      optimizer.step() # optimizing the parameters

      train_loss.append(loss.item())

    # Get train loss and test loss
    train_loss = np.mean(train_loss) # a little misleading
    
    # evaluate model
    model.eval()
    test_loss = []
    with torch.no_grad():  # gradients aren't needed during evaluation
      for inputs, targets in test_loader: # test samples and targets
        # move data to GPU
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        test_loss.append(loss.item())
    test_loss = np.mean(test_loss)

    # Save losses
    train_losses[it] = train_loss
    test_losses[it] = test_loss
    
    dt = datetime.now() - t0
    print(f'Epoch {it+1}/{epochs}, Train Loss: {train_loss:.4f}, \
      Test Loss: {test_loss:.4f}, Duration: {dt}')
  
  return train_losses, test_losses

train_losses, test_losses = batch_gd(
    model, criterion, optimizer, train_loader, test_loader, epochs=15)

[Screenshot: first part of the error message]

Here's a useful thought experiment: you've framed the problem as a sequential decision process, but does the order in which the sneakers are presented matter?

Imagine that this is your sequence, where each x is a non-sneaker and each S is a sneaker, and you're about to classify the image at position 7:

xxSxxx?

Does the fact that there is a sneaker at position 3 in this sequence affect your decision about the current image?

It shouldn't, which means that you actually shouldn't frame this as a sequential problem, and shouldn't use RNNs, which are designed to model sequential dependencies. Rather, you can think of this as simply training a single model to make a prediction for each input, independently of the other inputs. You can then run this model on a "sequence" of images and record which ones are sneakers; of course, the order won't matter. :) A concrete sketch follows below.
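To make that concrete, here is a minimal sketch of the per-image approach. It reuses the CNN class from the question as the per-image classifier (assumed to be trained beforehand); the helper name locate_sneakers and the choice of a plain DataLoader batch as the "sequence" are illustrative assumptions, not part of the original post.

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

SNEAKER = 6  # FashionMNIST class index for "Sneaker"

def locate_sneakers(classifier, seq):
  # seq: (T, 1, 28, 28) -- a "sequence" of T images, e.g. T = 20.
  # Each prediction is independent of the others, so the whole
  # sequence can be classified in one batched forward pass.
  classifier.eval()
  with torch.no_grad():
    preds = classifier(seq).argmax(dim=1)  # (T,) predicted class per image
  positions = (preds == SNEAKER).nonzero(as_tuple=True)[0]  # 0-indexed
  return len(positions), positions.tolist()

# This also covers the sequencing sub-question: a DataLoader with
# batch_size=20 already delivers "sequences" of 20 FashionMNIST images.
test_set = datasets.FashionMNIST(root='.', train=False, download=True,
                                 transform=transforms.ToTensor())
seq_loader = DataLoader(test_set, batch_size=20, shuffle=True, drop_last=True)

images, _ = next(iter(seq_loader))          # images: (20, 1, 28, 28)
classifier = CNN(K=10)                      # the question's CNN; train it first
count, where = locate_sneakers(classifier, images)
print(f'{count} sneaker(s) at positions {where}')

If the later video dataset turns out to contain real temporal dependencies between frames, that is the point where a CNN-LSTM (per-frame CNN features fed into an LSTM) becomes the natural extension.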
