Why is very simple PyTorch LSTM model not learning?

Question

I am trying to set up a very simple learning problem so that I can better understand how PyTorch and LSTMs work. To that end, I am trying to learn a mapping from an input tensor to an output tensor of the same shape whose values are twice the input's, so [1 2 3] as input should produce [2 4 6] as output. I have a dataset for this:

class AudioDataset(Dataset):
    def __init__(self, corrupted_path, train_set=False, test_set=False):
        torch.manual_seed(0)
        numpy.random.seed(0)

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, index):
        random_tensor = torch.rand(1, 5) * 2
        random_tensor = random_tensor - 1

        return random_tensor, random_tensor * 2

My LSTM itself is pretty simple:

class MyLSTM(nn.Module):
    def __init__(self, input_size=4000):
        super(MyLSTM, self).__init__()

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=input_size,
                            num_layers=2)

    def forward(self, x):
        y = self.lstm(x)
        return y

My training looks like:

    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=1, shuffle=True, **kwargs)

    model = MyLSTM(input_size=5)
    optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0001)
    loss_fn = torch.nn.MSELoss(reduction='sum')

    for epoch in range(300):
        for i, data in enumerate(train_loader):
            inputs = data[0]
            outputs = data[1]

            print('inputs', inputs, inputs.size())
            print('outputs', outputs, outputs.size())
            optimizer.zero_grad()

            pred = model(inputs)
            print('pred', pred[0], pred[0].size())

            loss = loss_fn(pred[0], outputs)

            model.zero_grad()

            loss.backward()
            optimizer.step()

After 300 epochs, my loss looks like tensor(1.4892, grad_fn=<MseLossBackward>), which doesn't appear to be very good.

Randomly looking at some of the inputs / outputs and predictions:

inputs tensor([[[0.5050, 0.4669, 0.8310,  ..., 0.0659, 0.5043, 0.8885]]]) torch.Size([1, 1, 4000])
outputs tensor([[[1.0100, 0.9338, 1.6620,  ..., 0.1319, 1.0085, 1.7770]]]) torch.Size([1, 1, 4000])
pred tensor([[[ 0.6930,  0.0231, -0.6874,  ..., -0.5225,  0.1096,  0.5796]]],
       grad_fn=<StackBackward>) torch.Size([1, 1, 4000])

We see that it hasn't learned very much at all. I can't understand what I'm doing wrong; if someone can guide me, that would be greatly appreciated.

Answer 1

LSTMs are recurrent layers: each cell keeps an internal state that is fed back into it at the next step of the input sequence. Inside each cell, three gates (input, forget and output) plus a candidate update decide what to store, what to discard and what to emit. It's one of the more complex building blocks to work with and understand, and I'm not really skilled enough to give an in-depth answer.

What I see in your example code is a lack of understanding of how they work; it seems like you're assuming they work like a linear layer. I say that because your forward method doesn't handle the internal state and you're not reshaping the outputs.

You define the LSTM like this:

     self.lstm = nn.LSTM(input_size=input_size, hidden_size=input_size, num_layers=2)

The hidden_size sets the width of the hidden state, and therefore the number of values the LSTM emits at each time step.

PyTorch documentation says the following:

hidden_size – The number of features in the hidden state h

It refers to the size of the hidden state h (and of the cell state c that accompanies it), which together hold the LSTM's short- and long-term memory. The gates are computed from the current input and the previous hidden state. At every time step the state is updated and then used again for the next element of the sequence.
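
To make the role of hidden_size concrete, here is a minimal sketch that prints the shapes an nn.LSTM returns. The sizes (3 time steps, batch of 2, hidden_size of 8) are arbitrary values picked for illustration, not taken from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; none of these come from the question.
seq_len, batch_size, input_size, hidden_size, num_layers = 3, 2, 5, 8, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers)

# Default layout (batch_first=False) is (seq_len, batch, input_size).
x = torch.randn(seq_len, batch_size, input_size)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # top-layer hidden state at every time step
print(h_n.shape)     # final hidden state of every layer
print(c_n.shape)     # final cell state of every layer
```

The last dimension of all three tensors is hidden_size, and the last time step of output is the same tensor as the top layer's entry in h_n.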

So why is this so important?

You are throwing away the hidden state that the LSTM returns. When no hidden state is passed in, PyTorch initializes both h_0 and c_0 to zeros, so every call starts as if there had never been any history.
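
What happens when no hidden state is supplied can be checked directly. The sketch below, which uses the same layer sizes as the question's MyLSTM(input_size=5), compares a call without a hidden state against a call with an explicit all-zero (h_0, c_0) pair.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=5, hidden_size=5, num_layers=2)
x = torch.randn(1, 1, 5)  # (seq_len, batch, input_size)

# Call 1: no hidden state passed.
out_default, _ = lstm(x)

# Call 2: explicit zero state, shaped (num_layers, batch, hidden_size).
zeros = (torch.zeros(2, 1, 5), torch.zeros(2, 1, 5))
out_zeros, _ = lstm(x, zeros)

# True when both calls produce the same output.
same = torch.allclose(out_default, out_zeros)
print(same)
```

So leaving the argument out is not an error in itself; it just means each call begins from a blank state.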

The forward function should look something like this:

    def forward(self, x, hidden):
        lstm_output, hidden = self.lstm(x, hidden)
        return lstm_output, hidden

During training you have to keep track of the hidden state yourself.

    for i in range(epochs):
        # Fresh zero state at the start of every epoch:
        # shape is (num_layers, batch_size, num_hidden).
        hidden = (torch.zeros(num_layers, batch_size, num_hidden),
                  torch.zeros(num_layers, batch_size, num_hidden))

        for x, y in generate_batches(...):
            # missing code....
            lstm_output, hidden = model(x, hidden)

Take note of the shape for the hidden state. It's different from what you usually do with linear layers.

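The step missing above is detaching the hidden state between batches, for example hidden = tuple(h.detach() for h in hidden). Without it, loss.backward() tries to backpropagate through the graphs of earlier batches and PyTorch raises an error about going backward through the graph a second time.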
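
As a rough sketch of carrying the hidden state from one batch to the next, the loop below detaches the state before each forward pass so that backward() stops at the start of the current batch. The random batches, the 0.5 * x target and the three-step loop are placeholders invented for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_layers, batch_size, num_hidden = 2, 1, 5
lstm = nn.LSTM(input_size=5, hidden_size=num_hidden, num_layers=num_layers)
optimizer = torch.optim.Adam(lstm.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Initial state, shaped (num_layers, batch_size, num_hidden).
hidden = (torch.zeros(num_layers, batch_size, num_hidden),
          torch.zeros(num_layers, batch_size, num_hidden))

losses = []
for step in range(3):
    x = torch.rand(1, batch_size, 5)  # placeholder batch
    y = 0.5 * x                       # placeholder target

    # Keep the values but cut the link to the previous batch's graph.
    hidden = tuple(h.detach() for h in hidden)

    optimizer.zero_grad()
    output, hidden = lstm(x, hidden)
    loss = loss_fn(output, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```

For the question's data, where every sample is independent, starting each batch from a zero state would be just as reasonable as carrying the state over.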

LSTMs on their own only extract features, much like convolution layers. It's unlikely that the raw outputs from an LSTM are what you're interested in using. In this case they can't even reach the targets: each output is a sigmoid gate multiplied by a tanh, so it always lies between -1 and 1, while your targets range from -2 to 2.

Most models that use LSTMs or convolutions finish with one or more fully connected layers (for example, nn.Linear()). These layers train on the features to predict the outputs you're interested in.
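
One concrete reason the raw LSTM output is a poor fit here: every output value is an output gate (a sigmoid) multiplied by tanh of the cell state, so its magnitude cannot exceed 1, while the question's targets go up to 2 in magnitude. A small sketch, feeding deliberately oversized inputs (the factor of 100 is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=5, hidden_size=5, num_layers=2)

# Inputs far larger than anything in the question's data.
x = 100.0 * torch.randn(10, 4, 5)  # (seq_len, batch, input_size)

with torch.no_grad():
    output, _ = lstm(x)

# Largest magnitude among all output values; bounded by 1.
max_abs = output.abs().max().item()
print(max_abs)
```

No amount of training changes this bound, which is why a layer without that restriction, such as nn.Linear, is needed after the LSTM to produce values like 1.66.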

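The outputs from an LSTM have shape (seq_len, batch, hidden_size). nn.Linear operates on the last dimension, so it can take them as they are, but it is common to flatten them to (seq_len * batch, hidden_size) first so that there is one row per time step for the loss.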

Here is an example LSTM forward function that I have used:

    def forward(self, x, hidden):
        lstm_output, hidden = self.lstm(x, hidden)
        drop_output = self.dropout(lstm_output)
        drop_output = drop_output.contiguous().view(-1, self.num_hidden)
        final_out = self.fc_linear(drop_output)
        return final_out, hidden
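
Putting the pieces together for the doubling task, here is a minimal sketch rather than a tuned solution. The class name LSTMWithHead, the hidden size of 32, the batch of 64 and the 300 steps are all assumptions made for this example; dropout is left out to keep it short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class LSTMWithHead(nn.Module):
    """Hypothetical model: LSTM features followed by a linear layer."""

    def __init__(self, input_size=5, num_hidden=32, num_layers=1):
        super().__init__()
        self.num_hidden = num_hidden
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=num_hidden,
                            num_layers=num_layers)
        self.fc_linear = nn.Linear(num_hidden, input_size)

    def forward(self, x, hidden=None):
        lstm_output, hidden = self.lstm(x, hidden)
        seq_len, batch_size, _ = lstm_output.shape
        # One row per (time step, sample) for the linear layer.
        flat = lstm_output.contiguous().view(-1, self.num_hidden)
        final_out = self.fc_linear(flat)
        # Restore the (seq_len, batch, features) layout of the targets.
        return final_out.view(seq_len, batch_size, -1), hidden


model = LSTMWithHead()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

losses = []
for step in range(300):
    x = torch.rand(1, 64, 5) * 2 - 1  # values in [-1, 1), as in the question
    y = x * 2                         # target: twice the input
    optimizer.zero_grad()
    pred, _ = model(x)                # zero state each step: samples are independent
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

first_loss, final_loss = losses[0], losses[-1]
print(first_loss, final_loss)
```

The linear layer is what lets predictions leave the (-1, 1) range of the LSTM output, so the loss can fall well below where the question's model got stuck.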

LSTMs are definitely an advanced topic in machine learning, and PyTorch isn't an easy library to learn to begin with. I would recommend reading up on LSTMs using the TensorFlow documentation and online blogs to get a better grasp of how they work.


solution1 4 ACCEPTED 2020-02-13 14:56:49
