
PyTorch LSTM dropout vs Keras LSTM dropout

I'm trying to port my sequential Keras network to PyTorch, but I'm having trouble with the LSTM units:

LSTM(512,
     stateful = False,
     return_sequences = True,
     dropout = 0.5),
LSTM(512,
     stateful = False,
     return_sequences = True,
     dropout = 0.5),

How should I formulate this in PyTorch? In particular, dropout seems to work very differently in PyTorch than it does in Keras.

The following should work for you.

lstm = nn.LSTM(
    input_size = ?, 
    hidden_size = 512, 
    num_layers = 1,
    batch_first = True, 
    dropout = 0.5
)

You need to set the input_size yourself. Check out the documentation on nn.LSTM.
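
For example, if each timestep is a 128-dimensional feature vector (a hypothetical size, since the question does not say), a quick shape check looks like this:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size = 128, hidden_size = 512, num_layers = 1, batch_first = True)

x = torch.randn(32, 10, 128)   # (batch, seq_len, input_size) because batch_first = True
output, (h_n, c_n) = lstm(x)
print(output.shape)            # torch.Size([32, 10, 512]): one 512-dim output per timestep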


Update

In a 1-layer LSTM, there is no point in setting dropout, because the dropout argument is only applied to the outputs of the intermediate layers of a multi-layer LSTM module. PyTorch will therefore warn you if dropout is non-zero while num_layers is 1. If we want to apply dropout to the final layer's output of the LSTM module, we can do something like the following.

import torch.nn as nn

class LSTMWithDropout(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size = input_size,
            hidden_size = 512,
            num_layers = 1,
            batch_first = True
        )
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # nn.LSTM returns (output, (h_n, c_n)); apply dropout to the outputs only
        output, _ = self.lstm(x)
        return self.dropout(output)

With this definition, the per-timestep outputs of the LSTM pass through a Dropout layer. Note that a plain nn.Sequential would not work here, because nn.LSTM returns a tuple (output, (h_n, c_n)) rather than the single tensor that nn.Dropout expects.
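
Alternatively, since the Keras model in the question stacks two LSTM layers, the built-in dropout argument does apply if a single nn.LSTM module holds both layers. A minimal sketch, again assuming a hypothetical input_size of 128:

# With num_layers = 2, dropout is applied to the outputs of the first
# layer before they feed the second layer, but not after the second layer.
stacked_lstm = nn.LSTM(
    input_size = 128,   # hypothetical; use your actual feature size
    hidden_size = 512,
    num_layers = 2,
    batch_first = True,
    dropout = 0.5
)

Even then the two frameworks are not exactly equivalent: Keras applies its dropout argument to the layer's input transformations, whereas PyTorch applies it between stacked layers.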
