
How is the output h_n of an RNN (nn.LSTM, nn.GRU, etc.) in PyTorch structured?

The docs say

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

Now, the batch and hidden_size dimensions are pretty much self-explanatory. The first dimension remains a mystery, though.

I assume that the hidden states of all "last cells" (the states at the final time step) of all layers are included in this output. But then what is the index of, for example, the hidden state of the "last cell" in the "uppermost layer"? h_n[-1]? h_n[0]?

Is the output affected by the batch_first option?

The LSTM and GRU implementations in PyTorch support stacked layers of LSTMs and GRUs out of the box.

You specify this with the keyword argument num_layers, e.g. nn.LSTM(num_layers=num_layers). num_layers is the number of stacked LSTMs (or GRUs) that you have. The default value is 1, which gives you a basic single-layer LSTM.
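For example, a minimal sketch (the input, hidden, batch, and sequence sizes here are made up purely for illustration):

```python
import torch
import torch.nn as nn

# Toy sizes, just to make the shapes concrete.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)
x = torch.randn(5, 4, 10)          # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(h_n.shape)  # torch.Size([3, 4, 20]) = (num_layers * num_directions, batch, hidden_size)
```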

num_directions is either 1 or 2. It is 1 for normal (unidirectional) LSTMs and GRUs, and it is 2 for bidirectional RNNs (i.e. when you pass bidirectional=True).
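With bidirectional=True and the same toy sizes as above, the first dimension of h_n doubles, while the two directions are concatenated along the last dimension of output:

```python
import torch
import torch.nn as nn

# Same toy sizes, but bidirectional, so num_directions = 2.
bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3, bidirectional=True)
x = torch.randn(5, 4, 10)          # (seq_len, batch, input_size)
output, (h_n, c_n) = bilstm(x)

print(h_n.shape)     # torch.Size([6, 4, 20]) = (3 layers * 2 directions, batch, hidden_size)
print(output.shape)  # torch.Size([5, 4, 40]): both directions concatenated per time step
```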

So in your case you probably have a simple (one-layer, unidirectional) LSTM or GRU, and the value of num_layers * num_directions would then be one.

h_n[0] is the hidden state of the bottom-most layer (the one that takes in the input), and h_n[-1] is that of the top-most layer (the one that produces the output of the network).
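A quick sanity check of this indexing, again with toy sizes: for a unidirectional network, output holds the top layer's hidden state at every time step, so its last time step should coincide with h_n[-1]. (For a bidirectional network the layers and directions are interleaved along the first dimension, so h_n[-1] would instead be the backward direction of the top layer.)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)
x = torch.randn(5, 4, 10)          # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# output is the top layer's hidden state at every time step, so its final
# time step must equal the top-most entry of h_n.
print(torch.allclose(output[-1], h_n[-1]))  # True
```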

batch_first puts the batch dimension before the time dimension (the default being the time dimension before the batch dimension). Because the hidden state doesn't have a time dimension, batch_first has no effect on the shape of h_n.
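To see this, compare the shapes with batch_first=True (same made-up sizes as before):

```python
import torch
import torch.nn as nn

# Same toy sizes, now with batch_first=True.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3, batch_first=True)
x = torch.randn(4, 5, 10)          # (batch, seq_len, input_size) with batch_first
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 5, 20]): batch now comes first in the output
print(h_n.shape)     # torch.Size([3, 4, 20]): unchanged, batch is still dimension 1
```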
