
How is the output h_n of an RNN (nn.LSTM, nn.GRU, etc.) in PyTorch structured?

The docs say:

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

Now, the batch and hidden_size dimensions are pretty much self-explanatory. The first dimension remains a mystery, though.

I assume that the hidden states of all "last cells" of all layers are included in this output. But then what is the index of, for example, the hidden state of the "last cell" in the "uppermost layer"? h_n[-1]? h_n[0]?

Is the output affected by the batch_first option?

The implementation of LSTM and GRU in PyTorch natively supports stacked layers of LSTMs and GRUs.

You specify this with the keyword argument num_layers, as in nn.LSTM(num_layers=num_layers). num_layers is the number of stacked LSTMs (or GRUs) that you have. The default value is 1, which gives you a basic LSTM.

num_directions is either 1 or 2. It is 1 for normal LSTMs and GRUs, and 2 for bidirectional RNNs.

So in your case, you probably have a simple LSTM or GRU, in which case the value of num_layers * num_directions would be one.
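
For example, here is a minimal sketch (all sizes are arbitrary, chosen only for illustration) that confirms the first dimension of h_n for a stacked bidirectional LSTM:

    import torch
    import torch.nn as nn

    num_layers = 3
    lstm = nn.LSTM(input_size=10, hidden_size=20,
                   num_layers=num_layers, bidirectional=True)

    x = torch.randn(5, 4, 10)          # (seq_len, batch, input_size)
    output, (h_n, c_n) = lstm(x)

    # num_directions is 2 because bidirectional=True
    print(h_n.shape)                   # torch.Size([6, 4, 20]) = (3 * 2, batch, hidden_size)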

h_n[0] is the hidden state of the bottom-most layer (the one that takes in the input), and h_n[-1] is the hidden state of the top-most layer (the one that produces the output of the network).
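
As a sanity check, here is a minimal sketch (arbitrary sizes again) verifying that, for a unidirectional LSTM, h_n[-1] coincides with the top layer's output at the last time step:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)
    x = torch.randn(5, 4, 10)          # (seq_len, batch, input_size)
    output, (h_n, c_n) = lstm(x)

    # output[-1] is the top layer's output at the last time step,
    # which should coincide with the last slice of h_n
    print(torch.allclose(output[-1], h_n[-1]))   # True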

batch_first puts the batch dimension before the time dimension (the default is the time dimension before the batch dimension). Because the hidden state has no time dimension, batch_first has no effect on the hidden state's shape.
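
A minimal sketch (arbitrary sizes) showing that batch_first changes the layout of output but leaves the shape of h_n untouched:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=10, hidden_size=20,
                   num_layers=2, batch_first=True)
    x = torch.randn(4, 5, 10)          # (batch, seq_len, input_size)
    output, (h_n, c_n) = lstm(x)

    print(output.shape)                # torch.Size([4, 5, 20]) -> (batch, seq_len, hidden_size)
    print(h_n.shape)                   # torch.Size([2, 4, 20]) -> (num_layers, batch, hidden_size)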
