Hidden size vs input size in RNN
Premise 1:
Regarding neurons in an RNN layer, it is my understanding that at "each time step, every neuron receives both the input vector x(t) and the output vector from the previous time step y(t-1)" [1].
Premise 2:
It is also my understanding that in PyTorch's GRU layer, input_size and hidden_size mean the following:
- input_size – The number of expected features in the input x
- hidden_size – The number of features in the hidden state h
So naturally, hidden_size should represent the number of neurons in a GRU layer.
My question:
Given the following GRU layer:
# assume that hidden_size = 3
class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size)
Assuming a hidden_size of 3, my understanding is that the GRU layer above would have 3 neurons, each of which accepts an input vector of size 3 at every timestep.
My question is: why do the arguments to hidden_size and input_size have to be equal? I.e., why can't each of the 3 neurons accept, say, an input vector of size 5?
Case in point: both of the following produce a size mismatch:
self.gru = nn.GRU(input_size = hidden_size, hidden_size = hidden_size-1)
self.gru = nn.GRU(input_size = hidden_size, hidden_size = hidden_size+1)
[1] Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow (p. 388). O'Reilly Media. Kindle Edition.
[3] https://pytorch.org/docs/stable/nn.html#torch.nn.GRU
Adding full code for reproducibility:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        # hidden_size-1 is the variant that produces the size mismatch
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size-1)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
            pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
            seq_lengths: List of sequence lengths.
            hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.
        Returns:
            outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
            hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)
        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 5
    encoder = Encoder(src_dictionary_size=5, hidden_size=hidden_size)
    # maximum word count
    max_seq_length = 4
    # num sentences
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    # these are padded sequences (sentences of words). There are 2 sentences (i.e. 2 batches) with a maximum of 4 words.
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)
    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
I just resolved this and the mistake was self-inflicted.
Conclusion: input_size and hidden_size can differ in size and there is no inherent problem with this. The premises in the question are correctly stated.
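To make the conclusion concrete, here is a minimal sketch (the sizes 5 and 3 are arbitrary values I picked for illustration) that builds a GRU with different input_size and hidden_size and inspects its parameter shapes. As far as I can tell, the input-to-hidden weight has input_size columns and the hidden-to-hidden weight has hidden_size columns, with the three gates stacked along the rows, so nothing ties the two sizes together:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 5, 3  # deliberately different (illustrative values)
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

# The three gates are stacked along the first dimension, hence 3 * hidden_size rows.
print(gru.weight_ih_l0.shape)  # torch.Size([9, 5]) -> (3 * hidden_size, input_size)
print(gru.weight_hh_l0.shape)  # torch.Size([9, 3]) -> (3 * hidden_size, hidden_size)

x = torch.randn(4, 2, input_size)  # (seq_len, batch, input_size)
output, h_n = gru(x)               # no initial state passed, so it defaults to zeros
print(output.shape)  # torch.Size([4, 2, 3])
print(h_n.shape)     # torch.Size([1, 2, 3])
```

Each of the 3 "neurons" therefore sees the full 5-dimensional input and the full 3-dimensional previous state at every step.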
The problem with the (full) code above was that the initial hidden state of the GRU did not have the correct dimensions. The initial hidden state must have the same dimensions as subsequent hidden states. In my case, the initial hidden state had the shape (1, 2, 5) instead of (1, 2, 4). In the former, 5 is the dimensionality of the embedding vector; in the latter, 4 is the hidden_size (number of neurons) of the GRU.
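The mismatch can be reproduced in isolation. The following is a stripped-down sketch, not the original Encoder: it feeds random inputs straight into a GRU, and the names bad_h0 and good_h0 are mine. I am assuming the error surfaces as a RuntimeError, which is what I see in the PyTorch versions I have used:

```python
import torch
import torch.nn as nn

embedding_size, hidden_size = 5, 4
batch_size, seq_len = 2, 4

gru = nn.GRU(input_size=embedding_size, hidden_size=hidden_size)
x = torch.randn(seq_len, batch_size, embedding_size)

bad_h0 = torch.zeros(1, batch_size, embedding_size)  # (1, 2, 5): last dim is the embedding size
good_h0 = torch.zeros(1, batch_size, hidden_size)    # (1, 2, 4): last dim is hidden_size

try:
    gru(x, bad_h0)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")

# With the correctly shaped initial state the same layer runs fine.
output, h_n = gru(x, good_h0)
print(output.shape, h_n.shape)
```

So the error comes from the shape of the state passed in, not from input_size and hidden_size being different.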
The correct code is below:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        # The embedding dimension is the GRU's input_size; it need not equal hidden_size.
        self.embedding = nn.Embedding(src_dictionary_size, input_size)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
            pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
            seq_lengths: List of sequence lengths.
            hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.
        Returns:
            outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
            hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)
        return output, h_n

    def init_hidden(self, batch_size=1):
        # Last dimension is hidden_size, not the embedding size.
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 4
    embedding_size = 5
    encoder = Encoder(src_dictionary_size=5, input_size=embedding_size, hidden_size=hidden_size)
    print(encoder)
    max_seq_length = 4
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)
    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
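One way to avoid this class of mistake, sketched here under my own assumptions: read the state shape from the GRU itself instead of tracking a separate attribute, or pass no initial state at all, in which case nn.GRU falls back to zeros. TinyEncoder is a hypothetical, simplified variant without sequence packing, not the Encoder above:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):  # hypothetical, simplified (no packing)
    def __init__(self, vocab_size, input_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, input_size)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

    def init_hidden(self, batch_size=1):
        # Derive the shape from the layer so it cannot drift from the GRU's config.
        return torch.zeros(self.gru.num_layers, batch_size, self.gru.hidden_size)

    def forward(self, tokens, hidden=None):
        # tokens: (seq_len, batch) integer ids; hidden=None means a zero initial state.
        return self.gru(self.embedding(tokens), hidden)

encoder = TinyEncoder(vocab_size=5, input_size=5, hidden_size=4)
tokens = torch.tensor([[1, 2], [2, 3], [3, 0], [4, 0]])

out_default, h_default = encoder(tokens)
out_explicit, h_explicit = encoder(tokens, encoder.init_hidden(batch_size=2))

print(out_default.shape, h_default.shape)
print(torch.allclose(out_default, out_explicit))
```

Both calls should give the same result, since an omitted initial state and an explicit zero tensor are equivalent here.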