Shape of PyTorch model.parameters() is inconsistent with how it's defined in the model
I'm attempting to extract the weights and biases from a simple network built in PyTorch. My entire network is composed of nn.Linear layers. When I create a layer by calling nn.Linear(in_dim, out_dim), I expect the parameters that I get from calling model.parameters() for that model to be of shape (in_dim, out_dim) for the weight and (out_dim) for the bias. However, the weights that come out of model.parameters() are instead of shape (out_dim, in_dim).
The intention of my code is to be able to use matrix multiplication to perform a forward pass using only numpy, not any PyTorch. Because of the shape inconsistency, matrix multiplications throw an error. How can I fix this?
Here is my exact code:
from torch import nn

class RNN(nn.Module):
    def __init__(self, dim_input, dim_recurrent, dim_output):
        super(RNN, self).__init__()
        self.dim_input = dim_input
        self.dim_recurrent = dim_recurrent
        self.dim_output = dim_output
        self.dense1 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense2 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense3 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense4 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense5 = nn.Linear(self.dim_recurrent, self.dim_output)

    # There is a defined forward pass

model = RNN(12, 100, 6)
for i in model.parameters():
    # shape is an attribute, not a method
    print(i.shape)
The output is:
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([6, 100])
torch.Size([6])
The output should, if I'm correct, be:
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 6])
torch.Size([6])
What is my issue?
What you see there is the shape of the stored weight matrix, which nn.Linear keeps as (out_dim, in_dim); it does not mean the layer's input and output sizes are swapped. When you call print(model), you can see that the input and output features are correct:
RNN(
  (dense1): Linear(in_features=12, out_features=100, bias=True)
  (dense2): Linear(in_features=100, out_features=100, bias=False)
  (dense3): Linear(in_features=12, out_features=100, bias=True)
  (dense4): Linear(in_features=100, out_features=100, bias=False)
  (dense5): Linear(in_features=100, out_features=6, bias=True)
)
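To make the distinction concrete, here is a minimal sketch comparing what a layer declares with what it stores. The stand-alone `layer` variable is only illustrative and is not part of the code in the question:

```python
from torch import nn

# Illustrative stand-alone layer with the same sizes as dense1 in the question.
layer = nn.Linear(12, 100)

# The declared interface: 12 input features, 100 output features.
print(layer.in_features, layer.out_features)  # 12 100

# The stored parameters: weight is (out_features, in_features),
# bias is (out_features,).
print(tuple(layer.weight.shape))  # (100, 12)
print(tuple(layer.bias.shape))    # (100,)
```

So the layer still maps 12 inputs to 100 outputs; only the storage layout of the weight differs from what the question assumed.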
You can check the source code to see that the weights are actually transposed before calling matmul.
nn.Linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
You can check the forward method; it looks like this:
def forward(self, input):
    return F.linear(input, self.weight, self.bias)
F.linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/functional.html
The respective line for multiplying the weights is:
output = input.matmul(weight.t())
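One way to convince yourself of this, sketched here with made-up names (`layer`, `x`) that are not from the question, is to compare the layer's own output with the transposed product computed by hand:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(12, 100)   # illustrative layer, same sizes as dense1
x = torch.rand(5, 12)        # batch of 5 samples, 12 features each

with torch.no_grad():
    via_layer = layer(x)
    via_functional = F.linear(x, layer.weight, layer.bias)
    # Manual version: transpose the (100, 12) weight to (12, 100) first.
    by_hand = x.matmul(layer.weight.t()) + layer.bias

print(torch.allclose(via_layer, via_functional, atol=1e-6))
print(torch.allclose(via_layer, by_hand, atol=1e-6))
print(tuple(by_hand.shape))
```

All three results should agree up to floating-point tolerance, which is why a small `atol` is passed rather than testing for exact equality.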
As mentioned above, the weights are transposed before matmul is applied, which is why the stored weight has a different shape than you expected.
So if you want to do the matrix multiplication manually, you can do:
# dummy input: a batch of 5 samples with 12 features each
input = torch.rand(5, 12)
# apply layer dense1 (without bias, for bias just add + model.dense1.bias)
output_first_layer = input.matmul(model.dense1.weight.t())
print(output_first_layer.shape)
Just as you would expect from your dense1, it returns:
torch.Size([5, 100])
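Since the goal in the question is a forward pass in plain NumPy, a sketch of that could look like the following. The helper `linear_numpy` is hypothetical, the layer is assumed to live on the CPU, and because the question omits the model's own forward method, only a single layer is reproduced here:

```python
import numpy as np
import torch
from torch import nn

def linear_numpy(x, weight, bias=None):
    """Hypothetical helper: NumPy equivalent of one nn.Linear layer.

    `weight` uses PyTorch's layout (out_features, in_features),
    so it is transposed before the product.
    """
    out = x @ weight.T
    if bias is not None:
        out = out + bias
    return out

layer = nn.Linear(12, 100)  # illustrative layer, same sizes as dense1

# detach() drops the autograd graph so the tensors can be converted to NumPy.
weight = layer.weight.detach().numpy()
bias = layer.bias.detach().numpy()

x = np.random.rand(5, 12).astype(np.float32)
out = linear_numpy(x, weight, bias)
print(out.shape)  # (5, 100)

# Cross-check against PyTorch itself.
expected = layer(torch.from_numpy(x)).detach().numpy()
print(np.allclose(out, expected, atol=1e-5))
```

Alternatively, you could transpose each weight once when exporting it and then multiply as x @ W with W of shape (in_dim, out_dim), which matches the layout the question expected.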
I hope this explains your observations with the shape :)