Shape of pytorch model.parameter is inconsistent with how it's defined in the model

I'm attempting to extract the weights and biases from a simple network built in PyTorch. My entire network is composed of nn.Linear layers. When I create a layer by calling nn.Linear(in_dim, out_dim) , I expect the parameters returned by model.parameters() to have shape (in_dim, out_dim) for the weight and (out_dim,) for the bias. However, the weights that come out of model.parameters() instead have shape (out_dim, in_dim) .

The intention of my code is to perform a forward pass using only numpy matrix multiplication, without any PyTorch. Because of the shape inconsistency, the matrix multiplications throw an error. How can I fix this?

Here is my exact code:

import torch
import torch.nn as nn

class RNN(nn.Module):

    def __init__(self, dim_input, dim_recurrent, dim_output):
        super(RNN, self).__init__()

        self.dim_input = dim_input
        self.dim_recurrent = dim_recurrent
        self.dim_output = dim_output

        self.dense1 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense2 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense3 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense4 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense5 = nn.Linear(self.dim_recurrent, self.dim_output)

# There is a defined forward pass

model = RNN(12, 100, 6)

for i in model.parameters():
    print(i.shape)  # .shape is an attribute, not a method

The output is:

torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([6, 100])
torch.Size([6])

The output should, if I'm correct, be:

torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 6])
torch.Size([6])

What is my issue?

What you see there is not a wrong (out_dim, in_dim) layout; it is simply the shape of the weight matrix, which nn.Linear stores as (out_features, in_features). When you call print(model) you can see that the input and output features are correct:

RNN(
  (dense1): Linear(in_features=12, out_features=100, bias=True)
  (dense2): Linear(in_features=100, out_features=100, bias=False)
  (dense3): Linear(in_features=12, out_features=100, bias=True)
  (dense4): Linear(in_features=100, out_features=100, bias=False)
  (dense5): Linear(in_features=100, out_features=6, bias=True)
)
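
For a quick sanity check, a standalone layer shows the same storage convention (this snippet is an illustrative addition, not part of the original post):

import torch.nn as nn

layer = nn.Linear(12, 100)
print(layer.weight.shape)  # torch.Size([100, 12]) -- stored as (out_features, in_features)
print(layer.bias.shape)    # torch.Size([100])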

You can check the source code to see that the weights are actually transposed before calling matmul .


nn.Linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear

You can check its forward method; it looks like this:

def forward(self, input):
    return F.linear(input, self.weight, self.bias)


F.linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/functional.html

The relevant line for multiplying the weights is:

output = input.matmul(weight.t())

As mentioned above, you can see that the weights are transposed before matmul is applied, which is why the shape of the weights differs from what you expected.

So if you want to do the matrix multiplication manually, you do:

# dummy batch of 5 input vectors, each of dimension 12
input = torch.rand(5, 12)
# apply layer dense1 (without bias; for the bias just add + model.dense1.bias)
output_first_layer = input.matmul(model.dense1.weight.t())
print(output_first_layer.shape)

Just as you would expect from dense1 , it returns:

torch.Size([5, 100])
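
Since the stated goal is a forward pass in pure numpy, the same step can be written without PyTorch once the parameters are exported. A minimal sketch, assuming the model above; the .detach().numpy() conversion and the variable names are illustrative additions, not from the original answer:

import numpy as np

# export the parameters of dense1 as plain numpy arrays
W1 = model.dense1.weight.detach().numpy()  # shape (100, 12)
b1 = model.dense1.bias.detach().numpy()    # shape (100,)

# dummy batch of 5 input vectors
x = np.random.rand(5, 12).astype(np.float32)

# nn.Linear computes y = x @ W.T + b, so transpose the weight here as well
out = x @ W1.T + b1
print(out.shape)  # (5, 100)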

I hope this explains your observations about the shape :)
