Understanding PyTorch Weights and Biases in a Linear Layer
Below is code that combines the weights and biases of two linear layers into a single layer. I can't understand the line below. Why do we have to multiply the bias by the transposed weight matrix? Shouldn't it just be the bias alone, since we already multiply by the weights to produce the final output outputs3?
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias
# Create a single layer to replace the two linear layers
combined_layer = nn.Linear(input_size, output_size)
combined_layer.weight.data = layer2.weight @ layer1.weight
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias  # This should be just the bias?
outputs3 = inputs @ combined_layer.weight.t() + combined_layer.bias
Can anyone help me understand this?
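For context, here is a minimal runnable sketch (with hypothetical layer sizes) showing the two stacked Linear layers being folded into one, and the three outputs agreeing:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, output_size = 4, 5, 3  # hypothetical sizes

layer1 = nn.Linear(input_size, hidden_size)
layer2 = nn.Linear(hidden_size, output_size)

# Fold the two layers into one: weight is the product, bias is folded through layer2
combined_layer = nn.Linear(input_size, output_size)
combined_layer.weight.data = layer2.weight @ layer1.weight
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias

inputs = torch.randn(2, input_size)
out_stacked = layer2(layer1(inputs))                               # original two-layer path
out_combined = inputs @ combined_layer.weight.t() + combined_layer.bias  # single-layer path

print(torch.allclose(out_stacked, out_combined, atol=1e-5))  # True
```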
You just need to expand the original equation of the two Linear layers, i.e.
# out = layer2(layer1(x))
# given (x @ A + B) @ C + D
out = (x @ layer1.weight.t() + layer1.bias) @ layer2.weight.t() + layer2.bias
which you can expand as (x @ A + B) @ C + D = (x @ A @ C) + B @ C + D
out = x @ layer1.weight.t() @ layer2.weight.t() + layer1.bias @ layer2.weight.t() + layer2.bias
out = x @ (layer1.weight.t() @ layer2.weight.t()) + (layer1.bias @ layer2.weight.t() + layer2.bias)
# the above equation is x @ (A @ C) + B @ C + D
# now you can assume
combined_layer.weight = layer2.weight @ layer1.weight
combined_layer.bias = layer1.bias @ layer2.weight.t() + layer2.bias
# final output
out = x @ combined_layer.weight.t() + combined_layer.bias
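The expansion above can be checked numerically with plain tensors (hypothetical shapes), comparing the nested form against the folded form:

```python
import torch

torch.manual_seed(0)
# hypothetical shapes: x (2, 4), A (4, 5), B (5,), C (5, 3), D (3,)
x = torch.randn(2, 4)
A = torch.randn(4, 5)
B = torch.randn(5)
C = torch.randn(5, 3)
D = torch.randn(3)

lhs = (x @ A + B) @ C + D        # nested form: layer2(layer1(x))
rhs = x @ (A @ C) + (B @ C + D)  # folded form: combined weight and bias

print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```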
Also, note that the matrix-multiplication transpose rule is used here, i.e.
transpose(A @ B) = transpose(B) @ transpose(A)
This is why combined_layer.weight.t() is multiplied by x: we did not transpose anything when computing layer2.weight @ layer1.weight.
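The transpose rule itself is easy to verify with two random matrices:

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 4)
B = torch.randn(4, 5)

# (A @ B).t() has shape (5, 3), same as B.t() @ A.t()
print(torch.allclose((A @ B).t(), B.t() @ A.t()))  # True
```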