
How to train shared layers in PyTorch

I have the following code:

import torch
import torch.nn as nn
from torchviz import make_dot, make_dot_from_trace

class Net(nn.Module):
    def __init__(self, input, output):
        super(Net, self).__init__()
        self.fc = nn.Linear(input, output)

    def forward(self, x):
        x = self.fc(x)  # first pass through the shared layer
        x = self.fc(x)  # second pass through the same layer
        return x

model = Net(12, 12)
print(model)

x = torch.rand(1, 12)
y = model(x)
make_dot(y, params = dict(model.named_parameters()))

Here I reuse self.fc twice in the forward pass.

The computational graph looks like this:

[image: computational graph rendered by make_dot]

I am confused about the computational graph, and I am curious how this model is trained during back-propagation. It seems to me that the gradient would loop forever. Thanks a lot.

There are no issues with your graph. You can train it the same way as any other feed-forward model.
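For example, a completely standard training step works unchanged. This is a minimal sketch, not part of the original answer; the MSE loss, SGD optimizer and random target are arbitrary placeholder choices, and Net is the class defined in the question above:

import torch
import torch.nn as nn

model = Net(12, 12)                 # Net as defined in the question
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.rand(1, 12)
target = torch.rand(1, 12)          # dummy target, for illustration only

optimizer.zero_grad()
y = model(x)                        # forward pass applies self.fc twice
loss = criterion(y, target)
loss.backward()                     # gradients flow through both uses of self.fc
optimizer.step()                    # a single update for the shared parameters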

  1. Regarding looping: Since it is a directed acyclic graph, there are no actual loops (check the arrow directions).
  2. Regarding backprop: Let's consider the fc.bias parameter. Since you are reusing the same layer twice, the bias has two outgoing arrows (it is used in two places in your net). During the backpropagation stage the direction is reversed: the bias will receive gradients from two places, and these gradients will add up (see the first sketch after this list).
  3. Regarding the graph: An FC layer can be represented as Addmm(bias, x, T(weight)), where T is transposition and Addmm is matrix multiplication plus adding a vector. So you can see how the data (weight, bias) is passed into the functions (Addmm, T); the second sketch after this list verifies this equivalence.
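To make point 2 concrete, here is a small check; it is a sketch, not from the original answer. The names shared, fc1 and fc2 are just for illustration: fc1 and fc2 form an "unrolled" copy of the network with two independent layers holding identical parameters, so each copy receives only its own per-use gradient, and their sum matches the gradient of the shared layer.

import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Linear(12, 12)          # the layer that is reused twice

# "Unrolled" network: two distinct layers initialized with identical parameters
fc1 = nn.Linear(12, 12)
fc2 = nn.Linear(12, 12)
with torch.no_grad():
    fc1.weight.copy_(shared.weight)
    fc1.bias.copy_(shared.bias)
    fc2.weight.copy_(shared.weight)
    fc2.bias.copy_(shared.bias)

x = torch.rand(1, 12)

# Shared version: the same layer is applied twice
shared(shared(x)).sum().backward()

# Unrolled version: each copy is applied exactly once
fc2(fc1(x)).sum().backward()

# The shared layer's gradient is the sum of the two per-use gradients
print(torch.allclose(shared.bias.grad,   fc1.bias.grad   + fc2.bias.grad))    # True
print(torch.allclose(shared.weight.grad, fc1.weight.grad + fc2.weight.grad))  # True

This is why weight sharing needs no special handling: autograd already accumulates the per-use gradients into the single shared parameter.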

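Point 3 can be verified directly as well. A small sketch (torch.addmm(input, mat1, mat2) computes input + mat1 @ mat2, which is what nn.Linear's forward does with the transposed weight):

import torch
import torch.nn as nn

fc = nn.Linear(12, 12)
x = torch.rand(1, 12)

# nn.Linear's forward is equivalent to Addmm(bias, x, T(weight))
manual = torch.addmm(fc.bias, x, fc.weight.t())   # bias + x @ weight.T
print(torch.allclose(fc(x), manual))               # True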

 