
How does pytorch compute the gradients for a simple linear regression model?

I am using pytorch and trying to understand how a simple linear regression model works.

I'm using a simple LinearRegressionModel class:

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)  

    def forward(self, x):
        out = self.linear(x)
        return out

model = LinearRegressionModel(1, 1)

Next I instantiate a loss criterion and an optimizer:

criterion = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Finally, to train the model, I use the following code:

for epoch in range(epochs):
    # Wrap the training data (and move it to the GPU when one is available)
    if torch.cuda.is_available():
        inputs = Variable(torch.from_numpy(x_train).cuda())
        labels = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        labels = Variable(torch.from_numpy(y_train))

    # Clear gradients w.r.t. parameters
    optimizer.zero_grad() 

    # Forward to get output
    outputs = model(inputs)

    # Calculate Loss
    loss = criterion(outputs, labels)

    # Getting gradients w.r.t. parameters
    loss.backward()

    # Updating parameters
    optimizer.step()

My question is how does the optimizer get the loss gradient, computed by loss.backward(), to update the parameters using the step() method? How are the model, the loss criterion and the optimizer tied together?

PyTorch has the concept of tensors and variables. When you use nn.Linear, the layer creates two variables, namely W and b. In PyTorch a variable is a wrapper that encapsulates a tensor, its gradient, and information about the function that created it. You can directly access the gradient with

w.grad

If you try this before calling loss.backward() you get None. Once you call loss.backward(), it contains the gradients. You can then update the parameters manually with the simple step shown after the sketch below.
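For example, here is a minimal sketch (reusing the LinearRegressionModel class and the MSELoss criterion from the question; x and y are made-up data, and recent PyTorch versions no longer need the Variable wrapper) showing that the gradient only exists after the backward pass:

import torch
import torch.nn as nn

model = LinearRegressionModel(1, 1)
criterion = nn.MSELoss()

# Made-up data for y = 2x, purely for illustration
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

w = model.linear.weight
print(w.grad)        # None: no backward pass has run yet

loss = criterion(model(x), y)
loss.backward()      # autograd writes the gradients into w.grad (and into the bias's grad)
print(w.grad)        # now a tensor with the same shape as w

Once w.grad is populated, the manual update is just: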

w.data -= learning_rate * w.grad.data
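Putting it together, a complete hand-rolled training loop needs nothing more than this sketch (it reuses model, criterion, x, and y from the sketch above; learning_rate and the epoch count are arbitrary choices):

learning_rate = 0.01

for epoch in range(100):
    outputs = model(x)            # forward pass
    loss = criterion(outputs, y)  # mean squared error

    model.zero_grad()             # clear the gradients from the previous epoch
    loss.backward()               # recompute param.grad for every parameter

    # the same update rule as above, applied to every learnable parameter
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data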

When you have a complex network, updating every parameter by hand like this becomes tedious, so optimizers such as SGD and Adam take care of it. When you create one of these optimizers you pass in the parameters of your model; nn.Module provides the parameters() method, which returns all the learnable parameters to the optimizer. That is what happens in the line below.

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss.backward() calculates the gradients and stores them in the parameters, and you pass in the parameters that need to be tuned here:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
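This is also why step() takes no arguments: the optimizer already holds references to the very same parameter tensors that the model uses in forward(), and after loss.backward() each of them carries a .grad. Conceptually, plain SGD without momentum behaves like the following sketch (an illustration only, not the actual torch.optim source):

class SketchSGD:
    """Illustrative stand-in for torch.optim.SGD (no momentum, no weight decay)."""

    def __init__(self, params, lr):
        self.params = list(params)   # references to the model's parameter tensors
        self.lr = lr

    def zero_grad(self):
        # what optimizer.zero_grad() does: reset the stored gradients
        for p in self.params:
            if p.grad is not None:
                p.grad.data.zero_()

    def step(self):
        # loss.backward() has already written the gradients into p.grad;
        # step() only applies the update rule to each parameter in place
        for p in self.params:
            if p.grad is not None:
                p.data -= self.lr * p.grad.data

Because parameters() hands the optimizer the same tensor objects the model computes with, updating p.data in place updates the model itself; that is the whole link between the model, the loss criterion, and the optimizer.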
