The loss value does not decrease

I am implementing a simple feedforward neural network with PyTorch, and the loss does not seem to decrease. Based on some other tests I have done, the problem seems to be in the computations I do to obtain pred: if I slightly change the network so that it outputs a 2-dimensional vector for each entry and save that as pred, everything works perfectly.

Do you see the problem in how pred is defined here? Thanks.

import torch
import numpy as np
from torch import nn

dt = 0.1

class Neural_Network(nn.Module):
    def __init__(self, ):
        super(Neural_Network, self).__init__()
    
        self.l1 = nn.Linear(2,300)
        self.nl = nn.Tanh()
        self.l2 = nn.Linear(300,1)
    
    
    def forward(self, X):
        z = self.l1(X)
        z = self.nl(z)
        o = self.l2(z)
        return o



N = 1000
X = torch.rand(N,2,requires_grad=True)
y = torch.rand(N,1)
NN = Neural_Network()

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(NN.parameters(), lr=1e-5)

epochs = 200

for i in range(epochs):  # trains the NN for `epochs` iterations

    HH = torch.mean(NN(X))
    gradH =  torch.autograd.grad(HH, X)[0]
    XH= torch.cat((gradH[:,1].unsqueeze(0),-gradH[:,0].unsqueeze(0)),dim=0).t()
    pred = X + dt*XH

    #Optimize and improve the weights
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


    print (" Loss: ", loss.detach().numpy())  # mean sum squared loss

PS: With these X and y the loss is not expected to go to zero; I have used them here just for simplicity. I will apply this architecture to data points that are expected to satisfy this model. For now, however, I am just interested in seeing the loss decrease.


My aim is to approximate with a neural network the Hamiltonian of a vector field for which only some trajectories are known, for example only the updates x(t) \rightarrow x(t+\Delta t) for some choice of points. So the vector X contains the points x(t), while y contains the x(t+\Delta t). My network above approximates in a simple way the Hamiltonian function H(x), and in order to optimize it I need to find the trajectories associated with this Hamiltonian.

In particular, XH aims to be the Hamiltonian vector field associated with the approximated Hamiltonian. The time update pred = X + dt*XH is simply one step of forward Euler.
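Written out (and matching how XH is assembled from gradH above, with gradH = \nabla H), the intended vector field and update are

X_H(x_1, x_2) = ( \partial H/\partial x_2 , \, -\partial H/\partial x_1 ), \qquad x(t+\Delta t) \approx x(t) + \Delta t \, X_H(x(t)).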

However, my main issue here can be abstracted to: how can I involve the gradient of a network with respect to its inputs in the loss function?

Probably because the gradient flow graph for NN is destroyed by the gradH step (check HH.grad_fn vs gradH.grad_fn).
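For instance, with the tensors from the training loop above, you should see something like:

HH = torch.mean(NN(X))
gradH = torch.autograd.grad(HH, X)[0]

print(HH.grad_fn)     # e.g. <MeanBackward0 object at ...>: HH is attached to the autograd graph
print(gradH.grad_fn)  # None: gradH is a plain tensor, detached from NN's parameters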

So your pred tensor (and the subsequent loss) does not contain the necessary gradient flow through the NN network.

The loss contains gradient flow for the input X, but not for NN.parameters(). Because the optimizer only takes a step() over those NN.parameters(), the network NN is not being updated, and since X is not being updated either, the loss does not change.
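A minimal way to check this (using the names from the question's code) is to inspect the gradients right after loss.backward() in the loop:

for name, p in NN.named_parameters():
    print(name, p.grad)   # stays None (or unchanged): no gradient reaches the parameters
print(X.grad)             # X, by contrast, does receive a gradient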

You can check how the loss is sending its gradients backward by inspecting loss.grad_fn after loss.backward(), and here is a neat function (found on Stack Overflow) to check it:

def getBack(var_grad_fn):
    # recursively walk the autograd graph from var_grad_fn, printing every leaf tensor it reaches together with its .grad
    print(var_grad_fn)
    for n in var_grad_fn.next_functions:
        if n[0]:
            try:
                tensor = getattr(n[0], 'variable')
                print(n[0])
                print('Tensor with grad found:', tensor)
                print(' - gradient:', tensor.grad)
                print()
            except AttributeError as e:
                getBack(n[0])

Call getBack(loss.grad_fn) after loss.backward() to check it for yourself (though you may want to reduce the batch size N first).

Edit: it works after changing the line to gradH = torch.autograd.grad(HH, X, create_graph=True)[0].
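For reference, this is the training loop from the question with only that one line changed:

for i in range(epochs):
    HH = torch.mean(NN(X))
    # create_graph=True keeps gradH inside the autograd graph, so the loss can backpropagate into NN's parameters
    gradH = torch.autograd.grad(HH, X, create_graph=True)[0]
    XH = torch.cat((gradH[:,1].unsqueeze(0), -gradH[:,0].unsqueeze(0)), dim=0).t()
    pred = X + dt*XH

    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(" Loss: ", loss.detach().numpy())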
