
PyTorch model gradients not updating with some custom code

I've put together some computation, and I'm trying to compute a loss on its result and then compute the gradients of all the model's parameters with respect to that loss. The problem is that nestled in the computation is a tunable model that I want to be able to tune (eventually). Right now I am just trying to confirm that the model's parameter gradients are populated after calling backward(), which they are not; that is the problem. Below I post the code, the output, and the desired output.

import torch

class ExpModelTunable(torch.nn.Module):
    def __init__(self):
        super(ExpModelTunable, self).__init__()
        self.alpha = torch.nn.Parameter( torch.tensor(1.0, requires_grad=True) )
        self.beta = torch.nn.Parameter( torch.tensor(1.0, requires_grad=True) )
    
    def forward(self, t):
        return self.alpha * torch.exp(  - self.beta * t ) 

def func_f(t, t_list):
  mu = torch.tensor(0.13191110355, requires_grad=True)
  running_sum = torch.sum( torch.tensor( [ f(t-ti) for ti in t_list ], requires_grad=True ) )
  return mu + running_sum

def pytorch_objective_tunable(u, t_list):
  global U
  steps = torch.linspace(t_list[-1].item(),u.item(),100, requires_grad=True)
  func_values = torch.tensor( [ func_f(steps[i], t_list) for i in range(len(steps)) ], requires_grad=True )
  return torch.log(U) + torch.trapz(func_values, steps)

def newton_method(function, func, initial, t_list, iteration=200, convergence=0.0001):
    for i in range(iteration): 
        previous_data = initial.clone()
        value = function(initial, t_list)
        initial.data -= (value / func(initial.item(), t_list)).data

        if torch.abs(initial - previous_data) < torch.tensor(convergence):
            return initial
    return initial # return our final after iteration

# call starts
f = ExpModelTunable()
U = torch.rand(1, requires_grad=True)
initial_x = torch.tensor([.1], requires_grad=True) 
t_list = torch.tensor([0.0], requires_grad=True)
result = newton_method(pytorch_objective_tunable, func_f, initial_x, t_list)
print("Next Arrival at ", result.item())

This prints the correct output, all good here: Next Arrival at 4.500311374664307. My problem occurs here:

loss = result - torch.tensor(1)
loss.backward()
print( result.grad )
for param in f.parameters():
    print(param.grad)

output:

tensor([1.])
None #this should not be None
None #this should not be None

So we can see the result variable's gradient is populated, but the gradients of model f's parameters aren't. I went back through all the computation (all the code is here) and made sure everything has requires_grad=True, but I still can't get it to work. This should work, right? Does anyone have any tips? Thanks.

There are a few issues with your code. Straight off, you can tell whether the model can at least initiate backpropagation by looking at your output tensor:

>>> result
tensor([...], requires_grad=True)

It doesn't have a grad_fn, so you already know it's not connected to the graph.
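
As a quick illustration (a minimal sketch, not part of the original code): a tensor produced by differentiable operations on a parameter carries a grad_fn, while a freshly constructed tensor does not, no matter what requires_grad is set to:

import torch

w = torch.nn.Parameter(torch.ones(1))
connected = w * 2.0                  # built from w by a differentiable op
print(connected.grad_fn)             # <MulBackward0 ...> -> part of a graph
detached = torch.tensor(2.0, requires_grad=True)
print(detached.grad_fn)              # None -> a fresh leaf, not connected to w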

Now for debugging the issues, here are some tips:

  • First, you should never mutate .data or call .item() if you're planning on backpropagating: this essentially kills the graph, since any operation performed afterwards won't be attached to it (the sketch after this list illustrates this).

  • You actually don't need to set requires_grad most of the time. Note that nn.Parameter assigns requires_grad=True to its tensor by default.

  • When working with list comprehensions inside your PyTorch pipeline, wrap the resulting list with torch.stack instead of torch.tensor; this keeps each element's computation history intact.

  • I wouldn't use a global if I were you...
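
Here is a small sketch of those first three points, using a throwaway nn.Linear as a stand-in model (not the code from the question): rebuilding values with .item()/torch.tensor detaches them from the model, while torch.stack keeps the history:

import torch

model = torch.nn.Linear(1, 1)
xs = [torch.tensor([float(i)]) for i in range(3)]

# Kills the graph: .item() turns each output into a plain Python float,
# so the new tensor has no history leading back to model's parameters.
bad = torch.tensor([model(x).item() for x in xs], requires_grad=True).sum()
bad.backward()
print(model.weight.grad)   # None

# Keeps the graph: torch.stack preserves each output's grad_fn.
good = torch.stack([model(x) for x in xs]).sum()
good.backward()
print(model.weight.grad)   # an actual gradient tensor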


Here is the corrected version:

import torch
import torch.nn as nn

class ExpModelTunable(nn.Module):
    def __init__(self):
        super(ExpModelTunable, self).__init__()
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
    
    def forward(self, t):
        return self.alpha * torch.exp(-self.beta*t) 

f = ExpModelTunable()
def func_f(t, t_list):
    mu = torch.tensor(0.13191110355)
    running_sum = torch.stack([f(t-ti) for ti in t_list]).sum()
    return mu + running_sum

def pytorch_objective_tunable(u, t_list):
    global U
    steps = torch.linspace(t_list[-1].item(), u.item(), 100)
    func_values = torch.stack([func_f(steps[i], t_list) for i in range(len(steps))])
    return torch.log(U) + torch.trapz(func_values, steps)
    # return torch.trapz(func_values, steps)

def newton_method(function, func, initial, t_list, iteration=1, convergence=0.0001):
    for i in range(iteration): 
        previous_data = initial.clone()
        value = function(initial, t_list)
        initial -= (value / func(initial, t_list))

        if torch.abs(initial - previous_data) < torch.tensor(convergence):
            return initial
    return initial # return our final after iteration

U = torch.rand(1, requires_grad=True)
initial_x = torch.tensor([.1]) 
t_list = torch.tensor([0.0], requires_grad=True)
result = newton_method(pytorch_objective_tunable, func_f, initial_x, t_list)

Notice the grad_fn now attached to result:

>>> result
tensor([...], grad_fn=<SubBackward0>)
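
With the graph intact, the question's original loss snippet should now reach the model's parameters (a quick sanity check; note that result is no longer a leaf, so result.grad itself is not populated):

loss = result - torch.tensor(1.0)
loss.backward()
for param in f.parameters():
    print(param.grad)   # actual gradient tensors now, rather than None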

