I have a tensor of shape (number_of_rays, number_of_points_per_ray, 3), let’s call it input. input is passed through a model and some processing (all ...
Autograd doesn't seem to be working reliably for a dataset I'm currently working with and I would like to use manually computed gradients with the Tor ...
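Since the question is truncated, one common way to combine manually computed gradients with a Torch optimizer is to write the gradient into each parameter's .grad attribute before calling step(). A minimal sketch (the tensor, gradient values, and learning rate are illustrative, not from the question):

```python
import torch

w = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

opt.zero_grad()
w.grad = torch.tensor([1.0, 2.0, 3.0])  # manually computed gradient, set by hand
opt.step()                              # w -= 0.1 * grad
print(w)
```

The optimizer only reads .grad at step() time, so it does not matter whether autograd or hand-written code produced the values, as long as shape and dtype match the parameter.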
I am trying to fine-tune GPT-J, but I have this error. I think it's related to the activation function being in-place, but I don't know how to code ...
I'm making a model that mixes a fine-tuned CLIP model and a frozen CLIP model, and I build a custom loss using kl_loss and cross-entropy (CEE). But when I train the mo ...
In the pytorch autograd profiler documentation, it says that the profiler is a "Context manager that manages autograd profiler state and holds a summa ...
what would be the equivalent in Pytorch of the following in tensorflow, where loss is the calculated loss in the iteration of the network and net is t ...
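The TensorFlow snippet is truncated, but assuming it computed gradients of loss with respect to the network's variables (e.g. via tf.gradients(loss, net.trainable_variables)), the PyTorch counterparts are torch.autograd.grad and loss.backward(). A minimal sketch with an illustrative network:

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 1)
x = torch.randn(8, 4)
loss = net(x).pow(2).mean()
params = list(net.parameters())

# Functional form: returns the gradients as a tuple of tensors
grads = torch.autograd.grad(loss, params, retain_graph=True)

# Imperative form: accumulates into each parameter's .grad attribute
loss.backward()
for p, g in zip(params, grads):
    assert torch.allclose(p.grad, g)
```

torch.autograd.grad returns gradients without touching .grad, which is the closer analogue of tf.gradients; loss.backward() is the usual training-loop form.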
I'm new to pytorch and I'm having a problem with some code to train a a neural network to solve a control problem. I use the following code to solve a ...
I am having trouble understanding the usage of the inputs keyword in the .backward() call. The Documentation says the following: inputs (sequence ...
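The inputs argument restricts which leaves get their .grad populated; everything else in the graph is still traversed, but gradient accumulation is skipped for tensors not in the list. A minimal sketch (tensors here are illustrative):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
loss = (a * b).sum()

# Accumulate gradients only into `a`
loss.backward(inputs=[a])

print(a.grad is not None)  # a's gradient is populated
print(b.grad is None)      # b is skipped even though it requires grad
```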
I can't fix the runtime error "one of the variables needed for gradient computation has been modified by an inplace operation". I know that if I comm ...
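Without the full code it is hard to say which line is at fault, but the error generally means an op's saved tensor was mutated before backward ran. A minimal sketch of the failure and the usual out-of-place fix (the sigmoid example is illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)   # sigmoid saves its output for the backward pass
y *= 2                 # in-place op invalidates that saved output
try:
    y.sum().backward()
except RuntimeError:
    print("backward failed: a saved tensor was modified in place")

# Fix: replace the in-place op with an out-of-place one
x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)
z = y * 2              # new tensor; sigmoid's saved output stays intact
z.sum().backward()     # succeeds, x.grad is populated
```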
jax.nn.softmax is defined as: def softmax(x: Array, axis: Optional[Union[int, Tuple[int, ...]]] = -1, where: Optional[Array] ...
I was reading this blog from PyTorch. Just before the "Autograd in Training" section, it is mentioned: Be aware that only leaf nodes of the computat ...
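The quoted claim, that only leaf nodes get their grad populated by default, can be checked directly; intermediate tensors need an explicit retain_grad() call. A minimal sketch:

```python
import torch

a = torch.randn(3, requires_grad=True)  # leaf: created by the user
b = a * 2                               # non-leaf: produced by an op
b.retain_grad()                         # opt in to keeping b's gradient
b.sum().backward()

print(a.is_leaf, b.is_leaf)  # True False
print(a.grad)                # populated by default
print(b.grad)                # populated only because of retain_grad()
```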
I would like to use pytorch to optimize an objective function which makes use of an operation that cannot be tracked by torch.autograd. I wrapped such ...
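The question cuts off before showing the wrapper, but the standard mechanism for an untracked operation is a custom torch.autograd.Function with hand-written forward and backward. A minimal sketch using an illustrative NumPy clamp as the untracked op:

```python
import numpy as np
import torch

class ClampNP(torch.autograd.Function):
    """Wraps a NumPy clamp so autograd can differentiate through it."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        out = np.clip(x.detach().numpy(), 0.0, 1.0)  # opaque to autograd
        return torch.from_numpy(out)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        mask = (x >= 0.0) & (x <= 1.0)  # hand-written derivative of clamp
        return grad_out * mask

x = torch.tensor([-0.5, 0.5, 1.5], requires_grad=True)
y = ClampNP.apply(x)
y.sum().backward()
print(x.grad)  # tensor([0., 1., 0.])
```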
A brief description of my model: Consists of a single parameter X of dtype ComplexDouble and shape (20, 20, 20, 3). For reference, this must be co ...
I'm trying to compute the gradient of a lambda function that involves other gradients of functions, but the computation is hanging and I do not unders ...
I've been trying to understand how automatic differentiation (autodiff) works. There are several implementations of this that can be found in Tensorfl ...
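Before reading the Tensorflow/PyTorch implementations, the core idea is easiest to see in forward-mode autodiff with dual numbers, where each value carries its derivative and every operator propagates both. A minimal self-contained sketch (not how the big frameworks implement it; they use reverse-mode tape-based graphs):

```python
class Dual:
    """Forward-mode autodiff: each number carries (value, derivative)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,                     # product value
                    self.val * other.dot + self.dot * other.val)  # product rule
    __rmul__ = __mul__

x = Dual(3.0, 1.0)       # seed dx/dx = 1
y = x * x + 2 * x + 1    # f(x) = x^2 + 2x + 1
print(y.val, y.dot)      # f(3) = 16.0, f'(3) = 8.0
```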
I'm new to pytorch and had no luck following similar threads. I'm trying to jointly train two models in the same loop, and the model updates involve a ...
I am trying to compute a gradient of y_hat to x (y_hat is the sum of gradients of model output to x) but it gives me the error: One of the differentia ...
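Differentiating a quantity that is itself built from gradients requires the first backward pass to be run with create_graph=True; otherwise the first-order gradients are detached and the second call fails with exactly that kind of error. A minimal sketch with an illustrative function in place of the model:

```python
import torch

x = torch.randn(5, requires_grad=True)
y = (x ** 3).sum()

# First-order gradients, keeping the graph so they can be differentiated again
(g,) = torch.autograd.grad(y, x, create_graph=True)
y_hat = g.sum()                        # sum of dy/dx = sum of 3x^2
(g2,) = torch.autograd.grad(y_hat, x)  # d(y_hat)/dx = 6x
print(torch.allclose(g2, 6 * x.detach()))
```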
I tried to use gradient accumulation in my project. To my understanding, the gradient accumulation is the same as increasing the batch size by x times ...
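The equivalence the question assumes can be verified numerically: accumulating over x micro-batches with each loss scaled by 1/x reproduces the full-batch gradient (for a mean-reduced loss). A minimal sketch with an illustrative linear model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
data, target = torch.randn(8, 4), torch.randn(8, 1)
loss_fn = nn.MSELoss()  # default reduction='mean'

# Full-batch gradient
model.zero_grad()
loss_fn(model(data), target).backward()
full_grad = model.weight.grad.clone()

# Same gradient from 4 micro-batches of size 2, each loss scaled by 1/4
model.zero_grad()
for chunk_x, chunk_y in zip(data.chunk(4), target.chunk(4)):
    (loss_fn(model(chunk_x), chunk_y) / 4).backward()  # grads accumulate

print(torch.allclose(full_grad, model.weight.grad, atol=1e-6))  # True
```

Note the caveat: the equivalence is exact for plain losses, but layers whose statistics depend on batch size (e.g. BatchNorm) still see the smaller micro-batches.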
After executing codes, the a.grad is None although a.requires_grad is True. But if the code a = a.cuda() is removed, a.grad is available after the l ...
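The behavior described follows from the leaf rule: a = a.cuda() rebinds the name to the *output* of an op, which is a non-leaf, so its .grad is never populated. The same mechanism can be shown on CPU with a dtype transfer standing in for .cuda() (illustrative, since the question's code is truncated):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a.double()           # stand-in for a.cuda(): the result is a NON-leaf
(b * 2).sum().backward()
print(a.grad)            # gradient flows back to the original leaf
print(b.is_leaf)         # False: b is an intermediate node, b.grad stays None

# Fix: move/convert first, then mark the result as requiring grad
c = torch.randn(3).double().requires_grad_()
(c * 2).sum().backward()
print(c.grad is not None)  # True: c is a leaf
```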
I am training a model to predict pose using a custom Pytorch model. However, V1 below never learns (params don't change). The output is connected to t ...