
How is the optimization done when we use zero_grad() in PyTorch?

The zero_grad() method is used when we want to "conserve" RAM with massive datasets. There is already an answer on that here: Why do we need to call zero_grad() in PyTorch?

Gradients are used to update the parameters during backpropagation. But if we delete the gradients by setting them to 0, how can the optimization still be done during the backward pass? There are models that use this method and optimization still occurs, so how is this possible?

You don't "delete the gradients", you simply clear the cache of gradients from previous iteration. The reason of existence of this cache is ease of implementation of specific methods such as simulation of big batch without memory to actually use the whole batch.
