
When is Momentum Applied in Tensorflow Gradient Tape?

I've been playing around with automatic gradients in TensorFlow and I have a question. If we are using an optimizer with momentum, say Adam, when is the momentum algorithm applied to the gradient? Is it applied when we call tape.gradient(loss, model.trainable_variables) or when we call model.optimizer.apply_gradients(zip(dtf_network, model.trainable_variables))?

Thanks!

tape.gradient computes the raw gradients without reference to any optimizer. Since momentum is part of the optimizer, the tape does not include it. Momentum is usually implemented by adding extra "slot" variables to the optimizer that store a running average of the gradients. All of this is handled in optimizer.apply_gradients.
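To make the separation concrete without requiring TensorFlow, here is a minimal plain-Python sketch (all names are illustrative, not TensorFlow APIs): the gradient function is stateless, mirroring tape.gradient, while the momentum buffer lives in the optimizer and is only updated inside its apply step, mirroring optimizer.apply_gradients.

```python
def compute_gradient(w, x, y):
    # Raw gradient of the squared-error loss 0.5*(w*x - y)**2 w.r.t. w.
    # Stateless, like tape.gradient: no optimizer state is involved.
    return (w * x - y) * x

class MomentumSGD:
    """Toy optimizer holding a momentum buffer, analogous to an
    optimizer's slot variable in TensorFlow (illustrative only)."""
    def __init__(self, lr=0.1, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.velocity = 0.0  # extra state stored in the optimizer

    def apply_gradients(self, grad, w):
        # Momentum is applied HERE, not during gradient computation.
        self.velocity = self.beta * self.velocity + grad
        return w - self.lr * self.velocity

w = 0.0
opt = MomentumSGD()
for _ in range(3):
    g = compute_gradient(w, x=1.0, y=1.0)  # raw gradient, no momentum
    w = opt.apply_gradients(g, w)          # momentum folded in here
```

The same division holds for Adam: tape.gradient returns the plain gradient, and the first- and second-moment accumulators are only read and updated inside apply_gradients.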

