
Gradient Descent Algorithm And Different Learning Rates

In the gradient descent algorithm, can we choose a different learning rate for each iteration, up until the algorithm converges?

Yes. There are many ways to set hyperparameters as a function of the epoch/iteration count or of the loss and its derivatives. Changing the learning rate in gradient descent intuitively means changing the step size: one trade-off is that large steps can escape local optima but may overshoot and need more iterations to converge. A schedule that starts large and shrinks over time is a sensible default, and there are many more sophisticated methods that adapt the learning-rate scalar to accelerate or regularize the fit.
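As a minimal sketch of a per-iteration learning rate, the following assumes a simple quadratic loss f(w) = (w - 3)^2 and a hypothetical 1/(1 + decay*t) decay schedule; both are illustrative choices, not part of the original question:

```python
def grad(w):
    # Gradient of the assumed loss f(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

def decayed_lr(initial_lr, decay, t):
    # Start large and shrink as the iteration count t grows (assumed schedule).
    return initial_lr / (1.0 + decay * t)

w = 0.0                    # initial parameter (illustrative)
initial_lr, decay = 0.5, 0.1
for t in range(100):
    lr = decayed_lr(initial_lr, decay, t)  # learning rate changes every iteration
    w -= lr * grad(w)

print(w)  # approaches the minimizer w = 3
```

The key point is simply that the step `w -= lr * grad(w)` is free to use a different `lr` each time through the loop; the schedule itself can be anything from a fixed decay like this one to something driven by the observed loss.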
