
Optimization Algorithms


Problems with vanilla gradient descent: it is slow at plateaus and can get stuck at saddle points.

Mini-batch Gradient Descent

Stochastic Gradient Descent | 随机梯度下降

Although each update is very fast, updating on a single sample at a time causes significant fluctuations in the loss function. A sketch of both variants follows below.
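
A minimal sketch, assuming NumPy arrays and a hypothetical `grad(theta, X_batch, y_batch)` function that returns the average gradient over the batch. With `batch_size=1` this is plain SGD; a larger `batch_size` gives mini-batch gradient descent, which averages out the per-sample noise and reduces the fluctuations mentioned above.

```python
import numpy as np

def sgd(grad, theta, X, y, lr=0.01, batch_size=1, epochs=10, seed=0):
    """Stochastic / mini-batch gradient descent.

    batch_size=1 -> plain SGD (fast but noisy updates);
    batch_size>1 -> mini-batch GD (smoother loss curve).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                       # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            theta = theta - lr * grad(theta, X[batch], y[batch])
    return theta
```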

Momentum

Gradient descent with momentum accumulates past gradients into a velocity term and uses that accumulated momentum, rather than the current gradient alone, to update the parameters. This damps oscillations and speeds up progress along directions where the gradient is consistent.

Movement = negative of the current gradient + momentum (a fraction of the previous movement).

Reference: 优化算法之Gradient descent with momentum - 知乎
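
A sketch of one momentum update under the common formulation (momentum coefficient `gamma` around 0.9; `grad` is again a hypothetical gradient function): the velocity carries a fraction of the previous movement plus the new gradient step.

```python
def momentum_step(theta, v, grad, lr=0.01, gamma=0.9):
    """One gradient-descent-with-momentum update.

    v_new = gamma * v + lr * grad(theta)   # momentum + current gradient
    theta = theta - v_new                  # movement = negative gradient + momentum
    """
    v = gamma * v + lr * grad(theta)
    theta = theta - v
    return theta, v
```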

Nesterov Accelerated Gradient | NAG

Reference: Momentum 更快:揭开 Nesterov Accelerated Gradient 的真面目 - 知乎
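
A sketch of the Nesterov variant under the same assumptions as above: instead of evaluating the gradient at the current parameters, it evaluates it at the "look-ahead" point where momentum alone would carry them, which lets it correct course earlier than plain momentum.

```python
def nesterov_step(theta, v, grad, lr=0.01, gamma=0.9):
    """One Nesterov accelerated gradient update.

    The gradient is taken at the look-ahead point theta - gamma * v,
    i.e. where the momentum is about to move the parameters.
    """
    lookahead = theta - gamma * v
    v = gamma * v + lr * grad(lookahead)
    theta = theta - v
    return theta, v
```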

Adam | Adaptive Moment Estimation
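
Adam combines momentum (a first moment, the exponential average of gradients) with a per-parameter adaptive learning rate (a second moment, the exponential average of squared gradients). A minimal sketch using the usual default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8); `grad` is a hypothetical gradient function and `t` is the step count starting at 1.

```python
import numpy as np

def adam_step(theta, m, v, t, grad, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update."""
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g            # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g ** 2       # second moment (adaptive scale)
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```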