Optimization Algorithms¶
Problems with plain gradient descent: it is slow on plateaus, and it can get stuck at saddle points.
Mini-batch Gradient Descent¶
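A minimal NumPy sketch of mini-batch gradient descent on a least-squares objective (the function name, learning rate, and synthetic data below are my own illustration, not from these notes):

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=50, seed=0):
    """Mini-batch GD for least squares: minimize mean((X @ w - y)**2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # gradient of the mean-squared error on this batch only
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# usage: recover known weights from noiseless synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = minibatch_gd(X, y)
```

Each update uses the gradient of a small batch rather than the full dataset, trading a little gradient noise for much cheaper steps.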
Stochastic Gradient Descent¶
Although SGD is very fast, updating on a single sample at a time can cause significant fluctuations in the loss function.
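The fluctuation can be seen directly by tracking the full-dataset loss after every single-sample update; a sketch under the same least-squares setup (all names here are my own, hypothetical illustration):

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=30, seed=0):
    """Pure SGD: one randomly chosen sample per parameter update."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    losses = []
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]
            w -= lr * 2.0 * err * X[i]     # gradient of one sample's squared error
            losses.append(np.mean((X @ w - y) ** 2))  # full loss, for monitoring
    return w, losses

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w, losses = sgd(X, y)
```

The recorded `losses` trace is noisy step to step even though its overall trend is downward.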
Momentum¶
Gradient descent with momentum accumulates an exponentially decaying average of past gradients and uses it to update the parameters: Movement = Negative of Gradient + Momentum.
Reference: Optimization algorithms: Gradient descent with momentum (知乎)
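The update rule "Movement = Negative of Gradient + Momentum" can be sketched as follows, here on a hypothetical ill-conditioned quadratic bowl (the function and hyperparameters are illustrative assumptions):

```python
import numpy as np

def gd_momentum(grad, w0, lr=0.05, beta=0.9, steps=200):
    """Classical momentum: v carries a decaying sum of past negative gradients."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)   # movement = momentum + negative gradient
        w = w + v
    return w

# quadratic bowl f(w) = 0.5 * (w1**2 + 10 * w2**2), gradient = [w1, 10*w2]
grad = lambda w: np.array([1.0, 10.0]) * w
w = gd_momentum(grad, [5.0, 5.0])
```

The accumulated velocity keeps the iterate moving along flat directions (helping at plateaus) and damps the zig-zag across steep ones.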
Nesterov Accelerated Momentum¶
Reference: Faster than Momentum: unveiling the true face of Nesterov Accelerated Gradient (知乎)
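The key difference from classical momentum is that Nesterov evaluates the gradient at the "look-ahead" point `w + beta * v` rather than at `w`. A minimal sketch on the same illustrative quadratic (names and hyperparameters are my own assumptions):

```python
import numpy as np

def nag(grad, w0, lr=0.05, beta=0.9, steps=200):
    """Nesterov accelerated gradient: gradient taken at the look-ahead point."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w + beta * v)  # look ahead before correcting
        w = w + v
    return w

# same ill-conditioned quadratic bowl as above: gradient = [w1, 10*w2]
grad = lambda w: np.array([1.0, 10.0]) * w
w = nag(grad, [5.0, 5.0])
```

Looking ahead lets the correction anticipate where the momentum is about to carry the iterate, which typically damps overshoot faster than classical momentum.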