Optimization Algorithms¶
Problems with plain gradient descent: it is slow on plateaus, and it can get stuck at saddle points.
Mini-batch Gradient Descent¶
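A minimal NumPy sketch of mini-batch gradient descent on a least-squares objective (the function name, learning rate, and synthetic data below are my own illustration, not from these notes):

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=50, seed=0):
    """Mini-batch GD for least squares: minimize mean((X @ w - y)**2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # gradient of the mean-squared error on this batch only
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# usage: recover known weights from noiseless synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = minibatch_gd(X, y)
```

Each update uses the gradient of a small batch rather than the full dataset, trading a little gradient noise for much cheaper steps.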
Stochastic Gradient Descent¶
Although SGD is very fast, updating on a single sample at a time can cause significant fluctuations in the loss function.
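The fluctuation can be seen directly by tracking the full-dataset loss after every single-sample update; a sketch under the same least-squares setup (all names here are my own, hypothetical illustration):

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=30, seed=0):
    """Pure SGD: one randomly chosen sample per parameter update."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    losses = []
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]
            w -= lr * 2.0 * err * X[i]     # gradient of one sample's squared error
            losses.append(np.mean((X @ w - y) ** 2))  # full loss, for monitoring
    return w, losses

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w, losses = sgd(X, y)
```

The recorded `losses` trace is noisy step to step even though its overall trend is downward.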
Momentum¶
Gradient descent with momentum accumulates an exponentially decaying average of past gradients and uses it to update the parameters: Movement = Negative of Gradient + Momentum.
Reference: Optimization algorithms: Gradient descent with momentum (知乎)
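The update rule "Movement = Negative of Gradient + Momentum" can be sketched as follows, here on a hypothetical ill-conditioned quadratic bowl (the function and hyperparameters are illustrative assumptions):

```python
import numpy as np

def gd_momentum(grad, w0, lr=0.05, beta=0.9, steps=200):
    """Classical momentum: v carries a decaying sum of past negative gradients."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)   # movement = momentum + negative gradient
        w = w + v
    return w

# quadratic bowl f(w) = 0.5 * (w1**2 + 10 * w2**2), gradient = [w1, 10*w2]
grad = lambda w: np.array([1.0, 10.0]) * w
w = gd_momentum(grad, [5.0, 5.0])
```

The accumulated velocity keeps the iterate moving along flat directions (helping at plateaus) and damps the zig-zag across steep ones.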
Nesterov Accelerated Momentum¶
Reference: Faster than Momentum: unveiling the true face of Nesterov Accelerated Gradient (知乎)
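The key difference from classical momentum is that Nesterov evaluates the gradient at the "look-ahead" point `w + beta * v` rather than at `w`. A minimal sketch on the same illustrative quadratic (names and hyperparameters are my own assumptions):

```python
import numpy as np

def nag(grad, w0, lr=0.05, beta=0.9, steps=200):
    """Nesterov accelerated gradient: gradient taken at the look-ahead point."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w + beta * v)  # look ahead before correcting
        w = w + v
    return w

# same ill-conditioned quadratic bowl as above: gradient = [w1, 10*w2]
grad = lambda w: np.array([1.0, 10.0]) * w
w = nag(grad, [5.0, 5.0])
```

Looking ahead lets the correction anticipate where the momentum is about to carry the iterate, which typically damps overshoot faster than classical momentum.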