Optimization Algorithms in Machine Learning
== Remembering ==

* '''Loss function''' – A function measuring the discrepancy between model predictions and true labels; the objective to minimize.
* '''Gradient''' – The vector of partial derivatives of the loss with respect to all model parameters; points in the direction of steepest ascent.
* '''Gradient descent''' – Iteratively moving parameters in the negative gradient direction to minimize the loss (a minimal sketch follows this list).
* '''Stochastic Gradient Descent (SGD)''' – Gradient descent using the gradient of a single random example (or mini-batch) per step.
* '''Mini-batch SGD''' – Using a small batch of examples to estimate the gradient; the standard training approach.
* '''Learning rate''' – The step size in gradient descent; too large causes divergence, too small causes slow convergence.
* '''Momentum''' – An acceleration technique that accumulates a velocity vector in the gradient direction, dampening oscillations (update rule given below).
* '''Adam (Adaptive Moment Estimation)''' – An optimizer combining momentum with adaptive per-parameter learning rates; the default choice for most deep learning (update rules given below).
* '''AdamW''' – Adam with decoupled weight decay regularization; standard for training transformers (see the training-loop sketch below).
* '''Learning rate schedule''' – Varying the learning rate during training, e.g. warmup, cosine decay, or step decay.
* '''Warmup''' – Gradually increasing the learning rate from near zero at the start of training to prevent instability.
* '''Weight decay (L2 regularization)''' – Adding a penalty proportional to the sum of squared weights, preventing overfitting.
* '''Gradient clipping''' – Capping the gradient magnitude to prevent exploding gradients, especially in RNNs and transformers.
* '''Batch size''' – The number of examples per gradient update; affects gradient variance, memory use, and training dynamics.
* '''Learning rate finder''' – A technique for selecting a good learning rate by increasing it gradually and monitoring the loss (sketched at the end of this section).
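The first few entries can be made concrete in a few lines of code. The following is a minimal sketch of mini-batch SGD on a toy least-squares problem; the data, model, and hyperparameter values are illustrative assumptions, not part of this article.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative): y = 3x + 1 plus noise.
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=1000)

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # learning rate (step size)
batch_size = 32

for step in range(500):
    # Sample a mini-batch to get a cheap, noisy gradient estimate.
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx, 0], y[idx]

    err = w * xb + b - yb                 # residuals on this batch
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)

    # Gradient descent: step in the negative gradient direction.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 3.0 and 1.0
</syntaxhighlight>

Each step sees only 32 examples, so individual gradient estimates are noisy, but averaged over many steps the noise washes out while each step stays cheap.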
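The momentum and Adam entries are easiest to remember from their update rules. Writing <math>g_t</math> for the gradient at step <math>t</math>, <math>\eta</math> for the learning rate, and <math>\theta</math> for the parameters (standard notation, not defined in the list above), one common form of momentum keeps a velocity <math>v_t</math>:

: <math>v_t = \mu v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta v_t.</math>

Adam instead tracks exponential moving averages of the gradient and its elementwise square (here <math>m_t</math> and <math>v_t</math>), applies bias correction, and scales each parameter's step individually:

: <math>m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,</math>
: <math>\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}.</math>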
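AdamW, warmup, cosine decay, and gradient clipping typically appear together in one training loop. Below is a sketch assuming PyTorch; the model, the random batches, and the hyperparameter values (learning rate, clip norm, step counts) are placeholders chosen for illustration, not recommendations from this article.

<syntaxhighlight lang="python">
import math
import torch
from torch import nn

model = nn.Linear(10, 1)  # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

warmup_steps, total_steps = 100, 1000

def lr_scale(step):
    # Linear warmup from ~0, then cosine decay toward 0.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_scale)

for step in range(total_steps):
    xb = torch.randn(32, 10)  # placeholder batch
    yb = torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(xb), yb)

    opt.zero_grad()
    loss.backward()
    # Gradient clipping: cap the global gradient norm at 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    sched.step()
</syntaxhighlight>

Decoupling weight decay from the gradient update (AdamW rather than L2 penalty inside Adam) is what makes the decay strength independent of the adaptive per-parameter scaling.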
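Finally, a learning rate finder can be sketched as a single sweep: increase the learning rate exponentially over a few hundred steps, record the loss at each, and pick a rate below the point where the loss starts to climb. The constants below (sweep range, batch size, divide-by-ten heuristic) are illustrative assumptions on the same toy problem as above.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=1000)

# Sweep the learning rate exponentially from 1e-5 to 10.
lrs = np.logspace(-5, 1, 200)
losses = []

w, b = 0.0, 0.0
for lr in lrs:
    idx = rng.integers(0, len(X), size=32)
    xb, yb = X[idx, 0], y[idx]
    err = w * xb + b - yb
    losses.append(np.mean(err ** 2))  # record loss before the update
    w -= lr * 2.0 * np.mean(err * xb)
    b -= lr * 2.0 * np.mean(err)

# Heuristic: take a rate well below where the loss bottoms out and
# starts increasing sharply.
best = lrs[int(np.argmin(losses))]
print(f"loss minimized near lr={best:.3g}; try something smaller, e.g. {best / 10:.3g}")
</syntaxhighlight>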