
== <span style="color: #FFFFFF;">Creating</span> ==
Designing an optimization strategy for a new model:
# Optimizer: default to AdamW with lr=3e-4 (or 1e-3 for smaller models), betas=(0.9, 0.999), and weight_decay in the range 0.01–0.1.
# Schedule: use linear warmup over the first 5–10% of total steps, followed by cosine decay to 0.
# Gradient clipping: clip the global gradient norm at max_norm=1.0 for stability.
# Batch size: start with the largest batch that fits in GPU memory. If the batch size changes later, scale the learning rate linearly with it as a first approximation (for example, doubling the batch size doubles the learning rate).
# Learning rate finder: use PyTorch Lightning's learning rate finder to get a good initial estimate.
# Monitoring: log training with W&B or TensorBoard, and raise an alert if the loss becomes NaN or the gradient norm explodes.
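
The schedule, batch-size scaling, and clipping steps above can be sketched in plain Python. This is a minimal, framework-free illustration rather than a reference implementation: the function names (<code>lr_at_step</code>, <code>scale_lr</code>, <code>clip_by_global_norm</code>) and the choice of a 5% warmup are assumptions made for the example, and gradients are represented as a flat list of floats.

```python
import math


def lr_at_step(step, total_steps, base_lr=3e-4, warmup_frac=0.05):
    """Learning rate at a given step: linear warmup, then cosine decay to 0.

    warmup_frac=0.05 is an assumed value from the 5-10% range.
    """
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, capped at 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: LR grows in proportion to batch size."""
    return base_lr * new_batch_size / base_batch_size


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). Raises if the norm is NaN or
    infinite, which is the condition worth alerting on during training.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        raise ValueError("non-finite gradient norm")
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm
```

In a PyTorch training loop, the same roles are typically filled by <code>torch.optim.AdamW</code>, a scheduler such as <code>torch.optim.lr_scheduler.LambdaLR</code>, and <code>torch.nn.utils.clip_grad_norm_</code>; the returned gradient norm is a natural quantity to log to W&B or TensorBoard.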

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Optimization]]
</div>