Fine-tuning Large Language Models

== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Pre-training''' — The initial phase where a model is trained on massive, general-purpose datasets to develop broad language capabilities. This is done once and is extremely expensive.
* '''Fine-tuning''' — Continuing the training of a pre-trained model on a smaller dataset to specialize its behavior. The weights start from the pre-trained state and are adjusted further.
* '''Supervised Fine-Tuning (SFT)''' — Fine-tuning on labeled input-output pairs, teaching the model to follow instructions or produce specific response formats.
* '''Instruction tuning''' — A form of SFT where the model is trained on instruction-following examples to make it more helpful and controllable.
* '''RLHF (Reinforcement Learning from Human Feedback)''' — A multi-stage process used to align model outputs with human preferences: SFT, then training a reward model on human preference comparisons, then RL optimization of the model against that reward model.
* '''LoRA (Low-Rank Adaptation)''' — A parameter-efficient fine-tuning technique that adds small trainable low-rank matrices to frozen base model weights, drastically reducing compute and memory requirements.
* '''QLoRA''' — LoRA applied to a quantized base model (typically 4-bit), enabling fine-tuning of large models on consumer GPUs.
* '''PEFT (Parameter-Efficient Fine-Tuning)''' — An umbrella term for methods like LoRA, Prefix Tuning, and Adapter layers that update only a small fraction of model parameters.
* '''Catastrophic forgetting''' — The tendency of a model to lose previously learned capabilities when trained extensively on new data.
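* '''Learning rate''' — The step size applied to each weight update. It is typically lower during fine-tuning than during pre-training (e.g., 1e-5 to 2e-4, with the low end common for full fine-tuning and the high end for LoRA-style methods) to avoid destroying pre-trained representations.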
* '''Chat template''' — A structured format for instruction-tuned models defining how system prompts, user turns, and assistant turns are delimited.
* '''Prompt template''' — The format used to structure training examples, which must match the format used at inference time.
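* '''Validation loss''' — The loss measured on held-out examples that are not used for training. It is the key metric monitored during fine-tuning: validation loss that rises while training loss keeps falling signals overfitting and indicates when to stop.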
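
To make the LoRA entry above more concrete, the following is a minimal sketch in plain Python, not a reference implementation. It assumes the common formulation in which a frozen weight matrix <code>W</code> is augmented by two small trainable matrices <code>A</code> (rank × input size) and <code>B</code> (output size × rank), scaled by <code>alpha / rank</code>. The helper names <code>matvec</code>, <code>lora_forward</code>, and <code>lora_param_counts</code> are hypothetical and chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / rank) * B (A x).

    W is the frozen base weight (d_out x d_in).
    A (rank x d_in) and B (d_out x rank) are the trainable low-rank factors.
    """
    rank = len(A)
    base = matvec(W, x)               # output of the frozen base layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, trainable LoRA parameters)."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


# Assumed example sizes: a 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

With <code>B</code> initialized to zeros, as is commonly done, the adapted layer initially produces the same output as the frozen base layer, so training starts from the pre-trained behavior. In this assumed example the trainable parameters are under 0.4% of the full matrix.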
</div>
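
The validation-loss entry in the list above can be illustrated with a small early-stopping sketch. It assumes validation loss is evaluated at regular intervals and that training stops after <code>patience</code> consecutive evaluations without improvement. The function name <code>best_checkpoint</code>, the parameters <code>patience</code> and <code>min_delta</code>, and the loss values are hypothetical, not taken from any particular library or training run.

```python
def best_checkpoint(val_losses, patience=2, min_delta=0.0):
    """Scan validation losses in order and apply patience-based early stopping.

    Returns (index of the best evaluation seen, whether stopping was triggered).
    """
    best_idx = None
    best_loss = float("inf")
    bad_evals = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_idx, best_loss, bad_evals = i, loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_idx, True
    return best_idx, False


# Made-up losses: improvement for three evaluations, then a steady rise.
losses = [1.20, 0.95, 0.88, 0.90, 0.93, 0.97]
print(best_checkpoint(losses, patience=2))  # (2, True)
```

In this sketch the checkpoint from evaluation 2 would be kept, since later evaluations show validation loss increasing, which is the overfitting signal described in the list.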

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">