Fine-tuning LLMs
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Fine-tuning is the process of taking a large language model (LLM) that has already been pre-trained on a vast corpus and continuing its training on a smaller, task-specific dataset to specialize its capabilities. It is one of the most powerful techniques in practical AI deployment, enabling organizations to adapt frontier models to domain-specific language, formats, reasoning styles, or behaviors – often with only thousands of examples. Fine-tuning sits at the intersection of deep learning theory and production engineering.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Pre-training''' – The initial phase where a model is trained on massive, general-purpose datasets to develop broad language capabilities. This is done once and is extremely expensive.
* '''Fine-tuning''' – Continuing training of a pre-trained model on a smaller dataset to specialize behavior. The model's weights are adjusted, typically starting from the pre-trained state.
* '''Supervised Fine-Tuning (SFT)''' – Fine-tuning on labeled input-output pairs, teaching the model to follow instructions or produce specific response formats.
* '''Instruction tuning''' – A form of SFT where the model is trained on instruction-following examples to make it more helpful and controllable.
* '''RLHF (Reinforcement Learning from Human Feedback)''' – A multi-stage process (SFT, then reward model training, then RL optimization) used to align model outputs with human preferences.
* '''LoRA (Low-Rank Adaptation)''' – A parameter-efficient fine-tuning technique that adds small trainable low-rank matrices to frozen base model weights, drastically reducing compute and memory requirements.
* '''QLoRA''' – LoRA applied to a quantized base model (typically 4-bit), enabling fine-tuning of large models on consumer GPUs.
* '''PEFT (Parameter-Efficient Fine-Tuning)''' – An umbrella term for methods like LoRA, Prefix Tuning, and Adapter layers that update only a small fraction of model parameters.
* '''Catastrophic forgetting''' – The tendency of a model to lose previously learned capabilities when trained extensively on new data.
* '''Learning rate''' – Typically much lower during fine-tuning than pre-training (e.g., 1e-5 to 2e-4) to avoid destroying pre-trained representations.
* '''Chat template''' – A structured format for instruction-tuned models defining how system prompts, user turns, and assistant turns are delimited.
* '''Prompt template''' – The format used to structure training examples, which must match the format used at inference time.
* '''Validation loss''' – The key metric monitored during fine-tuning to detect overfitting and determine when to stop.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Fine-tuning works because pre-trained LLMs have already learned rich representations of language, facts, and reasoning patterns. Fine-tuning doesn't teach the model new knowledge so much as it '''reconfigures how the model accesses and expresses what it already knows'''.

Analogy: A pre-trained LLM is like a broadly educated graduate. Fine-tuning is like a specialized internship – they don't forget everything they learned in university; they learn how to apply their knowledge in a specific context, following specific conventions and communicating in specific ways.

'''Full fine-tuning''' updates all model parameters. It is the most powerful approach but requires enormous compute (multiple GPUs, hours to days) and is prone to catastrophic forgetting of general capabilities.

'''LoRA''' (Low-Rank Adaptation) is the dominant technique in practice. It freezes the original weights and adds small trainable matrices A and B to each attention layer such that the effective weight update is W + ΔW = W + AB, where A is d×r and B is r×d, with rank r ≪ d. With r=16, a 7B model might add only ~20M trainable parameters (roughly 0.3% of the total). This dramatically reduces compute, memory, and overfitting risk.
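To make the shapes concrete, here is a minimal PyTorch sketch of a single LoRA-wrapped linear layer. It is illustrative only – not the PEFT library's internal implementation – and the class name, initialization, and 4096-dimensional layer are assumptions chosen for the example:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA wrapper: frozen base weight W plus trainable low-rank update AB."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pre-trained weight W
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(d_in, r) * 0.01)  # d x r
        self.B = nn.Parameter(torch.zeros(r, d_out))         # r x d, zero-init so AB = 0 at start
        self.scale = alpha / r                                # standard LoRA scaling

    def forward(self, x):
        # Equivalent to applying W + AB: frozen path plus low-rank correction
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")  # 131,072 of 16,908,288 for one 4096x4096 layer
</syntaxhighlight>

Only A and B receive gradients, so optimizer state and checkpoints shrink in proportion to the rank r rather than the full weight size.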
The '''data format''' matters enormously. Fine-tuning teaches the model a specific input-output pattern. If training examples don't precisely match the inference format (including chat templates, special tokens, and prompt structures), the model will underperform.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''LoRA fine-tuning with HuggingFace + PEFT:'''
<syntaxhighlight lang="python">
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
import datasets

# Load base model (quantized for efficiency)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_4bit=True,       # QLoRA: quantize to 4-bit
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA configuration
lora_config = LoraConfig(
    r=16,                                    # Rank
    lora_alpha=32,                           # Scaling factor
    target_modules=["q_proj", "v_proj"],     # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints the trainable vs. total parameter counts (well under 1% trainable)

# Training setup
training_args = TrainingArguments(
    output_dir="./finetuned_model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    save_steps=100,
    logging_steps=25,
)

# Dataset: each sample has a "text" field with the full formatted prompt+response
dataset = datasets.load_dataset("json", data_files="train.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
</syntaxhighlight>

; Data format for instruction tuning (Llama chat template)
: '''System''' – Defines the model's role and constraints
: '''User turn''' – The instruction or question
: '''Assistant turn''' – The desired response (what the model learns to produce)
: '''Special tokens''' – [INST], [/INST], <<SYS>> etc. must exactly match the model's chat template (see the sketch below)
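As a concrete example of getting the format right, the sketch below builds one <code>train.jsonl</code> record in the Llama-2 chat format. The raw fields, the output file name, and the use of a chat-tuned tokenizer (assumed here to be <code>meta-llama/Llama-2-7b-chat-hf</code>, which ships a chat template) are illustrative assumptions; the point is that the rendered string, not the raw fields, is what the trainer sees.

<syntaxhighlight lang="python">
import json
from transformers import AutoTokenizer

# Illustrative raw example; field names are made up for this sketch
example = {
    "system": "You are a concise customer-support assistant.",
    "user": "How do I reset my password?",
    "assistant": "Go to Settings > Account and follow the reset-password link.",
}

# Let the tokenizer render the chat template rather than hand-writing the
# [INST] / <<SYS>> markers; chat checkpoints ship a template, base ones may not.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
messages = [
    {"role": "system", "content": example["system"]},
    {"role": "user", "content": example["user"]},
    {"role": "assistant", "content": example["assistant"]},
]
text = tok.apply_chat_template(messages, tokenize=False)
# -> roughly "<s>[INST] <<SYS>>\n...system...\n<</SYS>>\n\n...user... [/INST] ...assistant... </s>"

# One JSON object per line, with the "text" field the SFTTrainer above reads
with open("train.jsonl", "w") as f:
    f.write(json.dumps({"text": text}) + "\n")
</syntaxhighlight>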
</div>

<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Fine-tuning Method Comparison
! Method !! Params Updated !! GPU Memory !! Risk of Forgetting !! Quality
|-
| Full fine-tuning || 100% || Very high (multiple GPUs) || High || Highest
|-
| LoRA || 0.1–1% || Low (1 GPU possible) || Low || Near-full for most tasks
|-
| QLoRA || 0.1–1% (on 4-bit model) || Very low (fits on 24GB GPU) || Low || Slightly below LoRA
|-
| Prefix tuning || ~0.1% || Low || Very low || Moderate
|-
| Prompt tuning || ~0.01% || Very low || Very low || Lower than LoRA
|}

'''Failure modes:'''
* '''Overfitting on small datasets''' – With <500 examples, the model can memorize rather than generalize. Monitor validation loss; stop early.
* '''Format mismatch''' – Training on incorrectly formatted examples causes the model to generate malformed outputs or include spurious tokens.
* '''Instruction following collapse''' – Aggressive fine-tuning can make the model rigid, losing the flexibility to handle instructions it wasn't trained on.
* '''Reward hacking (RLHF)''' – The model learns to produce responses that score well according to the reward model without actually being more helpful – for example, becoming verbose without substance.
* '''Capability regression''' – Fine-tuning on a narrow task can degrade performance on other tasks. Evaluate on a broad benchmark before and after.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Expert practitioners treat fine-tuning evaluation as multi-dimensional:

'''Task-specific metrics''': Whatever the downstream task demands – ROUGE for summarization, exact match for QA, pass@k for code generation, human preference rates for chat.

'''General capability retention''': Run the fine-tuned model on standard benchmarks (MMLU, HellaSwag, HumanEval) to verify general capabilities weren't degraded. A model fine-tuned for customer service shouldn't lose its ability to reason.

'''Alignment and safety evaluation''': Does fine-tuning introduce new failure modes? Run adversarial prompts, jailbreak attempts, and harmful content evaluations on the fine-tuned model.

'''Human preference evaluation (A/B testing)''': For conversational models, human raters compare base model vs. fine-tuned model outputs on real user queries. This is the ground truth for whether fine-tuning achieved its goal.

Expert practitioners maintain a '''regression test suite''' – a fixed set of prompts with expected behaviors – and run it after every fine-tuning run to catch regressions automatically.
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a full fine-tuning pipeline:

'''1. Dataset curation (most important step)'''
<syntaxhighlight lang="text">
Source data collection (domain documents, logs, demonstrations)
    ↓
Quality filtering (deduplication, length filtering, toxic content removal)
    ↓
Formatting (convert to chat template, add system prompt)
    ↓
Review sample (manually inspect 100+ examples)
    ↓
Train/validation split (90/10 or 95/5)
</syntaxhighlight>

'''2. Training configuration decision tree'''
* <1k examples and 1 GPU → QLoRA with early stopping
* 1k–100k examples and 2–8 GPUs → LoRA with gradient checkpointing
* >100k examples and production budget → Full fine-tune with DDP/FSDP

'''3. Iterative refinement loop'''
<syntaxhighlight lang="text">
v1: SFT on demonstrations → evaluate → identify failure cases
v2: Add failure case examples to dataset, retrain → evaluate → identify preference gaps
v3: Collect human preference data → train reward model → PPO/DPO fine-tune
</syntaxhighlight>

'''4. Serving the fine-tuned model'''
* Merge LoRA adapters into base model: <code>model.merge_and_unload()</code> (see the sketch after this list)
* Export to GGUF format for llama.cpp (local/edge deployment)
* Push to HuggingFace Hub or deploy with vLLM for API serving
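A minimal sketch of the merge-and-export step, assuming the training run above; the adapter checkpoint path, output directory, and Hub repository name are placeholders:

<syntaxhighlight lang="python">
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the trained adapter checkpoint (path is illustrative); the base model
# weights are pulled in automatically, here in fp16 to keep memory reasonable.
model = AutoPeftModelForCausalLM.from_pretrained(
    "./finetuned_model/checkpoint-300", torch_dtype=torch.float16
)

# Fold the low-rank matrices back into the base weights so the result is a
# plain transformers model with no PEFT dependency at serving time.
merged = model.merge_and_unload()

# Save in standard HF format; vLLM can serve this directory directly, and
# llama.cpp's conversion script can turn it into GGUF for local deployment.
merged.save_pretrained("./merged_model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("./merged_model")

# merged.push_to_hub("your-org/llama-2-7b-custom")  # optional: publish to the Hub
</syntaxhighlight>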
[[Category:Artificial Intelligence]]
[[Category:Large Language Models]]
[[Category:Machine Learning]]
</div>