Meta-Learning

From BloomWiki

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly — often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.

Remembering[edit]

  • Meta-learning — Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
  • Meta-learner — The higher-level model that learns across tasks and produces or adapts base learners.
  • Base learner — The model that is applied to individual tasks; updated by the meta-learner.
  • Support set — The small labeled dataset provided at test time to adapt to a new task (analogous to training data for the base learner).
  • Query set — The test examples for the new task on which performance is evaluated after adaptation.
  • N-way K-shot learning — A meta-learning task setting: N classes, K labeled examples per class in the support set.
  • Episode — One meta-learning training iteration, consisting of a sampled task with its support and query sets.
  • MAML (Model-Agnostic Meta-Learning) — A gradient-based meta-learning algorithm that finds initialization parameters enabling rapid fine-tuning on new tasks.
  • Prototypical Networks — A metric-based meta-learning approach that classifies by distance to class prototype embeddings.
  • Matching Networks — A metric-based approach using an attention mechanism over support set embeddings.
  • Meta-SGD — An extension of MAML that also meta-learns per-parameter learning rates.
  • In-context learning — The emergent ability of large language models (LLMs) to learn new tasks from examples provided in the prompt, without gradient updates.
  • Hyperparameter optimization (HPO) — Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
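The episode vocabulary above (N-way K-shot, support set, query set) can be made concrete with a small sketch of episode construction. The `dataset` here (a mapping from class label to a list of examples) is a toy stand-in for a real few-shot dataset:

<syntaxhighlight lang="python">
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Build one episode: pick N classes, then K support and n_query
    query examples per class, relabeled 0..N-1 for this task."""
    classes = random.sample(list(dataset), n_way)
    support, query = [], []
    for task_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        support += [(x, task_label) for x in examples[:k_shot]]
        query += [(x, task_label) for x in examples[k_shot:]]
    return support, query

# Toy "dataset": 10 classes with 20 integer examples each
toy = {c: list(range(c * 20, (c + 1) * 20)) for c in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=5)
</syntaxhighlight>

For a 5-way 1-shot episode with 5 queries per class, the support set has 5 examples and the query set 25; each meta-training iteration processes one (or a batch of) such episodes.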

Understanding[edit]

Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to '''quickly adapt''' its behavior given a few new examples.

    • The meta-learning objective: Across many tasks T sampled from a distribution p(T), find model parameters θ that can quickly adapt to any task using only a few examples. Formally: min_θ E_{T~p(T)}[L_T(f_{θ'})], where θ' = Adapt(θ, support_set_T).
    • Three meta-learning approaches:
          • Metric-based: Learn an embedding space where classification is easy: similar examples are close, different ones are far. At test time, classify by distance to class prototypes (Prototypical Networks) or by weighted attention over support examples (Matching Networks).
          • Optimization-based (MAML): Find a model initialization θ such that a few gradient steps on the support set produce a good model for the query set. The meta-update optimizes through the adaptation process itself, backpropagating through the gradient descent steps.
          • Model-based: Use a recurrent or attention architecture that quickly updates its "memory" when shown support examples. The model's hidden state encodes the task context, enabling immediate adaptation.
    • In-context learning (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change; it "adapts" purely through the attention mechanism reading the context.
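The metric-based idea is simple enough to sketch. The encoder below is an untrained placeholder, so this shows only the mechanics (prototype = mean support embedding, classify by nearest prototype), not real accuracy:

<syntaxhighlight lang="python">
import torch

def prototypical_predict(encoder, support_x, support_y, query_x, n_way):
    """Classify queries by Euclidean distance to class prototypes."""
    z_support = encoder(support_x)                 # [N*K, D]
    z_query = encoder(query_x)                     # [Q, D]
    # Prototype = mean embedding of each class's support examples
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                              # [N, D]
    dists = torch.cdist(z_query, prototypes)       # [Q, N]
    return dists.argmin(dim=1)                     # nearest prototype wins

encoder = torch.nn.Linear(16, 8)                   # placeholder embedding
support_x = torch.randn(10, 16)                    # 5-way 2-shot support
support_y = torch.arange(5).repeat_interleave(2)
query_x = torch.randn(7, 16)
preds = prototypical_predict(encoder, support_x, support_y, query_x, n_way=5)
</syntaxhighlight>

In Prototypical Networks the encoder is trained episodically so that this nearest-prototype rule becomes accurate; at test time no gradient steps are needed at all.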

Applying[edit]

MAML implementation for few-shot classification. Because deepcopy severs the computation graph back to the original parameters, the outer update below uses the first-order approximation (FOMAML): query-loss gradients computed on the adapted copy are accumulated onto the meta-model's parameters before the optimizer step.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from copy import deepcopy

class MAML:

    def __init__(self, model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5):
        self.model = model
        self.inner_lr = inner_lr
        self.optimizer = torch.optim.Adam(model.parameters(), lr=outer_lr)
        self.n_inner_steps = n_inner_steps
        self.loss_fn = nn.CrossEntropyLoss()

    def inner_adapt(self, support_x, support_y):
        """Fast adaptation on the support set (simulated fine-tuning)."""
        fast_model = deepcopy(self.model)
        fast_optimizer = torch.optim.SGD(fast_model.parameters(), lr=self.inner_lr)
        for _ in range(self.n_inner_steps):
            fast_optimizer.zero_grad()
            self.loss_fn(fast_model(support_x), support_y).backward()
            fast_optimizer.step()
        return fast_model

    def meta_update(self, episodes):
        """Outer loop: update θ to minimize query loss after adaptation."""
        self.optimizer.zero_grad()
        for support_x, support_y, query_x, query_y in episodes:
            adapted = self.inner_adapt(support_x, support_y)
            query_loss = self.loss_fn(adapted(query_x), query_y) / len(episodes)
            # First-order trick: gradients live on the adapted copy,
            # so accumulate them onto the meta-parameters θ by hand.
            grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
            for p, g in zip(self.model.parameters(), grads):
                p.grad = g if p.grad is None else p.grad + g
        self.optimizer.step()

</syntaxhighlight>

{| class="wikitable"
|+ Meta-learning approach selection
! Scenario !! Recommended approach
|-
| Image classification, few-shot || Prototypical Networks (simple, effective)
|-
| Any architecture, gradient-based || MAML, FOMAML (first-order approximation)
|-
| NLP, few-shot || In-context learning with a large LLM
|-
| HPO automation || BOHB (Bayesian Optimization + Hyperband), Optuna
|-
| NAS || DARTS (gradient-based), evolutionary search
|}
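For the NLP scenario, "in-context learning" amounts to formatting the support set into the prompt itself; no gradient updates occur. A hypothetical sketch of just the prompt construction (the function name and format are illustrative, and no real API call is made):

<syntaxhighlight lang="python">
def few_shot_prompt(support, query, instruction):
    """Format support examples and a query into a few-shot prompt."""
    lines = [instruction, ""]
    for x, y in support:
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    support=[("ship", "ships"), ("box", "boxes")],
    query="wish",
    instruction="Pluralize the word.",
)
</syntaxhighlight>

The support set plays the same role as in gradient-based meta-learning, but "adaptation" happens inside the LLM's forward pass as attention reads the examples.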

Analyzing[edit]

{| class="wikitable"
|+ Meta-learning approach comparison
! Approach !! Requires gradient through adaptation !! Speed !! Generalization
|-
| Metric-based (Prototypical Networks) || No || Very fast || Good (within domain)
|-
| MAML || Yes (expensive) || Slow (second-order) || Good
|-
| FOMAML || No (first-order approximation) || Moderate || Good
|-
| In-context learning || No (inference only) || Fast || Excellent (large LLMs)
|-
| Model-based (NTM) || No || Fast || Moderate
|}

Failure modes: MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse to share a single embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. A further risk is meta-overfitting: the meta-learner overfits to the meta-training task distribution and fails on truly novel tasks.

Evaluating[edit]

Evaluation on standard benchmarks: '''Omniglot''' (20-way 1-shot, 5-shot character recognition), '''miniImageNet''' (5-way 1-shot, 5-shot), '''Meta-Dataset''' (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training — if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.
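Reporting "mean ± 95% CI across episodes" is a small computation worth pinning down. A sketch assuming per-episode accuracies are collected in a list, using the normal approximation (reasonable at the 600+ episodes typically reported):

<syntaxhighlight lang="python">
import math
import statistics

def mean_ci95(accuracies):
    """Mean and 95% confidence-interval half-width over episodes."""
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, 1.96 * sem

accs = [0.52, 0.48, 0.55, 0.50, 0.45, 0.58, 0.51, 0.49]
mean, ci = mean_ci95(accs)
print(f"accuracy = {mean:.3f} ± {ci:.3f}")
</syntaxhighlight>

Because few-shot accuracy varies a lot from episode to episode, the CI matters: two methods whose intervals overlap on miniImageNet usually cannot be distinguished.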

Creating[edit]

Designing a meta-learning system for rapid domain adaptation:

  1. Assemble a large, diverse collection of tasks from the target distribution (if supervised) or define task samplers.
  2. For metric-based methods: train a shared encoder; at deployment, embed the support set and classify by nearest prototype.
  3. For MAML: use FOMAML or Reptile (a simpler first-order approximation) to keep training practical.
  4. For LLM-based systems: invest in prompt design and few-shot example selection; example quality matters more than quantity.
  5. Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.
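Step (3) mentions Reptile, whose update is simple enough to sketch: adapt a copy of the model on one task with plain SGD, then move θ a fraction ε toward the adapted weights — no second-order gradients at all. The toy model and task loss below are illustrative placeholders:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from copy import deepcopy

def reptile_step(model, task_loss_fn, inner_lr=0.01, inner_steps=5, epsilon=0.1):
    """One Reptile meta-update: θ ← θ + ε (θ_adapted − θ)."""
    fast = deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        task_loss_fn(fast).backward()
        opt.step()
    with torch.no_grad():
        for p, q in zip(model.parameters(), fast.parameters()):
            p += epsilon * (q - p)   # interpolate toward adapted weights

model = nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
task_loss = lambda m: nn.functional.cross_entropy(m(x), y)
before = deepcopy(model.state_dict())
reptile_step(model, task_loss)       # one meta-update on one task
</syntaxhighlight>

In a full training loop, `reptile_step` would be called once per sampled task; averaging the interpolation over a batch of tasks gives the batched variant.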