Meta-Learning
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly — often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning and rapid adaptation in robotics, and it is increasingly applied to hyperparameter optimization and neural architecture search.
Remembering
- Meta-learning — Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
- Meta-learner — The higher-level model that learns across tasks and produces or adapts base learners.
- Base learner — The model that is applied to individual tasks; updated by the meta-learner.
- Support set — The small labeled dataset provided at test time to adapt to a new task (analogous to training data for the base learner).
- Query set — The test examples for the new task on which performance is evaluated after adaptation.
- N-way K-shot learning — A meta-learning task setting: N classes, K labeled examples per class in the support set.
- Episode — One meta-learning training iteration, consisting of a sampled task with its support and query sets.
- MAML (Model-Agnostic Meta-Learning) — A gradient-based meta-learning algorithm that finds initialization parameters enabling rapid fine-tuning on new tasks.
- Prototypical Networks — A metric-based meta-learning approach that classifies by distance to class prototype embeddings.
- Matching Networks — A metric-based approach using an attention mechanism over support set embeddings.
- Meta-SGD — An extension of MAML that also meta-learns per-parameter learning rates.
- In-context learning — The emergent ability of large language models (LLMs) to learn new tasks from examples provided in the prompt, without gradient updates.
- Hyperparameter optimization (HPO) — Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
Understanding
Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to quickly adapt its behavior given a few new examples.
The meta-learning objective: across many tasks T sampled from a distribution p(T), find model parameters θ that can quickly adapt to any task using only a few examples. Formally: min_θ E_{T∼p(T)}[L_T(f_{θ′})], where θ′ = Adapt(θ, support_set_T).
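Rendered in LaTeX, and assuming MAML's single inner gradient step as the Adapt operator (the symbols α and S_T are introduced here for illustration, not part of the original formula):
<syntaxhighlight lang="latex">
\min_{\theta} \; \mathbb{E}_{T \sim p(T)} \left[ \mathcal{L}_{T}\!\left( f_{\theta'} \right) \right],
\qquad
\theta' = \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{T}\!\left( f_{\theta};\, \mathcal{S}_{T} \right)
</syntaxhighlight>
Here \mathcal{S}_{T} denotes the support set of task T and \alpha is the inner-loop learning rate.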
Three meta-learning approaches:
Metric-based: Learn an embedding space where classification is easy — similar examples are close, different ones are far. At test time, classify by distance to class prototypes (Prototypical Networks) or by weighted attention over support examples (Matching Networks); a minimal Prototypical Networks sketch follows these three approaches.
Optimization-based (MAML): Find model initialization θ such that a few gradient steps on the support set produce a good model for the query set. The meta-update optimizes through the adaptation process — it literally backpropagates through gradient descent steps.
Model-based: Use a recurrent or attention architecture that quickly updates its "memory" when shown support examples. The model's hidden state encodes the task context, enabling immediate adaptation.
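To make the metric-based approach concrete, here is a minimal Prototypical Networks prediction step. It is a sketch under assumptions: encoder stands for any embedding network, the shapes follow the N-way K-shot convention, and the function name is illustrative rather than from any library.
<syntaxhighlight lang="python">
import torch

def prototypical_predict(encoder, support_x, support_y, query_x, n_classes):
    """Classify query examples by distance to class prototype embeddings."""
    z_support = encoder(support_x)   # (N*K, D) support embeddings
    z_query = encoder(query_x)       # (Q, D) query embeddings
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_classes)
    ])                               # (N, D)
    # Negative squared Euclidean distance serves as the logit.
    logits = -torch.cdist(z_query, prototypes) ** 2
    return logits.argmax(dim=1)      # predicted class per query
</syntaxhighlight>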
In-context learning (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change — it "adapts" purely through the attention mechanism reading the context.
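A minimal sketch of how such a few-shot prompt is assembled; the task, example pairs, and formatting are illustrative and not tied to any particular model API:
<syntaxhighlight lang="python">
def build_few_shot_prompt(examples, query, instruction):
    """Assemble an in-context prompt: instruction, K worked examples, then the query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[("Hello.", "Bonjour."), ("Thank you.", "Merci.")],
    query="Good night.",
    instruction="Translate English to French.",
)
# The model infers the task pattern from the two examples alone.
</syntaxhighlight>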
Applying
First-order MAML (FOMAML) implementation for few-shot classification. Because inner_adapt copies the model, gradients cannot flow back through the adaptation steps to θ, so the outer loop applies the adapted model's query-loss gradients directly to θ (the first-order approximation):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from copy import deepcopy

class MAML:
    def __init__(self, model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5):
        self.model = model
        self.inner_lr = inner_lr
        self.optimizer = torch.optim.Adam(model.parameters(), lr=outer_lr)
        self.n_inner_steps = n_inner_steps
        self.loss_fn = nn.CrossEntropyLoss()

    def inner_adapt(self, support_x, support_y):
        """Fast adaptation on the support set (simulated fine-tuning)."""
        fast_model = deepcopy(self.model)
        fast_optimizer = torch.optim.SGD(fast_model.parameters(), lr=self.inner_lr)
        for _ in range(self.n_inner_steps):
            fast_optimizer.zero_grad()
            self.loss_fn(fast_model(support_x), support_y).backward()
            fast_optimizer.step()
        return fast_model

    def meta_update(self, episodes):
        """Outer loop: update θ to minimize query loss after adaptation."""
        self.optimizer.zero_grad()
        for support_x, support_y, query_x, query_y in episodes:
            adapted = self.inner_adapt(support_x, support_y)
            query_loss = self.loss_fn(adapted(query_x), query_y)
            # deepcopy cut the graph to θ, so take gradients w.r.t. the
            # adapted parameters and accumulate them into θ's .grad slots.
            grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
            for p, g in zip(self.model.parameters(), grads):
                p.grad = g.clone() if p.grad is None else p.grad + g
        for p in self.model.parameters():
            p.grad /= len(episodes)  # average over the episode batch
        self.optimizer.step()
</syntaxhighlight>
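Hypothetical usage, continuing from the block above; sample_episodes is a placeholder for an episodic data loader yielding (support_x, support_y, query_x, query_y) tuples:
<syntaxhighlight lang="python">
# Toy 5-way classifier over flattened 28x28 images (illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 5))
maml = MAML(model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5)

for step in range(10_000):                    # meta-training iterations
    episodes = sample_episodes(batch_size=4)  # hypothetical episodic sampler
    maml.meta_update(episodes)
</syntaxhighlight>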
Meta-learning approach selection:
- Image classification, few-shot → Prototypical Networks (simple, effective)
- Any architecture, gradient-based → MAML, FOMAML (first-order approximation)
- NLP, few-shot → In-context learning with a large LLM
- HPO automation → BOHB (Bayesian Optimization + Hyperband), Optuna (a minimal Optuna sketch follows this list)
- NAS → DARTS (gradient-based), evolutionary search
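For the HPO entry above, a minimal Optuna sketch; train_and_validate is a hypothetical training routine, and the search space is illustrative:
<syntaxhighlight lang="python">
import optuna

def objective(trial):
    # Illustrative search space; train_and_validate is a placeholder.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    return train_and_validate(lr=lr, n_layers=n_layers)  # returns validation loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
</syntaxhighlight>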
Analyzing
| Approach | Differentiates Through Adaptation? | Speed | Generalization |
|---|---|---|---|
| Metric-based (Prototypical) | No | Very fast | Good (within domain) |
| MAML | Yes (expensive) | Slow (2nd order) | Good |
| FOMAML | No (first-order approx) | Moderate | Good |
| In-context learning | No (inference only) | Fast | Excellent (large LLMs) |
| Model-based (NTM) | No | Fast | Moderate |
Failure modes: MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills up or when examples are poorly formatted. Meta-overfitting occurs when the meta-learner overfits to the meta-training task distribution and fails on truly novel tasks.
Evaluating
Evaluation on standard benchmarks: Omniglot (20-way 1-shot, 5-shot character recognition), miniImageNet (5-way 1-shot, 5-shot), Meta-Dataset (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training — if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.
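A minimal sketch of this reporting convention, assuming episode_accuracies holds one accuracy per test episode and using the normal-approximation interval:
<syntaxhighlight lang="python">
import numpy as np

def mean_ci95(episode_accuracies):
    """Mean accuracy and 95% confidence half-width across test episodes."""
    acc = np.asarray(episode_accuracies)
    half_width = 1.96 * acc.std(ddof=1) / np.sqrt(len(acc))
    return acc.mean(), half_width

mean, ci = mean_ci95(episode_accuracies)  # one accuracy value per episode
print(f"accuracy: {mean:.4f} ± {ci:.4f}")
</syntaxhighlight>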
Creating
Designing a meta-learning system for rapid domain adaptation:
- Assemble a large, diverse collection of tasks from the target distribution (if supervised) or define task samplers.
- For metric-based: train a shared encoder; at deployment, embed support set and classify by nearest prototype.
- For MAML: use FOMAML or Reptile (a simpler first-order approximation) for practical training; a Reptile sketch follows this list.
- For LLM-based: invest in prompt design and few-shot example selection — example quality matters more than quantity.
- Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.
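As referenced in the MAML bullet above, a minimal Reptile meta-update; loss_fn and the (x, y) task-batch format are assumptions, not part of the original article:
<syntaxhighlight lang="python">
import torch
from copy import deepcopy

def reptile_step(model, task_batches, loss_fn, inner_lr=0.01, meta_lr=0.1):
    """One Reptile meta-update: adapt a copy on one task, then move θ toward it."""
    fast = deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for x, y in task_batches:          # inner loop: plain SGD on the task
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    with torch.no_grad():              # meta-update: θ ← θ + ε (θ' − θ)
        for p, fp in zip(model.parameters(), fast.parameters()):
            p += meta_lr * (fp - p)
</syntaxhighlight>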