Meta Learning
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly – often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Meta-learning''' – Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
* '''Meta-learner''' – The higher-level model that learns across tasks and produces or adapts base learners.
* '''Base learner''' – The model that is applied to individual tasks; updated by the meta-learner.
* '''Support set''' – The small labeled dataset provided at test time to adapt to a new task (analogous to training data for the base learner).
* '''Query set''' – The test examples for the new task on which performance is evaluated after adaptation.
* '''N-way K-shot learning''' – A meta-learning task setting: N classes, K labeled examples per class in the support set.
* '''Episode''' – One meta-learning training iteration, consisting of a sampled task with its support and query sets.
* '''MAML (Model-Agnostic Meta-Learning)''' – A gradient-based meta-learning algorithm that finds initialization parameters enabling rapid fine-tuning on new tasks.
* '''Prototypical Networks''' – A metric-based meta-learning approach that classifies by distance to class prototype embeddings.
* '''Matching Networks''' – A metric-based approach using an attention mechanism over support set embeddings.
* '''Meta-SGD''' – An extension of MAML that also meta-learns per-parameter learning rates.
* '''In-context learning''' – The emergent ability of large language models (LLMs) to learn new tasks from examples provided in the prompt, without gradient updates.
* '''Hyperparameter optimization (HPO)''' – Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to '''quickly adapt''' its behavior given a few new examples.

'''The meta-learning objective''': Across many tasks T sampled from a distribution p(T), find model parameters θ that can quickly adapt to any task using only a few examples. Formally:

min<sub>θ</sub> E<sub>T~p(T)</sub>[L<sub>T</sub>(f<sub>θ′</sub>)], where θ′ = Adapt(θ, support set of T)

'''Three meta-learning approaches''':

'''Metric-based''': Learn an embedding space where classification is easy – similar examples are close, different ones are far. At test time, classify by distance to class prototypes (Prototypical Networks) or by weighted attention over support examples (Matching Networks).

'''Optimization-based (MAML)''': Find a model initialization θ such that a few gradient steps on the support set produce a good model for the query set. The meta-update optimizes through the adaptation process – it literally backpropagates through gradient descent steps.

'''Model-based''': Use a recurrent or attention architecture that quickly updates its "memory" when shown support examples.
The model's hidden state encodes the task context, enabling immediate adaptation.

'''In-context learning''' (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change – it "adapts" purely through the attention mechanism reading the context.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''MAML implementation for few-shot classification''' (the inner loop keeps the adapted "fast weights" differentiable, so the outer update can backpropagate through the gradient steps):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MAML:
    def __init__(self, model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5):
        self.model = model
        self.inner_lr = inner_lr
        self.optimizer = torch.optim.Adam(model.parameters(), lr=outer_lr)
        self.n_inner_steps = n_inner_steps
        self.loss_fn = nn.CrossEntropyLoss()

    def inner_adapt(self, support_x, support_y):
        """Fast adaptation on the support set (simulated fine-tuning),
        kept differentiable so the outer loss reaches the initialization."""
        fast_weights = dict(self.model.named_parameters())
        for _ in range(self.n_inner_steps):
            logits = torch.func.functional_call(self.model, fast_weights, (support_x,))
            loss = self.loss_fn(logits, support_y)
            # create_graph=True retains the second-order terms MAML needs.
            grads = torch.autograd.grad(loss, list(fast_weights.values()),
                                        create_graph=True)
            fast_weights = {name: w - self.inner_lr * g
                            for (name, w), g in zip(fast_weights.items(), grads)}
        return fast_weights

    def meta_update(self, episodes):
        """Outer loop: update θ to minimize query loss after adaptation."""
        meta_loss = 0.0
        for support_x, support_y, query_x, query_y in episodes:
            fast_weights = self.inner_adapt(support_x, support_y)
            query_logits = torch.func.functional_call(self.model, fast_weights, (query_x,))
            meta_loss = meta_loss + self.loss_fn(query_logits, query_y)
        self.optimizer.zero_grad()
        (meta_loss / len(episodes)).backward()
        self.optimizer.step()
</syntaxhighlight>

; Meta-learning approach selection
: '''Image classification, few-shot''' – Prototypical Networks (simple, effective)
: '''Any architecture, gradient-based''' – MAML, FOMAML (first-order approximation)
: '''NLP, few-shot''' – In-context learning with a large LLM
: '''HPO automation''' – BOHB (Bayesian Optimization + Hyperband), Optuna
: '''NAS''' – DARTS (gradient-based), evolutionary search
</div>

<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Meta-Learning Approach Comparison
! Approach !! Requires Gradient Through Adapt !! Speed !! Generalization
|-
| Metric-based (Prototypical) || No || Very fast || Good (within domain)
|-
| MAML || Yes (expensive) || Slow (2nd order) || Good
|-
| FOMAML || No (first-order approx) || Moderate || Good
|-
| In-context learning || No (inference only) || Fast || Excellent (large LLMs)
|-
| Model-based (NTM) || No || Fast || Moderate
|}

'''Failure modes''': MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. Meta-overfitting – the meta-learner overfits to the meta-training task distribution, failing on truly novel tasks.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Evaluation on standard benchmarks: '''Omniglot''' (20-way 1-shot, 5-shot character recognition), '''miniImageNet''' (5-way 1-shot, 5-shot), '''Meta-Dataset''' (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training – if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.
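The mean ± 95% CI reporting convention can be sketched in a few lines; the helper name `episode_ci` and the normal-approximation interval are illustrative choices here, not part of any benchmark API:

```python
import statistics

def episode_ci(accuracies, z=1.96):
    """Mean accuracy with a normal-approximation 95% confidence interval
    over per-episode accuracies (the usual few-shot reporting format)."""
    mean = statistics.fmean(accuracies)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, z * sem

# Example: accuracies from 5 evaluation episodes of a 5-way 1-shot run.
accs = [0.52, 0.48, 0.60, 0.55, 0.50]
mean, half_width = episode_ci(accs)
```

In practice benchmarks average over hundreds of episodes, which shrinks the interval and makes comparisons between methods meaningful.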
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a meta-learning system for rapid domain adaptation:
# Collect a large, diverse set of tasks from the target distribution (if supervised) or define task samplers.
# For metric-based: train a shared encoder; at deployment, embed the support set and classify by nearest prototype.
# For MAML: use FOMAML or Reptile (simpler first-order approximations) for practical training.
# For LLM-based: invest in prompt design and few-shot example selection – example quality matters more than quantity.
# Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Meta-Learning]]
</div>
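The deployment step for the metric-based route (embed the support set, classify by nearest prototype) can be sketched without any ML framework; the 2-D embeddings and the `classify` helper below are stand-ins for illustration, not a reference implementation:

```python
import math

def prototype(vectors):
    """Class prototype = mean of the support embeddings for that class."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def classify(query, support):
    """Assign the query embedding to the class with the nearest prototype
    (Euclidean distance), as in Prototypical Networks."""
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    return min(protos, key=lambda lbl: math.dist(query, protos[lbl]))

# 2-way 2-shot toy episode: pre-computed 2-D embeddings per class.
support = {
    "cat": [[0.9, 0.1], [1.1, 0.2]],
    "dog": [[0.1, 0.9], [0.2, 1.1]],
}
label = classify([1.0, 0.0], support)  # nearest to the "cat" prototype
```

In a real system the vectors would come from the trained encoder, and Euclidean distance on normalized embeddings is the standard Prototypical Networks choice.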