Meta-Learning - Revision history

Wordpad: BloomWiki: Meta-Learning

2026-04-25T01:53:55Z

← Older revision		Revision as of 01:53, 25 April 2026
Line 1:		Line 1:
			<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
	{{BloomIntro}}		{{BloomIntro}}
	Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly — often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.		Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly — often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.
			</div>

	== Remembering ==		__TOC__

			<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Remembering</span> ==
	* '''Meta-learning''' — Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.		* '''Meta-learning''' — Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
	* '''Meta-learner''' — The higher-level model that learns across tasks and produces or adapts base learners.		* '''Meta-learner''' — The higher-level model that learns across tasks and produces or adapts base learners.
Line 16:		Line 21:
	* '''In-context learning''' — The emergent ability of large LLMs to learn new tasks from examples provided in the prompt, without gradient updates.		* '''In-context learning''' — The emergent ability of large LLMs to learn new tasks from examples provided in the prompt, without gradient updates.
	* '''Hyperparameter optimization (HPO)''' — Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.		* '''Hyperparameter optimization (HPO)''' — Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
			</div>

	== Understanding ==		<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Understanding</span> ==
	Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to quickly adapt its behavior given a few new examples.		Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to quickly adapt its behavior given a few new examples.

Line 31:		Line 38:

	In-context learning (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change — it "adapts" purely through the attention mechanism reading the context.		In-context learning (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change — it "adapts" purely through the attention mechanism reading the context.
			</div>

	== Applying ==		<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Applying</span> ==
	'''MAML implementation for few-shot classification:'''		'''MAML implementation for few-shot classification:'''
	<syntaxhighlight lang="python">		<syntaxhighlight lang="python">
Line 74:		Line 83:
	: '''HPO automation''' → BOHB (Bayesian Optimization + Hyperband), Optuna		: '''HPO automation''' → BOHB (Bayesian Optimization + Hyperband), Optuna
	: '''NAS''' → DARTS (gradient-based), evolutionary search		: '''NAS''' → DARTS (gradient-based), evolutionary search
			</div>

	== Analyzing ==		<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Analyzing</span> ==
	{| class="wikitable"		{| class="wikitable"
	|+ Meta-Learning Approach Comparison		|+ Meta-Learning Approach Comparison
Line 92:		Line 103:

	'''Failure modes''': MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. Meta-overfitting — the meta-learner overfits to the meta-training task distribution, failing on truly novel tasks.		'''Failure modes''': MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. Meta-overfitting — the meta-learner overfits to the meta-training task distribution, failing on truly novel tasks.
			</div>

	== Evaluating ==		<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Evaluating</span> ==
	Evaluation on standard benchmarks: Omniglot (20-way 1-shot, 5-shot character recognition), miniImageNet (5-way 1-shot, 5-shot), Meta-Dataset (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training — if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.		Evaluation on standard benchmarks: Omniglot (20-way 1-shot, 5-shot character recognition), miniImageNet (5-way 1-shot, 5-shot), Meta-Dataset (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training — if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.
			</div>

	== Creating ==		<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Creating</span> ==
	Designing a meta-learning system for rapid domain adaptation: (1) Collect a large collection of diverse tasks from the target distribution (if supervised) or define task samplers. (2) For metric-based: train a shared encoder; at deployment, embed support set and classify by nearest prototype. (3) For MAML: use FOMAML or Reptile (simpler first-order approximation) for practical training. (4) For LLM-based: invest in prompt design and few-shot example selection — example quality matters more than quantity. (5) Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.		Designing a meta-learning system for rapid domain adaptation: (1) Collect a large collection of diverse tasks from the target distribution (if supervised) or define task samplers. (2) For metric-based: train a shared encoder; at deployment, embed support set and classify by nearest prototype. (3) For MAML: use FOMAML or Reptile (simpler first-order approximation) for practical training. (4) For LLM-based: invest in prompt design and few-shot example selection — example quality matters more than quantity. (5) Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.

Line 102:		Line 117:
	[[Category:Machine Learning]]		[[Category:Machine Learning]]
	[[Category:Meta-Learning]]		[[Category:Meta-Learning]]
			</div>

Wordpad: New BloomWiki article: Meta-Learning

2026-04-23T06:46:07Z

New page

{{BloomIntro}}
Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly — often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.

== Remembering ==
* '''Meta-learning''' — Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
* '''Meta-learner''' — The higher-level model that learns across tasks and produces or adapts base learners.
* '''Base learner''' — The model that is applied to individual tasks; updated by the meta-learner.
* '''Support set''' — The small labeled dataset provided at test time to adapt to a new task (analogous to training data for the base learner).
* '''Query set''' — The test examples for the new task on which performance is evaluated after adaptation.
* '''N-way K-shot learning''' — A meta-learning task setting: N classes, K labeled examples per class in the support set.
* '''Episode''' — One meta-learning training iteration, consisting of a sampled task with its support and query sets.
* '''MAML (Model-Agnostic Meta-Learning)''' — A gradient-based meta-learning algorithm that finds initialization parameters enabling rapid fine-tuning on new tasks.
* '''Prototypical Networks''' — A metric-based meta-learning approach that classifies by distance to class prototype embeddings.
* '''Matching Networks''' — A metric-based approach using an attention mechanism over support set embeddings.
* '''Meta-SGD''' — An extension of MAML that also meta-learns per-parameter learning rates.
* '''In-context learning''' — The emergent ability of large language models (LLMs) to learn new tasks from examples provided in the prompt, without gradient updates.
* '''Hyperparameter optimization (HPO)''' — Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
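
The terms ''support set'', ''query set'', ''N-way K-shot'' and ''episode'' defined above can be tied together with a small sampling routine. The following is a minimal sketch using only the Python standard library; the function name <code>sample_episode</code>, the <code>data_by_class</code> layout and the toy dataset are illustrative assumptions, not part of any particular library.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode.

    data_by_class maps a class label to a list of examples (hypothetical layout).
    Returns (support, query): lists of (example, episode_label) pairs.
    """
    # Pick N classes for this episode and relabel them 0..N-1.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw K support and n_query query examples without overlap.
        examples = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query


# Toy dataset: 10 classes with 20 string "examples" each.
toy_data = {c: [f"class{c}_item{i}" for i in range(20)] for c in range(10)}
support, query = sample_episode(toy_data, n_way=5, k_shot=1, n_query=3,
                                rng=random.Random(0))
print(len(support), len(query))  # 5 support examples, 15 query examples
```

Relabeling the sampled classes to 0..N-1 inside each episode is a common design choice: the learner cannot memorize global class identities and has to rely on the support set.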

== Understanding ==
Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to '''quickly adapt''' its behavior given a few new examples.

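'''The meta-learning objective''': Across many tasks <math>T</math> sampled from a distribution <math>p(T)</math>, find model parameters <math>\theta</math> that can quickly adapt to any task using only a few examples. Formally: <math>\min_\theta \, \mathbb{E}_{T \sim p(T)}\left[ L_T(f_{\theta'}) \right]</math>, where <math>\theta' = \mathrm{Adapt}(\theta, S_T)</math>, <math>S_T</math> is the support set of task <math>T</math>, and <math>L_T</math> is the loss measured on that task's query set.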

'''Three meta-learning approaches''':

'''Metric-based''': Learn an embedding space where classification is easy: similar examples are close, different ones are far. At test time, classify by distance to class prototypes (Prototypical Networks) or by weighted attention over support examples (Matching Networks).
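
To make the prototype idea concrete, here is a minimal sketch of nearest-prototype classification. It assumes the embeddings have already been produced by some trained encoder (not shown); the 2-D vectors, labels and function names are hypothetical and chosen only for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def class_prototypes(support):
    """support: list of (embedding, label) pairs -> {label: prototype}."""
    by_label = {}
    for emb, label in support:
        by_label.setdefault(label, []).append(emb)
    return {label: mean_vector(embs) for label, embs in by_label.items()}


def classify(query_embedding, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes,
               key=lambda label: math.dist(query_embedding, prototypes[label]))


# 2-way 2-shot toy episode with hand-written 2-D "embeddings".
support = [([0.0, 0.1], "cat"), ([0.2, 0.0], "cat"),
           ([1.0, 0.9], "dog"), ([0.8, 1.1], "dog")]
prototypes = class_prototypes(support)
print(classify([0.1, 0.2], prototypes))  # cat
print(classify([0.9, 0.8], prototypes))  # dog
```

No parameters are updated at test time in this scheme; adapting to a new task only requires averaging the support embeddings for each class.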

'''Optimization-based (MAML)''': Find model initialization θ such that a few gradient steps on the support set produce a good model for the query set. The meta-update optimizes through the adaptation process; that is, it backpropagates through the gradient descent steps themselves.
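
What "backpropagating through gradient descent" means can be seen on a one-parameter toy problem. The sketch below assumes a hypothetical task loss 0.5·(θ − c)², uses the same data for the inner step and the outer evaluation to keep it short, and compares the exact meta-gradient with the first-order approximation and with a numerical check.

```python
def loss(theta, c):
    """Toy task loss: 0.5 * (theta - c)^2, minimized at theta = c."""
    return 0.5 * (theta - c) ** 2


def grad(theta, c):
    """Derivative of the toy loss with respect to theta."""
    return theta - c


def adapt(theta, c, alpha):
    """One inner gradient step on the task."""
    return theta - alpha * grad(theta, c)


def maml_meta_grad(theta, c, alpha):
    """Exact d/dtheta of loss(adapt(theta)), via the chain rule."""
    theta_adapted = adapt(theta, c, alpha)
    d_adapted_d_theta = 1.0 - alpha  # derivative of the inner step w.r.t. theta
    return grad(theta_adapted, c) * d_adapted_d_theta


def fomaml_meta_grad(theta, c, alpha):
    """First-order approximation: treat d(theta')/d(theta) as 1."""
    return grad(adapt(theta, c, alpha), c)


theta, c, alpha = 0.0, 2.0, 0.1
eps = 1e-6
# Central finite difference of the post-adaptation loss, as a sanity check.
numeric = (loss(adapt(theta + eps, c, alpha), c)
           - loss(adapt(theta - eps, c, alpha), c)) / (2 * eps)
print(maml_meta_grad(theta, c, alpha), fomaml_meta_grad(theta, c, alpha), numeric)
```

With these values the exact meta-gradient is about −1.62 and the first-order one about −1.8. The factor (1 − α) is the term that comes from differentiating through the inner update, and it is the term that first-order methods drop.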

'''Model-based''': Use a recurrent or attention architecture that quickly updates its "memory" when shown support examples. The model's hidden state encodes the task context, enabling immediate adaptation.

'''In-context learning''' (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights do not change; it "adapts" purely through the attention mechanism reading the context.
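
In practice, the "support set" for in-context learning is simply text placed in the prompt. The following sketch shows one possible way to lay out demonstrations; the <code>Input:</code>/<code>Output:</code> template and the function name are assumptions for illustration, and no model API is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Format (input, output) demonstration pairs followed by the new input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # Leave the final output empty so the model completes it.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Rewrite each date in ISO 8601 format.",
    [("March 5, 2021", "2021-03-05"), ("July 14, 1999", "1999-07-14")],
    "January 2, 2030",
)
print(prompt)
```

The two demonstrations play the role of a 2-shot support set, and the final unanswered input is the query.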

== Applying ==
'''MAML implementation for few-shot classification:'''
<syntaxhighlight lang="python">
import torch
import torch.nn as nn
from torch.func import functional_call  # requires PyTorch 2.0 or later


class MAML:
    def __init__(self, model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5):
        self.model = model
        self.inner_lr = inner_lr
        self.optimizer = torch.optim.Adam(model.parameters(), lr=outer_lr)
        self.n_inner_steps = n_inner_steps
        self.loss_fn = nn.CrossEntropyLoss()

    def inner_adapt(self, support_x, support_y):
        """Fast adaptation on the support set; returns the adapted parameters."""
        # Start from the meta-parameters θ. A deepcopy of the model would cut
        # the autograd graph, so the adapted weights are kept as plain tensors
        # that remain differentiable with respect to θ.
        params = dict(self.model.named_parameters())
        for _ in range(self.n_inner_steps):
            logits = functional_call(self.model, params, (support_x,))
            loss = self.loss_fn(logits, support_y)
            # create_graph=True keeps second-order terms for the outer update.
            grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
            params = {name: p - self.inner_lr * g
                      for (name, p), g in zip(params.items(), grads)}
        return params

    def meta_update(self, episodes):
        """Outer loop: update θ to minimize query loss after adaptation."""
        meta_loss = 0.0
        for support_x, support_y, query_x, query_y in episodes:
            adapted = self.inner_adapt(support_x, support_y)
            query_logits = functional_call(self.model, adapted, (query_x,))
            meta_loss = meta_loss + self.loss_fn(query_logits, query_y)
        self.optimizer.zero_grad()
        # Backpropagates through the inner gradient steps into θ.
        (meta_loss / len(episodes)).backward()
        self.optimizer.step()
</syntaxhighlight>

; Meta-learning approach selection
: '''Image classification, few-shot''' → Prototypical Networks (simple, effective)
: '''Any architecture, gradient-based''' → MAML, FOMAML (first-order approximation)
: '''NLP, few-shot''' → In-context learning with a large LLM
: '''HPO automation''' → BOHB (Bayesian Optimization + Hyperband), Optuna
: '''NAS''' → DARTS (gradient-based), evolutionary search

== Analyzing ==
{| class="wikitable"
|+ Meta-Learning Approach Comparison
! Approach !! Requires gradient through adaptation !! Speed !! Generalization
|-
| Metric-based (Prototypical) || No || Very fast || Good (within domain)
|-
| MAML || Yes (expensive) || Slow (2nd order) || Good
|-
| FOMAML || No (first-order approx) || Moderate || Good
|-
| In-context learning || No (inference only) || Fast || Excellent (large LLMs)
|-
| Model-based (Neural Turing Machine) || No || Fast || Moderate
|}

'''Failure modes''': MAML's second-order gradients are computationally expensive and can be numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. Meta-overfitting occurs when the meta-learner overfits to the meta-training task distribution and then fails on truly novel tasks.

== Evaluating ==
Standard evaluation benchmarks include '''Omniglot''' (20-way 1-shot, 5-shot character recognition), '''miniImageNet''' (5-way 1-shot, 5-shot), and '''Meta-Dataset''' (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners also evaluate generalization to task distributions outside meta-training: if a model only works on tasks similar to those it was meta-trained on, it has not truly learned to learn.
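
The "mean ± 95% CI across episodes" figure can be computed from per-episode accuracies as sketched below. This uses a normal approximation (1.96 standard errors), which is one common choice rather than the only one; the accuracy values are made up for illustration, and real evaluations generally use many more episodes.

```python
import math
import statistics


def mean_and_ci95(accuracies):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    mean = statistics.mean(accuracies)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, 1.96 * std_err


# Hypothetical per-episode accuracies on the query sets.
episode_acc = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.57, 0.73, 0.65]
mean, half_width = mean_and_ci95(episode_acc)
print(f"{mean:.3f} ± {half_width:.3f}")
```

Because the interval shrinks with the square root of the number of episodes, reporting the episode count alongside the interval makes results easier to compare.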

== Creating ==
Designing a meta-learning system for rapid domain adaptation:
# Assemble a large, diverse collection of tasks from the target distribution (if supervised), or define task samplers.
# For metric-based methods: train a shared encoder; at deployment, embed the support set and classify by nearest prototype.
# For MAML: use FOMAML or Reptile (a simpler first-order approximation) for practical training.
# For LLM-based systems: invest in prompt design and few-shot example selection, since example quality matters more than quantity.
# Continue collecting new tasks at deployment and add them to the meta-training distribution to prevent drift.
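
The Reptile option mentioned above can be illustrated on a scalar toy problem. The sketch below assumes hypothetical tasks with loss 0.5·(θ − c)², where each task has its own optimum c; the hyperparameter values and function names are illustrative only.

```python
import random


def sgd_steps(theta, c, lr, n_steps):
    """A few SGD steps on the toy loss 0.5 * (theta - c)^2."""
    for _ in range(n_steps):
        theta -= lr * (theta - c)
    return theta


def reptile(task_optima, meta_lr=0.1, inner_lr=0.1, n_inner_steps=5,
            n_iterations=2000, seed=0):
    """Reptile: move theta toward the weights reached after adapting to a sampled task."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_iterations):
        c = rng.choice(task_optima)  # sample a task
        adapted = sgd_steps(theta, c, inner_lr, n_inner_steps)
        # Interpolate toward the adapted weights; no gradient flows
        # through the inner loop, so no second-order terms are needed.
        theta += meta_lr * (adapted - theta)
    return theta


theta = reptile(task_optima=[1.0, 2.0, 3.0])
print(round(theta, 2))
```

In this toy setting the initialization drifts toward the region between the task optima, from which every task is reachable in a few inner steps.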

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Meta-Learning]]