Meta Learning
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Meta-learning, also called "learning to learn," is a subfield of machine learning focused on designing models and algorithms that improve their learning ability with experience. While standard machine learning trains a model to perform a specific task, meta-learning trains a model to learn new tasks quickly – often from very few examples. The goal is not to solve a particular problem, but to become a more efficient learner across a distribution of problems. Meta-learning underpins few-shot learning, rapid adaptation in robotics, and is increasingly applied to hyperparameter optimization and neural architecture search.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Meta-learning''' – Learning to learn; training models to adapt quickly to new tasks using experience across many tasks.
* '''Meta-learner''' – The higher-level model that learns across tasks and produces or adapts base learners.
* '''Base learner''' – The model that is applied to individual tasks; updated by the meta-learner.
* '''Support set''' – The small labeled dataset provided at test time to adapt to a new task (analogous to training data for the base learner).
* '''Query set''' – The test examples for the new task on which performance is evaluated after adaptation.
* '''N-way K-shot learning''' – A meta-learning task setting: N classes, K labeled examples per class in the support set.
* '''Episode''' – One meta-learning training iteration, consisting of a sampled task with its support and query sets.
* '''MAML (Model-Agnostic Meta-Learning)''' – A gradient-based meta-learning algorithm that finds initialization parameters enabling rapid fine-tuning on new tasks.
* '''Prototypical Networks''' – A metric-based meta-learning approach that classifies by distance to class prototype embeddings.
* '''Matching Networks''' – A metric-based approach using an attention mechanism over support set embeddings.
* '''Meta-SGD''' – An extension of MAML that also meta-learns per-parameter learning rates.
* '''In-context learning''' – The emergent ability of large language models (LLMs) to learn new tasks from examples provided in the prompt, without gradient updates.
* '''Hyperparameter optimization (HPO)''' – Automatically finding optimal hyperparameters; meta-learning approaches (BOHB, SMAC) use experience across runs.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Standard training gives a model a fixed behavior. Meta-learning gives a model the ability to '''quickly adapt''' its behavior given a few new examples.

'''The meta-learning objective''': Across many tasks T sampled from a distribution p(T), find model parameters θ that can quickly adapt to any task using only a few examples. Formally:

min<sub>θ</sub> E<sub>T~p(T)</sub>[L<sub>T</sub>(f<sub>θ′</sub>)], where θ′ = Adapt(θ, support set of T)

'''Three meta-learning approaches''':

'''Metric-based''': Learn an embedding space where classification is easy – similar examples are close, different ones are far. At test time, classify by distance to class prototypes (Prototypical Networks) or by weighted attention over support examples (Matching Networks).

'''Optimization-based (MAML)''': Find a model initialization θ such that a few gradient steps on the support set produce a good model for the query set. The meta-update optimizes through the adaptation process – it literally backpropagates through gradient descent steps.

'''Model-based''': Use a recurrent or attention architecture that quickly updates its "memory" when shown support examples.
The model's hidden state encodes the task context, enabling immediate adaptation.

'''In-context learning''' (emergent in LLMs) is meta-learning without gradient updates: GPT-4 can learn to translate into a new language, write in a new style, or follow new formatting rules from just a few examples in the prompt. The model's weights don't change – it "adapts" purely through the attention mechanism reading the context.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''MAML implementation for few-shot classification''' (the inner loop keeps the adapted "fast weights" differentiable, so the outer update can backpropagate through the gradient steps):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MAML:
    def __init__(self, model, inner_lr=0.01, outer_lr=0.001, n_inner_steps=5):
        self.model = model
        self.inner_lr = inner_lr
        self.optimizer = torch.optim.Adam(model.parameters(), lr=outer_lr)
        self.n_inner_steps = n_inner_steps
        self.loss_fn = nn.CrossEntropyLoss()

    def inner_adapt(self, support_x, support_y):
        """Fast adaptation on the support set (simulated fine-tuning),
        kept differentiable so the outer loss reaches the initialization."""
        fast_weights = dict(self.model.named_parameters())
        for _ in range(self.n_inner_steps):
            logits = torch.func.functional_call(self.model, fast_weights, (support_x,))
            loss = self.loss_fn(logits, support_y)
            # create_graph=True retains the second-order terms MAML needs.
            grads = torch.autograd.grad(loss, list(fast_weights.values()),
                                        create_graph=True)
            fast_weights = {name: w - self.inner_lr * g
                            for (name, w), g in zip(fast_weights.items(), grads)}
        return fast_weights

    def meta_update(self, episodes):
        """Outer loop: update θ to minimize query loss after adaptation."""
        meta_loss = 0.0
        for support_x, support_y, query_x, query_y in episodes:
            fast_weights = self.inner_adapt(support_x, support_y)
            query_logits = torch.func.functional_call(self.model, fast_weights, (query_x,))
            meta_loss = meta_loss + self.loss_fn(query_logits, query_y)
        self.optimizer.zero_grad()
        (meta_loss / len(episodes)).backward()
        self.optimizer.step()
</syntaxhighlight>

; Meta-learning approach selection
: '''Image classification, few-shot''' – Prototypical Networks (simple, effective)
: '''Any architecture, gradient-based''' – MAML, FOMAML (first-order approximation)
: '''NLP, few-shot''' – In-context learning with a large LLM
: '''HPO automation''' – BOHB (Bayesian Optimization + Hyperband), Optuna
: '''NAS''' – DARTS (gradient-based), evolutionary search
</div>

<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Meta-Learning Approach Comparison
! Approach !! Requires Gradient Through Adapt !! Speed !! Generalization
|-
| Metric-based (Prototypical) || No || Very fast || Good (within domain)
|-
| MAML || Yes (expensive) || Slow (2nd order) || Good
|-
| FOMAML || No (first-order approx) || Moderate || Good
|-
| In-context learning || No (inference only) || Fast || Excellent (large LLMs)
|-
| Model-based (NTM) || No || Fast || Moderate
|}

'''Failure modes''': MAML's second-order gradients are computationally expensive and numerically unstable. Metric-based methods fail when tasks are too diverse for a shared embedding space. In-context learning degrades when the context window fills or examples are poorly formatted. Meta-overfitting – the meta-learner overfits to the meta-training task distribution, failing on truly novel tasks.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Evaluation on standard benchmarks: '''Omniglot''' (20-way 1-shot, 5-shot character recognition), '''miniImageNet''' (5-way 1-shot, 5-shot), '''Meta-Dataset''' (diverse cross-domain few-shot). Report mean ± 95% CI across episodes. Expert practitioners evaluate generalization to task distributions outside meta-training – if a model only works on tasks similar to what it meta-trained on, it hasn't truly learned to learn.
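The mean ± 95% CI reporting convention can be sketched in a few lines; the helper name `episode_ci` and the normal-approximation interval are illustrative choices here, not part of any benchmark API:

```python
import statistics

def episode_ci(accuracies, z=1.96):
    """Mean accuracy with a normal-approximation 95% confidence interval
    over per-episode accuracies (the usual few-shot reporting format)."""
    mean = statistics.fmean(accuracies)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, z * sem

# Example: accuracies from 5 evaluation episodes of a 5-way 1-shot run.
accs = [0.52, 0.48, 0.60, 0.55, 0.50]
mean, half_width = episode_ci(accs)
```

In practice benchmarks average over hundreds of episodes, which shrinks the interval and makes comparisons between methods meaningful.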
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a meta-learning system for rapid domain adaptation:
# Collect a large, diverse set of tasks from the target distribution (if supervised) or define task samplers.
# For metric-based: train a shared encoder; at deployment, embed the support set and classify by nearest prototype.
# For MAML: use FOMAML or Reptile (simpler first-order approximations) for practical training.
# For LLM-based: invest in prompt design and few-shot example selection – example quality matters more than quantity.
# Continually collect new tasks at deployment and add them to the meta-training distribution to prevent drift.

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Meta-Learning]]
</div>
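The deployment step for the metric-based route (embed the support set, classify by nearest prototype) can be sketched without any ML framework; the 2-D embeddings and the `classify` helper below are stand-ins for illustration, not a reference implementation:

```python
import math

def prototype(vectors):
    """Class prototype = mean of the support embeddings for that class."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def classify(query, support):
    """Assign the query embedding to the class with the nearest prototype
    (Euclidean distance), as in Prototypical Networks."""
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    return min(protos, key=lambda lbl: math.dist(query, protos[lbl]))

# 2-way 2-shot toy episode: pre-computed 2-D embeddings per class.
support = {
    "cat": [[0.9, 0.1], [1.1, 0.2]],
    "dog": [[0.1, 0.9], [0.2, 1.1]],
}
label = classify([1.0, 0.0], support)  # nearest to the "cat" prototype
```

In a real system the vectors would come from the trained encoder, and Euclidean distance on normalized embeddings is the standard Prototypical Networks choice.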