Few-Shot and Zero-Shot Learning - Revision history

Wordpad: BloomWiki: Few-Shot and Zero-Shot Learning

2026-04-25T01:51:04Z

BloomWiki: Few-Shot and Zero-Shot Learning


Wordpad: New BloomWiki article: Few-Shot and Zero-Shot Learning

2026-04-23T08:12:35Z

New BloomWiki article: Few-Shot and Zero-Shot Learning

New page

{{BloomIntro}}
Few-shot and zero-shot learning address one of the most fundamental challenges in AI: learning from very little data. Standard supervised deep learning typically requires thousands to millions of labeled examples. Few-shot learning aims for strong performance with just 1–10 labeled examples per class. Zero-shot learning requires no task-specific examples at all: the model generalizes from its pre-existing knowledge and a description of the new task. These capabilities are increasingly important as AI is applied to specialized domains where labeled data is scarce.

== Remembering ==
* '''Few-shot learning''' — Learning to classify or solve tasks with very few labeled examples per class (1–10).
* '''Zero-shot learning''' — Making predictions for classes or tasks never seen during training, using semantic descriptions.
* '''N-way K-shot''' — A standard few-shot task specification: N classes, K labeled examples per class in the support set.
* '''Zero-shot classification''' — Classifying inputs into categories not seen during training, using class descriptions or embeddings.
* '''CLIP (Contrastive Language-Image Pre-training)''' — OpenAI model that enables zero-shot image classification by comparing image embeddings to text class descriptions.
* '''In-context learning''' — LLMs performing few-shot tasks from examples in the context window, without weight updates.
* '''Semantic embeddings''' — Dense vector representations encoding semantic meaning, enabling zero-shot similarity comparisons.
* '''Class prototype''' — The average embedding of all support set examples for a class; used in Prototypical Networks for few-shot classification.
* '''Attribute-based zero-shot''' — Zero-shot learning using human-defined semantic attributes to describe each class.
* '''Generalized zero-shot learning''' — Testing on both seen and unseen classes simultaneously; harder than standard zero-shot.
* '''ImageNet zero-shot''' — CLIP's largest model (ViT-L/14 at 336-pixel resolution) reaches about 76% top-1 accuracy on ImageNet without training on a single labeled ImageNet example.
* '''Prompt-based few-shot''' — Providing 1–10 examples in the LLM prompt to demonstrate the desired task format.
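
The N-way K-shot specification above can be made concrete with a small episode sampler. This is a minimal sketch using only the standard library; the function name <code>sample_episode</code> and the <code>n_query</code> parameter are illustrative assumptions, not part of any particular benchmark's tooling.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot episode from a list of class labels.

    Returns (support, query): two lists of (example_index, episode_label)
    pairs, where episode_label is in range(n_way).
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough examples for support + query can be used
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        chosen = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in chosen[:k_shot]]
        query += [(i, episode_label) for i in chosen[k_shot:]]
    return support, query

# Toy dataset: 4 classes with 6 examples each
labels = [c for c in "abcd" for _ in range(6)]
support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=3,
                                rng=random.Random(0))
print(len(support), len(query))  # 6 support pairs, 9 query pairs
</syntaxhighlight>

Support and query examples are drawn without overlap, so query accuracy measures generalization within the episode rather than memorization of the support set.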

== Understanding ==
'''Zero-shot learning with CLIP''': Train a model to align image and text representations. At inference, compute the image embedding and compare it against the text embeddings of all candidate class descriptions ("a photo of a cat", "a photo of a dog"). The class with the highest similarity is the prediction, with no training on these specific classes.

'''Why does zero-shot work?''' CLIP was trained on 400M image-text pairs. Through this training it learned that images of dogs and text about dogs occupy nearby regions of a shared embedding space. At zero-shot time, a new class description ("a photo of a Tibetan Mastiff") can be matched to images of a class the model was never trained to label, because the semantic alignment was learned during pre-training.

'''In-context few-shot learning''': GPT-4 can perform a new task from 3–5 examples in the prompt, with no gradient updates. The model recognizes the pattern in the examples and continues it for new inputs. This is surprisingly powerful for classification, translation, format conversion, and reasoning tasks.
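
As a rough illustration of what "examples in the prompt" looks like in practice, the sketch below assembles a few-shot prompt as plain text. The <code>Input:</code>/<code>Output:</code> layout and the helper name <code>build_few_shot_prompt</code> are assumptions chosen for illustration; real deployments may need a different format for a given model.

<syntaxhighlight lang="python">
def build_few_shot_prompt(instruction, examples, query):
    """Format an instruction, k demonstrations, and a new input as one prompt.

    examples: list of (input_text, output_text) pairs; an empty list gives a
    zero-shot prompt. The "Input:/Output:" layout is an illustrative choice.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts += [f"Input: {text}", f"Output: {label}", ""]
    # The prompt ends on an open "Output:" slot for the model to complete
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

demos = [
    ("The battery died after one day.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
    ("Stopped charging within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    demos,
    "Great screen, and the speakers are surprisingly good.",
)
print(prompt)
</syntaxhighlight>

Passing an empty <code>examples</code> list yields the zero-shot variant of the same prompt, which makes it easy to compare the two settings on identical inputs.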

'''The few-shot learning / meta-learning connection''': Few-shot learning and meta-learning address the same problem from different angles. Meta-learning trains a model explicitly to learn from few examples (gradient-based: MAML; metric-based: Prototypical Networks). LLM in-context learning achieves similar results without explicit meta-training, as an emergent capability.
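
To make the gradient-based idea less abstract, here is a toy sketch of a MAML-style meta-update for a one-parameter model <code>y ≈ w * x</code>. The 1-D model, the learning rates, and the function names are assumptions picked so that the second-order term can be written out by hand; it is not a general implementation.

<syntaxhighlight lang="python">
def grad_and_hess(w, xs, ys):
    """Gradient and second derivative of mean squared error for y ≈ w * x."""
    n = len(xs)
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2 * x * x for x in xs) / n
    return grad, hess

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update over a batch of tasks.

    Each task is ((xs_support, ys_support), (xs_query, ys_query)).
    """
    meta_grad = 0.0
    for (xs_s, ys_s), (xs_q, ys_q) in tasks:
        g_s, h_s = grad_and_hess(w, xs_s, ys_s)
        w_adapted = w - inner_lr * g_s           # inner loop: one step on the support set
        g_q, _ = grad_and_hess(w_adapted, xs_q, ys_q)
        meta_grad += g_q * (1 - inner_lr * h_s)  # chain rule through the inner step
    return w - outer_lr * meta_grad / len(tasks)

def make_task(slope):
    xs_s, xs_q = [1.0, 2.0], [0.5, 1.5]
    return (xs_s, [slope * x for x in xs_s]), (xs_q, [slope * x for x in xs_q])

# Two toy tasks with slopes 1 and 3
tasks = [make_task(1.0), make_task(3.0)]
w = 0.0
for _ in range(300):
    w = maml_step(w, tasks)
print(round(w, 3))
</syntaxhighlight>

The meta-learned <code>w</code> settles between the two task slopes: an initialization from which a single gradient step on a handful of support points moves quickly toward either task.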

'''Retrieval-augmented zero-shot''': When semantic class descriptions aren't available, retrieve relevant documents at inference time and use them to ground predictions, extending the model's effective knowledge without fine-tuning.
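
The retrieval step can be sketched with a deliberately simple stand-in: bag-of-words cosine similarity instead of a learned embedding model. All names here (<code>retrieve</code>, <code>grounded_prompt</code>, the toy documents) are hypothetical, and a production system would normally use dense embeddings and a vector index.

<syntaxhighlight lang="python">
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:top_k]

def grounded_prompt(query, documents, top_k=2):
    """Prepend retrieved passages so the model can answer without fine-tuning."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "A pangolin is a scaly mammal that eats ants and termites.",
    "The okapi is a forest mammal related to the giraffe.",
    "Basalt is a dark volcanic rock.",
]
print(grounded_prompt("Which mammal is related to the giraffe?", docs, top_k=1))
</syntaxhighlight>

The model weights never change; only the prompt does, which is why this counts as extending knowledge without fine-tuning.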

== Applying ==
'''CLIP zero-shot classification:'''
<syntaxhighlight lang="python">
import torch
import clip  # OpenAI's CLIP package
from PIL import Image

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Zero-shot classification without any task-specific training
def zero_shot_classify(image_path: str, class_names: list[str]) -> dict[str, float]:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    # Create a text description for each class
    texts = clip.tokenize([f"a photo of a {cls}" for cls in class_names]).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(texts)
        # Normalize so the dot product is a cosine similarity
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        # Scale the similarities and apply softmax to get per-class probabilities
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    return {cls: float(p) for cls, p in zip(class_names, probs[0])}

# Works for arbitrary class names; no task-specific training examples are needed
results = zero_shot_classify(
    "wildlife_photo.jpg",
    ["lion", "elephant", "giraffe", "zebra", "cheetah", "rhinoceros"],
)
print(sorted(results.items(), key=lambda x: -x[1]))
</syntaxhighlight>

'''Few-shot classification with Prototypical Networks:'''
<syntaxhighlight lang="python">
import torch

def prototypical_classify(support_embeddings, support_labels, query_embeddings, n_classes):
    """
    support_embeddings: (n_classes * k_shot, D) support set embeddings
    support_labels: (n_classes * k_shot,) integer class ids in [0, n_classes)
    query_embeddings: (n_query, D) query embeddings
    Returns: (n_query,) predicted class id for each query
    """
    # Compute class prototypes (mean of support embeddings per class)
    prototypes = torch.stack([
        support_embeddings[support_labels == c].mean(0) for c in range(n_classes)
    ])  # (n_classes, D)
    # Classify queries by nearest prototype (Euclidean distance)
    dists = torch.cdist(query_embeddings, prototypes)  # (n_query, n_classes)
    return dists.argmin(dim=1)
</syntaxhighlight>
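
When class probabilities are needed rather than a hard prediction, Prototypical Networks apply a softmax over negative squared Euclidean distances to the prototypes. The following dependency-free sketch shows that computation on a toy 2-way 2-shot episode; the helper names and the 2-D "embeddings" are illustrative assumptions.

<syntaxhighlight lang="python">
import math

def prototypes(support, labels):
    """Mean embedding per class. support: list of vectors; labels: class ids."""
    sums, counts = {}, {}
    for vec, c in zip(support, labels):
        acc = sums.setdefault(c, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def class_probabilities(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {c: -sum((q - p) ** 2 for q, p in zip(query, proto))
              for c, proto in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(l - m) for c, l in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# 2-way 2-shot toy episode in a 2-D embedding space
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["cat", "cat", "dog", "dog"]
protos = prototypes(support, labels)  # cat -> [0, 1], dog -> [4, 1]
probs = class_probabilities([1.0, 1.0], protos)
print(max(probs, key=probs.get))  # cat
</syntaxhighlight>

Because squaring preserves the ordering of distances, the most probable class here is the same one a nearest-prototype rule would return.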

; Few-shot / zero-shot approach selection
: '''Vision, zero-shot''' → CLIP (ViT-L/14 for best quality)
: '''NLP, zero-shot''' → LLM with task description in system prompt
: '''NLP, few-shot''' → LLM with 3-10 examples in context
: '''Vision, few-shot''' → Fine-tune CLIP or DINO with support set
: '''Structured few-shot''' → Prototypical Networks for consistent task structure

== Analyzing ==
{| class="wikitable"
|+ Zero-Shot vs. Few-Shot vs. Full Supervision
! Approach !! Data Needed !! Accuracy !! Flexibility !! Deployment Cost
|-
| Zero-shot (CLIP/LLM) || 0 || Medium || Very high || Low (API)
|-
| Few-shot in-context || 1–10 examples || Medium-high || Very high || Low (API)
|-
| Few-shot fine-tuning || ~100 || High || Medium || Medium
|-
| Full supervision || 1000–100k || Highest || Low (task-specific) || High
|}

'''Failure modes''': Zero-shot accuracy drops sharply in specialized or technical domains that are poorly represented in the pre-training data. Ambiguous class names cause misclassification when no context is given: "bank" can mean a financial institution or a river bank. In-context learning is sensitive to the order and formatting of the examples. Generalized zero-shot learning tends to be biased toward seen classes, and embedding-based methods are prone to the "hubness problem", in which a few class embeddings become the nearest neighbor of a disproportionate share of test points.
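
One simple way to check for the hubness effect mentioned above is to count how often each class embedding is the nearest neighbor of a test point. The sketch below uses hand-made 2-D vectors as hypothetical embeddings; a heavily skewed count for one class is a warning sign.

<syntaxhighlight lang="python">
from collections import Counter

def nearest_class_counts(queries, class_embeddings):
    """Count how many queries have each class as nearest neighbor (squared Euclidean)."""
    counts = Counter({c: 0 for c in class_embeddings})
    for q in queries:
        nearest = min(
            class_embeddings,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(q, class_embeddings[c])),
        )
        counts[nearest] += 1
    return counts

# Toy setup: the "hub" class sits near the center of the query distribution
class_embeddings = {"hub": [0.0, 0.0], "a": [3.0, 0.0], "b": [0.0, 3.0]}
queries = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.9, 0.0], [0.2, 2.8]]
counts = nearest_class_counts(queries, class_embeddings)
print(counts.most_common())  # "hub" wins 4 of the 6 queries
</syntaxhighlight>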

== Evaluating ==
Evaluate on standard benchmarks: '''miniImageNet''' and '''tieredImageNet''' for few-shot vision; '''SuperGLUE''' and '''FLAN''''s held-out task clusters for few-shot NLP; '''VTAB''' for transfer learning. Always evaluate on classes that are truly unseen, with no leakage between training and test classes. For CLIP zero-shot, also report results on distribution-shift test sets such as ImageNetV2 and ObjectNet. For LLM few-shot, measure across several values of k (0, 1, 4, 8 shots) to characterize the few-shot learning curve.
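
A learning curve over k is more informative when each point is averaged over many sampled episodes and reported with an uncertainty estimate. The sketch below assumes a caller-supplied <code>run_episode(k, rng)</code> function that returns an accuracy; <code>fake_episode</code> is a hypothetical stand-in that produces synthetic numbers, not real results.

<syntaxhighlight lang="python">
import math
import random
import statistics

def mean_and_ci95(accuracies):
    """Mean and normal-approximation 95% confidence half-width over episodes."""
    mean = statistics.fmean(accuracies)
    if len(accuracies) < 2:
        return mean, float("nan")
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width

def learning_curve(run_episode, shots=(0, 1, 4, 8), n_episodes=200, seed=0):
    """Evaluate run_episode(k, rng) -> accuracy for each k and summarize."""
    rng = random.Random(seed)
    return {k: mean_and_ci95([run_episode(k, rng) for _ in range(n_episodes)])
            for k in shots}

# Hypothetical stand-in for a real evaluation: accuracy rises with k, plus noise
def fake_episode(k, rng):
    return min(1.0, max(0.0, 0.5 + 0.05 * k + rng.gauss(0, 0.05)))

curve = learning_curve(fake_episode)
for k, (mean, half_width) in curve.items():
    print(f"{k}-shot: {mean:.3f} ± {half_width:.3f}")
</syntaxhighlight>

Replacing <code>fake_episode</code> with a function that samples a real episode and scores the model keeps the rest of the harness unchanged.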

== Creating ==
Designing a few-shot deployment pipeline:
# Start with zero-shot: use CLIP or GPT-4 with class descriptions. No data collection is needed.
# If accuracy is insufficient, collect 5–10 examples per class with domain experts.
# Use Prototypical Networks or a CLIP linear probe on the support set embeddings.
# If accuracy is still insufficient, collect 100+ examples per class for standard fine-tuning.
# Monitor class-level performance: some classes are harder for zero-shot than others, so target annotation effort at the weak classes.
# Continuously, as more labeled data accumulates, transition from few-shot to supervised models where it is cost-effective.
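
The monitoring and escalation steps above can be sketched as two small helpers. The stage names, the accuracy target, and the per-class threshold are illustrative assumptions; appropriate values depend on the application.

<syntaxhighlight lang="python">
from collections import defaultdict

# Escalation ladder: zero-shot -> few-shot (5-10 per class) -> fine-tuning (100+ per class)
STAGES = ["zero-shot", "few-shot", "fine-tune"]

def next_stage(current, accuracy, target):
    """Stay at the current stage if it meets the target, else escalate one step."""
    if accuracy >= target or current == STAGES[-1]:
        return current
    return STAGES[STAGES.index(current) + 1]

def per_class_accuracy(y_true, y_pred):
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}

def weak_classes(y_true, y_pred, threshold):
    """Classes whose accuracy falls below the threshold: annotate these first."""
    acc = per_class_accuracy(y_true, y_pred)
    return sorted(c for c, a in acc.items() if a < threshold)

y_true = ["cat", "cat", "dog", "dog", "fox", "fox"]
y_pred = ["cat", "cat", "dog", "cat", "dog", "dog"]
print(weak_classes(y_true, y_pred, threshold=0.8))       # ['dog', 'fox']
print(next_stage("zero-shot", accuracy=0.5, target=0.9))  # few-shot
</syntaxhighlight>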

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Few-Shot Learning]]