Semi-Supervised Learning
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.
Remembering[edit]
- Semi-supervised learning — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.
- Pseudo-labeling — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
- Consistency regularization — Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
- Mean Teacher — A semi-supervised method in which a student model is trained against targets from a teacher model whose weights are an exponential moving average of the student's weights; the teacher's predictions serve as consistency targets.
- FixMatch — An influential semi-supervised image classification method using confidence thresholding and weak/strong augmentation consistency.
- MixMatch — A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
- Self-training — Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
- Co-training — Train two models on different feature views; each provides pseudo-labels for the other.
- Graph-based methods — Propagate labels through a graph where edges represent similarity between examples (label propagation).
- Label propagation — Semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
- Manifold assumption — The assumption that data lies on a low-dimensional manifold; points on the same manifold should have the same label.
- Smoothness assumption — If two points are close in input space, they should have similar labels.
- Cluster assumption — Decision boundaries should lie in low-density regions between clusters.
- Confidence threshold — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.
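The label propagation entry above can be made concrete. Here is a minimal NumPy sketch that assumes a precomputed similarity matrix W (graph construction itself is omitted); unlabeled points are marked with -1:

```python
import numpy as np

def label_propagation(W, y, n_iter=100):
    """Propagate labels through a similarity graph W.
    y: array of class indices, -1 for unlabeled points."""
    n = len(y)
    classes = np.unique(y[y >= 0])
    # Row-normalize the similarity matrix into a transition matrix
    P = W / W.sum(axis=1, keepdims=True)
    # One-hot label matrix; unlabeled rows start at zero
    Y = np.zeros((n, len(classes)))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0
    for _ in range(n_iter):
        Y = P @ Y          # diffuse labels along graph edges
        Y[labeled] = 0.0   # clamp labeled points to their true labels
        Y[labeled, y[labeled]] = 1.0
    return classes[Y.argmax(axis=1)]

# Chain graph 0-1-2-3; node 0 labeled class 0, node 3 labeled class 1
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0, -1, -1, 1])
result = label_propagation(W, y)  # nodes 1 and 2 inherit their neighbors' labels
```

The clamping step is what distinguishes label propagation from ordinary diffusion: labeled points continually re-inject their known labels into the graph.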
Understanding[edit]
Semi-supervised learning works by exploiting the structure of the unlabeled data distribution to constrain the label function. The key assumptions:
Smoothness: Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."
Cluster: Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.
Manifold: Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.
Self-training process:
- Train on labeled data.
- Predict labels for unlabeled data.
- Add high-confidence predictions to training set.
- Retrain.
- Repeat until few new confident predictions remain.
Risk: confident errors propagate (confirmation bias). This is mitigated by strict confidence thresholds.
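The steps above can be sketched end to end. This toy version uses a nearest-centroid classifier with a softmax-over-distances confidence score; both are illustrative stand-ins for a real model:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    """Iterative self-training with a nearest-centroid 'model' (toy sketch)."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        # 1. Train: one centroid per class
        classes = np.unique(y)
        centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
        # 2. Predict labels for the unlabeled pool
        dists = np.linalg.norm(pool[:, None] - centroids[None], axis=-1)
        probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
        conf = probs.max(axis=1)
        pred = classes[probs.argmax(axis=1)]
        # 3. Keep only high-confidence pseudo-labels
        keep = conf >= threshold
        if not keep.any():
            break
        # 4. Retrain on labeled + pseudo-labeled data
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, pred[keep]])
        pool = pool[~keep]
    return X, y

# Two well-separated classes; all four unlabeled points get absorbed
X_lab = np.array([[0.0, 0.0], [10.0, 0.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[1.0, 0.0], [2.0, 0.0], [8.0, 0.0], [9.0, 0.0]])
X_all, y_all = self_train(X_lab, y_lab, X_unlab)
```

With a high threshold, points far from both centroids would simply stay in the pool, which is exactly the behavior that limits confirmation bias.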
FixMatch: A strong, simple baseline. For each unlabeled image:
- Apply weak augmentation (horizontal flip, crop).
- If prediction confidence > 0.95, use as pseudo-label.
- Apply strong augmentation (RandAugment).
- Train the model to predict the pseudo-label on the strongly augmented view.
This enforces consistency across augmentation strengths while training only on confident pseudo-labels.
When does semi-supervised help most? When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.
Applying[edit]
FixMatch implementation:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data (no gradients)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)

    # Mask: only use predictions above the confidence threshold
    mask = (max_probs >= threshold).float()

    # Consistency loss: predict the pseudo-label on the strongly augmented version
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels,
                                         reduction='none') * mask).mean()

    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>
Semi-supervised method selection:
- Image classification → FixMatch, FlexMatch, FreeMatch (confidence threshold scheduling)
- NLP → UDA (Unsupervised Data Augmentation), pre-train then fine-tune (BERT approach)
- Graph data → Label propagation, Graph Convolutional Networks (GCN)
- Small labeled set (<100 samples) → Mean Teacher, MixMatch
- Production setting → Self-training with pseudo-labels (simple, scalable)
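The confidence threshold scheduling mentioned for FlexMatch can be approximated in a few lines. This is a simplified sketch, not the paper's exact formulation: per-class thresholds are scaled by how often each class is already predicted confidently, so under-learned classes still contribute unlabeled examples:

```python
import numpy as np

def adaptive_thresholds(pseudo_labels, confidences, n_classes, tau=0.95):
    """Simplified FlexMatch-style per-class thresholds.
    Classes the model rarely predicts confidently get a lower
    threshold, so their unlabeled examples still enter training."""
    counts = np.zeros(n_classes)
    for c in pseudo_labels[confidences >= tau]:
        counts[c] += 1
    # Normalize by the best-learned class, then scale the base threshold
    return tau * counts / max(counts.max(), 1.0)

# Toy batch: class 0 predicted confidently three times, class 1 once
pseudo = np.array([0, 0, 0, 1])
conf = np.array([0.99, 0.98, 0.97, 0.96])
thresholds = adaptive_thresholds(pseudo, conf, n_classes=2)
# Class 1's threshold drops to one third of tau
```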
Analyzing[edit]
| Method | Labeled Data Needed | Key Idea | Best Domain |
|---|---|---|---|
| Self-training | ~10-20% | Confidence filtering | Any |
| FixMatch | <1% | Consistency + threshold | Vision |
| Mean Teacher | <5% | EMA teacher labels | Vision |
| Label Propagation | ~5% | Graph diffusion | Low-dim, graph |
| BERT fine-tuning | <1% (semi) | Large pre-training | NLP |
Failure modes:
- Pseudo-label noise — incorrect confident predictions pollute training.
- Distribution mismatch — unlabeled data from a different distribution hurts performance.
- Overfitting on pseudo-labels — the model memorizes spurious patterns in pseudo-labels.
- Confirmation bias — the model fails to correct its own early confident errors.
Evaluating[edit]
Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against:
- supervised-only (small labeled set),
- fully supervised (all labeled), and
- self-supervised pre-training + fine-tuning as competing baselines.
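The protocol above can be expressed as a small harness. Here `train_fn` and `eval_fn` are placeholders for your own training and evaluation code; only the split logic is shown:

```python
import numpy as np

def benefit_curve(train_fn, eval_fn, X, y, fractions=(0.01, 0.05, 0.10), seed=0):
    """Accuracy as a function of labeled-data fraction.
    train_fn(X_lab, y_lab, X_unlab) -> model; eval_fn(model) -> accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        n_lab = max(1, int(frac * len(X)))
        idx = rng.permutation(len(X))           # fresh split per fraction
        lab, unlab = idx[:n_lab], idx[n_lab:]
        model = train_fn(X[lab], y[lab], X[unlab])
        results[frac] = eval_fn(model)
    return results

# Dummy callables just to show the shape of the output
X, y = np.zeros((100, 2)), np.zeros(100, dtype=int)
curve = benefit_curve(lambda Xl, yl, Xu: len(Xl), lambda m: m, X, y)
```

Running the same harness with a supervised-only `train_fn` (ignoring `X_unlab`) gives the baseline curve to compare against.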
Creating[edit]
Designing a semi-supervised pipeline:
- Start with self-training — simple, effective, easy to implement.
- Set high confidence threshold (0.95+) to avoid noisy pseudo-labels.
- Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold).
- For vision: use FixMatch with RandAugment strong augmentation.
- For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels.
- Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.
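The last point can be made concrete. Given model probabilities on a held-out labeled set, report both coverage (how many predictions clear the threshold) and accuracy among those confident predictions:

```python
import numpy as np

def pseudo_label_quality(probs, true_labels, threshold=0.95):
    """Estimate pseudo-label noise on a held-out labeled set.
    probs: model class probabilities, shape (n, n_classes).
    Returns (coverage, accuracy among confident predictions)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    confident = conf >= threshold
    coverage = confident.mean()
    if not confident.any():
        return coverage, float("nan")
    accuracy = (pred[confident] == true_labels[confident]).mean()
    return coverage, accuracy

# Hypothetical held-out predictions: 3 confident (2 of them correct), 1 uncertain
probs = np.array([[0.98, 0.02],
                  [0.97, 0.03],
                  [0.04, 0.96],
                  [0.60, 0.40]])
true_labels = np.array([0, 1, 1, 0])
cov, acc = pseudo_label_quality(probs, true_labels)
```

A falling accuracy at constant coverage is an early warning that pseudo-label noise is accumulating; raising the threshold trades coverage for cleaner labels.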