Semi-Supervised Learning

From BloomWiki
Revision as of 01:57, 25 April 2026 by Wordpad (talk | contribs) (BloomWiki: Semi-Supervised Learning)

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.

Remembering

  • Semi-supervised learning — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.
  • Pseudo-labeling — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
  • Consistency regularization — Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
  • Mean Teacher — A semi-supervised method in which a student model is trained against pseudo-labels from a teacher model whose weights are an exponential moving average of the student's weights.
  • FixMatch — A state-of-the-art semi-supervised image classification method using confidence thresholding and weak/strong augmentation consistency.
  • MixMatch — A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
  • Self-training — Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
  • Co-training — Train two models on different feature views; each provides pseudo-labels for the other.
  • Graph-based methods — Propagate labels through a graph where edges represent similarity between examples (label propagation).
  • Label propagation — Semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
  • Manifold assumption — The assumption that data lies on a low-dimensional manifold; points on the same manifold should have the same label.
  • Smoothness assumption — If two points are close in input space, they should have similar labels.
  • Cluster assumption — Decision boundaries should lie in low-density regions between clusters.
  • Confidence threshold — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.

Understanding

Semi-supervised learning works by exploiting the **structure of the unlabeled data distribution** to constrain the label function. The key assumptions:

    • **Smoothness**: Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."
    • **Cluster**: Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.
    • **Manifold**: Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.
    • **Self-training process**: (1) Train on labeled data. (2) Predict labels for unlabeled data. (3) Add high-confidence predictions to the training set. (4) Retrain. (5) Repeat. Risk: confident errors propagate (confirmation bias). Mitigated by strict confidence thresholds.
    • **FixMatch**: The state-of-the-art simple baseline. For each unlabeled image: (1) Apply weak augmentation (horizontal flip, crop). (2) If prediction confidence > 0.95, use the prediction as a pseudo-label. (3) Apply strong augmentation (RandAugment). (4) Train the student to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while only training on confident pseudo-labels.
    • **When does semi-supervised help most?** When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as the labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised learning can hurt — a form of negative transfer.
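The self-training loop described above can be sketched in a few lines. This is a minimal illustration, not a production recipe: the `LogisticRegression` base classifier, the 0.95 threshold, and the iteration cap are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_iters=5):
    """Iterative self-training: train, pseudo-label confident points, retrain."""
    X, y = X_labeled.copy(), y_labeled.copy()
    remaining = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iters):
        clf.fit(X, y)
        if len(remaining) == 0:
            break
        probs = clf.predict_proba(remaining)
        # Strict confidence threshold curbs confirmation bias
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        # Map argmax indices back to class labels before adding to training set
        pseudo = clf.classes_[probs[confident].argmax(axis=1)]
        X = np.vstack([X, remaining[confident]])
        y = np.concatenate([y, pseudo])
        remaining = remaining[~confident]
    return clf
```

Note the exit condition when no prediction clears the threshold: stopping early is preferable to lowering the bar and ingesting noisy pseudo-labels.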

Applying

FixMatch implementation: <syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data
    # (no gradients flow through the pseudo-labeling pass)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)
        # Mask: only use predictions above the confidence threshold
        mask = (max_probs >= threshold).float()

    # Consistency loss: predict the pseudo-label on the strongly augmented view
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels,
                                         reduction='none') * mask).mean()
    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>

Semi-supervised method selection
Image classification → FixMatch, FlexMatch, FreeMatch (confidence threshold scheduling)
NLP → UDA (Unsupervised Data Augmentation), pre-train then fine-tune (BERT approach)
Graph data → Label propagation, Graph Convolutional Networks (GCN)
Small labeled set (<100 samples) → Mean Teacher, MixMatch
Production setting → Self-training with pseudo-labels (simple, scalable)
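For the graph-data row above, scikit-learn ships a ready-made label-propagation implementation in `sklearn.semi_supervised`. A minimal sketch on synthetic blobs follows; the blob centers, the RBF kernel, and `gamma=0.5` are illustrative assumptions, and by convention unlabeled examples are marked with the label -1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Two well-separated clusters; only one labeled example per class.
X, y_true = make_blobs(n_samples=200, centers=[[-4, -4], [4, 4]],
                       cluster_std=1.0, random_state=0)
y = np.full(200, -1)                       # -1 marks unlabeled examples
y[np.where(y_true == 0)[0][0]] = 0         # one labeled point from class 0
y[np.where(y_true == 1)[0][0]] = 1         # one labeled point from class 1

# Diffuse the two labels through an RBF similarity graph
lp = LabelPropagation(kernel='rbf', gamma=0.5)
lp.fit(X, y)
accuracy = (lp.transduction_ == y_true).mean()
```

`transduction_` holds the labels assigned to every point, labeled and unlabeled alike; with two tight clusters and one seed label each, the diffusion recovers nearly all cluster memberships.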

Analyzing

Semi-Supervised Methods Comparison

| Method | Labeled Data Needed | Key Idea | Best Domain |
|---|---|---|---|
| Self-training | ~10-20% | Confidence filtering | Any |
| FixMatch | <1% | Consistency + threshold | Vision |
| Mean Teacher | <5% | EMA teacher labels | Vision |
| Label Propagation | ~5% | Graph diffusion | Low-dim, graph |
| BERT fine-tuning | <1% (semi) | Large pre-training | NLP |

Failure modes:

  • Pseudo-label noise: incorrect confident predictions pollute training.
  • Distribution mismatch: unlabeled data from a different distribution hurts performance.
  • Overfitting on pseudo-labels: the model memorizes spurious patterns in the pseudo-labels.
  • Confirmation bias: the model fails to correct its own early confident errors.

Evaluating

Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against: (1) supervised-only (small labeled set), (2) fully supervised (all labeled), and (3) self-supervised pre-training + fine-tuning as competing baselines.
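The protocol above can be sketched as a small harness that sweeps labeled fractions and compares a supervised-only baseline against self-training on the same split. The synthetic dataset, the `LogisticRegression` base model, and the specific fractions are illustrative assumptions; whether self-training actually wins depends on the data, which is exactly what the curve is meant to reveal.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

results = {}
for frac in (0.01, 0.05, 0.10):
    n = max(int(frac * len(X_train)), 10)
    # Baseline (1): supervised-only on the small labeled split
    sup = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    # Semi-supervised: same labeled split plus the rest marked unlabeled (-1)
    y_partial = np.full(len(X_train), -1)
    y_partial[:n] = y_train[:n]
    semi = SelfTrainingClassifier(LogisticRegression(max_iter=1000),
                                  threshold=0.95)
    semi.fit(X_train, y_partial)
    # Both evaluated on the same held-out labeled test set
    results[frac] = (sup.score(X_test, y_test), semi.score(X_test, y_test))

for frac, (s, ss) in sorted(results.items()):
    print(f"{frac:.0%} labeled: supervised-only={s:.3f}, self-training={ss:.3f}")
```

Extending this with a fully supervised upper bound and a self-supervised pre-training baseline completes the comparison set described above.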

Creating

Designing a semi-supervised pipeline: (1) Start with self-training — simple, effective, easy to implement. (2) Set high confidence threshold (0.95+) to avoid noisy pseudo-labels. (3) Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold). (4) For vision: use FixMatch with RandAugment strong augmentation. (5) For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels. (6) Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.
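Step (6) can be made concrete with a small helper that scores would-be pseudo-labels against a held-out labeled set: coverage tells you how much unlabeled data clears the threshold, accuracy is a proxy for pseudo-label noise. The function name and the 0.95 default are illustrative assumptions.

```python
import numpy as np

def pseudo_label_quality(probs, y_true, threshold=0.95):
    """Estimate pseudo-label noise using a held-out labeled set.

    probs: (n, k) predicted class probabilities on held-out examples
    y_true: (n,) true labels for those examples
    Returns (coverage, accuracy): the fraction of examples that would receive
    a pseudo-label at this threshold, and how often those labels are correct.
    """
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    kept = conf >= threshold
    coverage = kept.mean()
    # Accuracy over the retained subset only; undefined if nothing is kept
    accuracy = (pred[kept] == y_true[kept]).mean() if kept.any() else float("nan")
    return coverage, accuracy
```

Tracking both numbers over training exposes the threshold trade-off directly: raising the threshold lowers coverage but should raise accuracy, and a falling accuracy at fixed threshold is an early warning of confirmation bias.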