Semi-Supervised Learning
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.
Remembering[edit]
- Semi-supervised learning — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.
- Pseudo-labeling — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
- Consistency regularization — Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
- Mean Teacher — A semi-supervised method in which a student model is trained against targets from a teacher model whose weights are an exponential moving average of the student's weights; the teacher's predictions serve as consistency targets.
- FixMatch — An influential semi-supervised image classification method using confidence thresholding and weak/strong augmentation consistency.
- MixMatch — A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
- Self-training — Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
- Co-training — Train two models on different feature views; each provides pseudo-labels for the other.
- Graph-based methods — Propagate labels through a graph where edges represent similarity between examples (label propagation).
- Label propagation — Semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
- Manifold assumption — The assumption that data lies on a low-dimensional manifold; points on the same manifold should have the same label.
- Smoothness assumption — If two points are close in input space, they should have similar labels.
- Cluster assumption — Decision boundaries should lie in low-density regions between clusters.
- Confidence threshold — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.
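The label propagation entry above can be made concrete. Here is a minimal NumPy sketch that assumes a precomputed similarity matrix W (graph construction itself is omitted); unlabeled points are marked with -1:

```python
import numpy as np

def label_propagation(W, y, n_iter=100):
    """Propagate labels through a similarity graph W.
    y: array of class indices, -1 for unlabeled points."""
    n = len(y)
    classes = np.unique(y[y >= 0])
    # Row-normalize the similarity matrix into a transition matrix
    P = W / W.sum(axis=1, keepdims=True)
    # One-hot label matrix; unlabeled rows start at zero
    Y = np.zeros((n, len(classes)))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0
    for _ in range(n_iter):
        Y = P @ Y          # diffuse labels along graph edges
        Y[labeled] = 0.0   # clamp labeled points to their true labels
        Y[labeled, y[labeled]] = 1.0
    return classes[Y.argmax(axis=1)]

# Chain graph 0-1-2-3; node 0 labeled class 0, node 3 labeled class 1
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0, -1, -1, 1])
result = label_propagation(W, y)  # nodes 1 and 2 inherit their neighbors' labels
```

The clamping step is what distinguishes label propagation from ordinary diffusion: labeled points continually re-inject their known labels into the graph.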
Understanding[edit]
Semi-supervised learning works by exploiting the structure of the unlabeled data distribution to constrain the label function. The key assumptions:
Smoothness: Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."
Cluster: Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.
Manifold: Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.
Self-training process:
- Train on labeled data.
- Predict labels for unlabeled data.
- Add high-confidence predictions to training set.
- Retrain.
- Repeat until few new confident predictions remain.
Risk: confident errors propagate (confirmation bias). This is mitigated by strict confidence thresholds.
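The steps above can be sketched end to end. This toy version uses a nearest-centroid classifier with a softmax-over-distances confidence score; both are illustrative stand-ins for a real model:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    """Iterative self-training with a nearest-centroid 'model' (toy sketch)."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        # 1. Train: one centroid per class
        classes = np.unique(y)
        centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
        # 2. Predict labels for the unlabeled pool
        dists = np.linalg.norm(pool[:, None] - centroids[None], axis=-1)
        probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
        conf = probs.max(axis=1)
        pred = classes[probs.argmax(axis=1)]
        # 3. Keep only high-confidence pseudo-labels
        keep = conf >= threshold
        if not keep.any():
            break
        # 4. Retrain on labeled + pseudo-labeled data
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, pred[keep]])
        pool = pool[~keep]
    return X, y

# Two well-separated classes; all four unlabeled points get absorbed
X_lab = np.array([[0.0, 0.0], [10.0, 0.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[1.0, 0.0], [2.0, 0.0], [8.0, 0.0], [9.0, 0.0]])
X_all, y_all = self_train(X_lab, y_lab, X_unlab)
```

With a high threshold, points far from both centroids would simply stay in the pool, which is exactly the behavior that limits confirmation bias.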
FixMatch: A strong, simple baseline. For each unlabeled image:
- Apply weak augmentation (horizontal flip, crop).
- If prediction confidence > 0.95, use as pseudo-label.
- Apply strong augmentation (RandAugment).
- Train the model to predict the pseudo-label on the strongly augmented view.
This enforces consistency across augmentation strengths while training only on confident pseudo-labels.
When does semi-supervised help most? When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.
Applying[edit]
FixMatch implementation:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data (no gradients)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)

    # Mask: only use predictions above the confidence threshold
    mask = (max_probs >= threshold).float()

    # Consistency loss: predict the pseudo-label on the strongly augmented version
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels,
                                         reduction='none') * mask).mean()

    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>
Semi-supervised method selection:
- Image classification → FixMatch, FlexMatch, FreeMatch (confidence threshold scheduling)
- NLP → UDA (Unsupervised Data Augmentation), pre-train then fine-tune (BERT approach)
- Graph data → Label propagation, Graph Convolutional Networks (GCN)
- Small labeled set (<100 samples) → Mean Teacher, MixMatch
- Production setting → Self-training with pseudo-labels (simple, scalable)
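The confidence threshold scheduling mentioned for FlexMatch can be approximated in a few lines. This is a simplified sketch, not the paper's exact formulation: per-class thresholds are scaled by how often each class is already predicted confidently, so under-learned classes still contribute unlabeled examples:

```python
import numpy as np

def adaptive_thresholds(pseudo_labels, confidences, n_classes, tau=0.95):
    """Simplified FlexMatch-style per-class thresholds.
    Classes the model rarely predicts confidently get a lower
    threshold, so their unlabeled examples still enter training."""
    counts = np.zeros(n_classes)
    for c in pseudo_labels[confidences >= tau]:
        counts[c] += 1
    # Normalize by the best-learned class, then scale the base threshold
    return tau * counts / max(counts.max(), 1.0)

# Toy batch: class 0 predicted confidently three times, class 1 once
pseudo = np.array([0, 0, 0, 1])
conf = np.array([0.99, 0.98, 0.97, 0.96])
thresholds = adaptive_thresholds(pseudo, conf, n_classes=2)
# Class 1's threshold drops to one third of tau
```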
Analyzing[edit]
| Method | Labeled Data Needed | Key Idea | Best Domain |
|---|---|---|---|
| Self-training | ~10-20% | Confidence filtering | Any |
| FixMatch | <1% | Consistency + threshold | Vision |
| Mean Teacher | <5% | EMA teacher labels | Vision |
| Label Propagation | ~5% | Graph diffusion | Low-dim, graph |
| BERT fine-tuning | <1% (semi) | Large pre-training | NLP |
Failure modes:
- Pseudo-label noise — incorrect confident predictions pollute training.
- Distribution mismatch — unlabeled data from a different distribution hurts performance.
- Overfitting on pseudo-labels — the model memorizes spurious patterns in pseudo-labels.
- Confirmation bias — the model fails to correct its own early confident errors.
Evaluating[edit]
Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against:
- supervised-only (small labeled set),
- fully supervised (all labeled), and
- self-supervised pre-training + fine-tuning as competing baselines.
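The protocol above can be expressed as a small harness. Here `train_fn` and `eval_fn` are placeholders for your own training and evaluation code; only the split logic is shown:

```python
import numpy as np

def benefit_curve(train_fn, eval_fn, X, y, fractions=(0.01, 0.05, 0.10), seed=0):
    """Accuracy as a function of labeled-data fraction.
    train_fn(X_lab, y_lab, X_unlab) -> model; eval_fn(model) -> accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for frac in fractions:
        n_lab = max(1, int(frac * len(X)))
        idx = rng.permutation(len(X))           # fresh split per fraction
        lab, unlab = idx[:n_lab], idx[n_lab:]
        model = train_fn(X[lab], y[lab], X[unlab])
        results[frac] = eval_fn(model)
    return results

# Dummy callables just to show the shape of the output
X, y = np.zeros((100, 2)), np.zeros(100, dtype=int)
curve = benefit_curve(lambda Xl, yl, Xu: len(Xl), lambda m: m, X, y)
```

Running the same harness with a supervised-only `train_fn` (ignoring `X_unlab`) gives the baseline curve to compare against.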
Creating[edit]
Designing a semi-supervised pipeline:
- Start with self-training — simple, effective, easy to implement.
- Set high confidence threshold (0.95+) to avoid noisy pseudo-labels.
- Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold).
- For vision: use FixMatch with RandAugment strong augmentation.
- For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels.
- Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.
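The last point can be made concrete. Given model probabilities on a held-out labeled set, report both coverage (how many predictions clear the threshold) and accuracy among those confident predictions:

```python
import numpy as np

def pseudo_label_quality(probs, true_labels, threshold=0.95):
    """Estimate pseudo-label noise on a held-out labeled set.
    probs: model class probabilities, shape (n, n_classes).
    Returns (coverage, accuracy among confident predictions)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    confident = conf >= threshold
    coverage = confident.mean()
    if not confident.any():
        return coverage, float("nan")
    accuracy = (pred[confident] == true_labels[confident]).mean()
    return coverage, accuracy

# Hypothetical held-out predictions: 3 confident (2 of them correct), 1 uncertain
probs = np.array([[0.98, 0.02],
                  [0.97, 0.03],
                  [0.04, 0.96],
                  [0.60, 0.40]])
true_labels = np.array([0, 1, 1, 0])
cov, acc = pseudo_label_quality(probs, true_labels)
```

A falling accuracy at constant coverage is an early warning that pseudo-label noise is accumulating; raising the threshold trades coverage for cleaner labels.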