Semi-Supervised Learning
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Semi-supervised learning''' – Learning from a small labeled dataset and a large unlabeled dataset simultaneously.
* '''Pseudo-labeling''' – Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
* '''Consistency regularization''' – Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
* '''Mean Teacher''' – A semi-supervised method in which a student model is trained while a teacher model, an exponential moving average (EMA) of the student's weights, provides the pseudo-labels.
* '''FixMatch''' – A strong, simple semi-supervised image-classification method using confidence thresholding and weak/strong augmentation consistency.
* '''MixMatch''' – A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
* '''Self-training''' – Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
* '''Co-training''' – Train two models on different feature views; each provides pseudo-labels for the other.
* '''Graph-based methods''' – Propagate labels through a graph whose edges represent similarity between examples (label propagation).
* '''Label propagation''' – A semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
* '''Manifold assumption''' – The assumption that data lies on a low-dimensional manifold; points on the same manifold region should share a label.
* '''Smoothness assumption''' – If two points are close in input space, they should have similar labels.
* '''Cluster assumption''' – Decision boundaries should lie in low-density regions between clusters.
* '''Confidence threshold''' – In pseudo-labeling, only use predictions whose confidence exceeds a threshold; this avoids noisy pseudo-labels.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Semi-supervised learning works by exploiting the '''structure of the unlabeled data distribution''' to constrain the label function. The key assumptions:

'''Smoothness''': Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."

'''Cluster''': Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.

'''Manifold''': Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.

'''Self-training process''': (1) Train on labeled data. (2) Predict labels for unlabeled data. (3) Add high-confidence predictions to the training set. (4) Retrain. (5) Repeat. Risk: confident errors propagate (confirmation bias), mitigated by strict confidence thresholds.

'''FixMatch''': A simple, strong baseline. For each unlabeled image: (1) Apply weak augmentation (horizontal flip, crop). (2) If the prediction confidence exceeds 0.95, use the prediction as a pseudo-label.
(3) Apply strong augmentation (RandAugment). (4) Train the model to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while training only on confident pseudo-labels.

'''When does semi-supervised learning help most?''' When labeled data is very scarce (fewer than ~1,000 examples) and the unlabeled data shares the same distribution as the labeled data. When the distributions differ (domain shift between labeled and unlabeled data), semi-supervised learning can hurt performance – a form of negative transfer.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''FixMatch implementation:'''
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data (no gradients)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)

    # Mask: only use predictions above the confidence threshold
    mask = (max_probs >= threshold).float()

    # Consistency loss: predict the pseudo-label on the strongly augmented version
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels,
                                         reduction='none') * mask).mean()

    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>

; Semi-supervised method selection
: '''Image classification''' – FixMatch, FlexMatch, FreeMatch (confidence-threshold scheduling)
: '''NLP''' – UDA (Unsupervised Data Augmentation), or pre-train then fine-tune (the BERT approach)
: '''Graph data''' – Label propagation, Graph Convolutional Networks (GCN)
: '''Small labeled set (<100 samples)''' – Mean Teacher, MixMatch
: '''Production setting''' – Self-training with pseudo-labels (simple, scalable)
</div>

<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Semi-Supervised Methods Comparison
! Method !! Labeled Data Needed !! Key Idea !! Best Domain
|-
| Self-training || ~10–20% || Confidence filtering || Any
|-
| FixMatch || <1% || Consistency + threshold || Vision
|-
| Mean Teacher || <5% || EMA teacher labels || Vision
|-
| Label Propagation || ~5% || Graph diffusion || Low-dimensional, graph
|-
| BERT fine-tuning || <1% (semi) || Large-scale pre-training || NLP
|}

'''Failure modes''': Pseudo-label noise – incorrect but confident predictions pollute training. Distribution mismatch – unlabeled data from a different distribution hurts performance. Overfitting on pseudo-labels – the model memorizes spurious patterns in the pseudo-labels. Confirmation bias – the model fails to correct its own early confident errors.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Evaluation must match the practical setting: hold out a labeled test set and train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of the labeled-data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against: (1) supervised-only training on the small labeled set, (2) fully supervised training on all labels, and (3) self-supervised pre-training plus fine-tuning as competing baselines.
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a semi-supervised pipeline:
(1) Start with self-training – simple, effective, easy to implement.
(2) Set a high confidence threshold (0.95+) to avoid noisy pseudo-labels.
(3) Apply a curriculum: increase unlabeled data usage as the model improves (FlexMatch's adaptive threshold).
(4) For vision: use FixMatch with RandAugment strong augmentation.
(5) For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on the labels.
(6) Monitor pseudo-label quality: compute the accuracy of pseudo-labels on held-out labeled data as a proxy for the noise level.

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Semi-Supervised Learning]]
</div>
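The self-training starting point (steps 1, 2, and 6 of the pipeline) can be sketched with scikit-learn's <code>SelfTrainingClassifier</code>. This is a minimal illustration on synthetic data, not a production recipe: the dataset, the logistic-regression base learner, and the ~5% labeled fraction are assumptions chosen for the example.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy dataset: 1000 points, ~5% labeled; -1 marks unlabeled examples.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.RandomState(0)
unlabeled_mask = rng.rand(len(y)) > 0.05
y_semi = y.copy()
y_semi[unlabeled_mask] = -1

# Steps (1)-(2): self-training with a high (0.95) confidence threshold,
# so only confident predictions become pseudo-labels.
self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.95, max_iter=10)
self_training.fit(X, y_semi)

# Step (6): proxy for pseudo-label noise -- accuracy of the final model's
# predictions on the examples whose true labels were hidden.
pseudo_acc = (self_training.predict(X[unlabeled_mask])
              == y[unlabeled_mask]).mean()
print(f"accuracy on hidden labels: {pseudo_acc:.3f}")
</syntaxhighlight>

In practice, the held-out comparison in step (6) would use a genuinely labeled validation set rather than hidden synthetic labels.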