Semi-Supervised Learning
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.
</div>

__TOC__

<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Semi-supervised learning''' – Learning from a small labeled dataset and a large unlabeled dataset simultaneously.
* '''Pseudo-labeling''' – Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
* '''Consistency regularization''' – Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
* '''Mean Teacher''' – A semi-supervised method in which a student model is trained while a teacher model, an exponential moving average (EMA) of the student's weights, provides the pseudo-labels.
* '''FixMatch''' – A strong, simple semi-supervised image-classification method using confidence thresholding and weak/strong augmentation consistency.
* '''MixMatch''' – A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
* '''Self-training''' – Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
* '''Co-training''' – Train two models on different feature views; each provides pseudo-labels for the other.
* '''Graph-based methods''' – Propagate labels through a graph whose edges represent similarity between examples (label propagation).
* '''Label propagation''' – A semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
* '''Manifold assumption''' – The assumption that data lies on a low-dimensional manifold; points on the same manifold region should share a label.
* '''Smoothness assumption''' – If two points are close in input space, they should have similar labels.
* '''Cluster assumption''' – Decision boundaries should lie in low-density regions between clusters.
* '''Confidence threshold''' – In pseudo-labeling, only use predictions whose confidence exceeds a threshold; this avoids noisy pseudo-labels.
</div>

<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
Semi-supervised learning works by exploiting the '''structure of the unlabeled data distribution''' to constrain the label function. The key assumptions:

'''Smoothness''': Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."

'''Cluster''': Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.

'''Manifold''': Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.

'''Self-training process''': (1) Train on labeled data. (2) Predict labels for unlabeled data. (3) Add high-confidence predictions to the training set. (4) Retrain. (5) Repeat. Risk: confident errors propagate (confirmation bias), mitigated by strict confidence thresholds.

'''FixMatch''': A simple, strong baseline. For each unlabeled image: (1) Apply weak augmentation (horizontal flip, crop). (2) If the prediction confidence exceeds 0.95, use the prediction as a pseudo-label.
(3) Apply strong augmentation (RandAugment). (4) Train the model to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while training only on confident pseudo-labels.

'''When does semi-supervised learning help most?''' When labeled data is very scarce (fewer than ~1,000 examples) and the unlabeled data shares the same distribution as the labeled data. When the distributions differ (domain shift between labeled and unlabeled data), semi-supervised learning can hurt performance – a form of negative transfer.
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''FixMatch implementation:'''
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data (no gradients)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)

    # Mask: only use predictions above the confidence threshold
    mask = (max_probs >= threshold).float()

    # Consistency loss: predict the pseudo-label on the strongly augmented version
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels,
                                         reduction='none') * mask).mean()

    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>

; Semi-supervised method selection
: '''Image classification''' – FixMatch, FlexMatch, FreeMatch (confidence-threshold scheduling)
: '''NLP''' – UDA (Unsupervised Data Augmentation), or pre-train then fine-tune (the BERT approach)
: '''Graph data''' – Label propagation, Graph Convolutional Networks (GCN)
: '''Small labeled set (<100 samples)''' – Mean Teacher, MixMatch
: '''Production setting''' – Self-training with pseudo-labels (simple, scalable)
</div>

<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Semi-Supervised Methods Comparison
! Method !! Labeled Data Needed !! Key Idea !! Best Domain
|-
| Self-training || ~10–20% || Confidence filtering || Any
|-
| FixMatch || <1% || Consistency + threshold || Vision
|-
| Mean Teacher || <5% || EMA teacher labels || Vision
|-
| Label Propagation || ~5% || Graph diffusion || Low-dimensional, graph
|-
| BERT fine-tuning || <1% (semi) || Large-scale pre-training || NLP
|}

'''Failure modes''': Pseudo-label noise – incorrect but confident predictions pollute training. Distribution mismatch – unlabeled data from a different distribution hurts performance. Overfitting on pseudo-labels – the model memorizes spurious patterns in the pseudo-labels. Confirmation bias – the model fails to correct its own early confident errors.
</div>

<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Evaluation must match the practical setting: hold out a labeled test set and train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of the labeled-data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against: (1) supervised-only training on the small labeled set, (2) fully supervised training on all labels, and (3) self-supervised pre-training plus fine-tuning as competing baselines.
</div>

<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a semi-supervised pipeline:
(1) Start with self-training – simple, effective, easy to implement.
(2) Set a high confidence threshold (0.95+) to avoid noisy pseudo-labels.
(3) Apply a curriculum: increase unlabeled data usage as the model improves (FlexMatch's adaptive threshold).
(4) For vision: use FixMatch with RandAugment strong augmentation.
(5) For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on the labels.
(6) Monitor pseudo-label quality: compute the accuracy of pseudo-labels on held-out labeled data as a proxy for the noise level.

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Semi-Supervised Learning]]
</div>
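The self-training starting point (steps 1, 2, and 6 of the pipeline) can be sketched with scikit-learn's <code>SelfTrainingClassifier</code>. This is a minimal illustration on synthetic data, not a production recipe: the dataset, the logistic-regression base learner, and the ~5% labeled fraction are assumptions chosen for the example.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy dataset: 1000 points, ~5% labeled; -1 marks unlabeled examples.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.RandomState(0)
unlabeled_mask = rng.rand(len(y)) > 0.05
y_semi = y.copy()
y_semi[unlabeled_mask] = -1

# Steps (1)-(2): self-training with a high (0.95) confidence threshold,
# so only confident predictions become pseudo-labels.
self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.95, max_iter=10)
self_training.fit(X, y_semi)

# Step (6): proxy for pseudo-label noise -- accuracy of the final model's
# predictions on the examples whose true labels were hidden.
pseudo_acc = (self_training.predict(X[unlabeled_mask])
              == y[unlabeled_mask]).mean()
print(f"accuracy on hidden labels: {pseudo_acc:.3f}")
</syntaxhighlight>

In practice, the held-out comparison in step (6) would use a genuinely labeled validation set rather than hidden synthetic labels.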