Semi Supervised - Revision history

Wordpad: BloomWiki: Semi Supervised

2026-04-25T01:57:34Z

BloomWiki: Semi Supervised

← Older revision		Revision as of 01:57, 25 April 2026
Line 1:		Line 1:
			<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
	{{BloomIntro}}		{{BloomIntro}}
	Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.		Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.
			</div>

	== Remembering ==		__TOC__

			<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Remembering</span> ==
	* '''Semi-supervised learning''' — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.		* '''Semi-supervised learning''' — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.
	* '''Pseudo-labeling''' — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.		* '''Pseudo-labeling''' — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
Line 17:		Line 22:
	* '''Cluster assumption''' — Decision boundaries should lie in low-density regions between clusters.		* '''Cluster assumption''' — Decision boundaries should lie in low-density regions between clusters.
	* '''Confidence threshold''' — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.		* '''Confidence threshold''' — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.
			</div>

	== Understanding ==		<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Understanding</span> ==
	Semi-supervised learning works by exploiting the '''structure of the unlabeled data distribution''' to constrain the label function. The key assumptions:		Semi-supervised learning works by exploiting the '''structure of the unlabeled data distribution''' to constrain the label function. The key assumptions:

Line 41:		Line 48:

	'''When does semi-supervised help most?''' When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.		'''When does semi-supervised help most?''' When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.
			</div>

	== Applying ==		<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Applying</span> ==
	'''FixMatch implementation:'''		'''FixMatch implementation:'''
	<syntaxhighlight lang="python">		<syntaxhighlight lang="python">
Line 75:		Line 84:
	: '''Small labeled set (<100 samples)''' → Mean Teacher, MixMatch		: '''Small labeled set (<100 samples)''' → Mean Teacher, MixMatch
	: '''Production setting''' → Self-training with pseudo-labels (simple, scalable)		: '''Production setting''' → Self-training with pseudo-labels (simple, scalable)
			</div>

	== Analyzing ==		<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Analyzing</span> ==
	{| class="wikitable"		{| class="wikitable"
	|+ Semi-Supervised Methods Comparison		|+ Semi-Supervised Methods Comparison
Line 93:		Line 104:

	'''Failure modes''': Pseudo-label noise — incorrect confident predictions pollute training. Distribution mismatch — unlabeled data from different distribution hurts performance. Over-fitting on pseudo-labels — model memorizes spurious patterns in pseudo-labels. Confirmation bias — model fails to correct its own early confident errors.		'''Failure modes''': Pseudo-label noise — incorrect confident predictions pollute training. Distribution mismatch — unlabeled data from different distribution hurts performance. Over-fitting on pseudo-labels — model memorizes spurious patterns in pseudo-labels. Confirmation bias — model fails to correct its own early confident errors.
			</div>

	== Evaluating ==		<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Evaluating</span> ==
	Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against:		Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against:
	# supervised-only (small labeled set),		# supervised-only (small labeled set),
	# fully supervised (all labeled), and		# fully supervised (all labeled), and
	# self-supervised pre-training + fine-tuning as competing baselines.		# self-supervised pre-training + fine-tuning as competing baselines.
			</div>

	== Creating ==		<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Creating</span> ==
	Designing a semi-supervised pipeline:		Designing a semi-supervised pipeline:
	# Start with self-training — simple, effective, easy to implement.		# Start with self-training — simple, effective, easy to implement.
Line 112:		Line 127:
	[[Category:Machine Learning]]		[[Category:Machine Learning]]
	[[Category:Semi-Supervised Learning]]		[[Category:Semi-Supervised Learning]]
			</div>

Wordpad: BloomWiki: Semi Supervised

2026-04-23T14:35:33Z

BloomWiki: Semi Supervised

← Older revision		Revision as of 14:35, 23 April 2026
Line 27:		Line 27:
	'''Manifold''': Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.		'''Manifold''': Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.

	'''Self-training process''': ~~(1)~~ Train on labeled data. ~~(2)~~ Predict labels for unlabeled data. ~~(3)~~ Add high-confidence predictions to training set. ~~(4)~~ Retrain. ~~(5)~~ Repeat. Risk: confident errors propagate (confirmation bias). Mitigated by strict confidence thresholds.		'''Self-training process''':
			# Train on labeled data.
			# Predict labels for unlabeled data.
			# Add high-confidence predictions to training set.
			# Retrain.
			# Repeat. Risk: confident errors propagate (confirmation bias). Mitigated by strict confidence thresholds.

	'''FixMatch''': The state-of-the-art simple baseline. For each unlabeled image: ~~(1)~~ Apply weak augmentation (horizontal flip, crop). ~~(2)~~ If prediction confidence > 0.95, use as pseudo-label. ~~(3)~~ Apply strong augmentation (RandAugment). ~~(4)~~ Train student to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while only training on confident pseudo-labels.		'''FixMatch''': The state-of-the-art simple baseline. For each unlabeled image:
			# Apply weak augmentation (horizontal flip, crop).
			# If prediction confidence > 0.95, use as pseudo-label.
			# Apply strong augmentation (RandAugment).
			# Train the model to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while training only on confident pseudo-labels.

	'''When does semi-supervised help most?''' When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.		'''When does semi-supervised help most?''' When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.
Line 86:		Line 95:

	== Evaluating ==		== Evaluating ==
	Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against: ~~(1)~~ supervised-only (small labeled set), ~~(2)~~ fully supervised (all labeled), and ~~(3)~~ self-supervised pre-training + fine-tuning as competing baselines.		Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against:
			# supervised-only (small labeled set),
			# fully supervised (all labeled), and
			# self-supervised pre-training + fine-tuning as competing baselines.

	== Creating ==		== Creating ==
	Designing a semi-supervised pipeline: ~~(1)~~ Start with self-training — simple, effective, easy to implement. ~~(2)~~ Set high confidence threshold (0.95+) to avoid noisy pseudo-labels. ~~(3)~~ Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold). ~~(4)~~ For vision: use FixMatch with RandAugment strong augmentation. ~~(5)~~ For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels. ~~(6)~~ Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.		Designing a semi-supervised pipeline:
			# Start with self-training — simple, effective, easy to implement.
			# Set high confidence threshold (0.95+) to avoid noisy pseudo-labels.
			# Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold).
			# For vision: use FixMatch with RandAugment strong augmentation.
			# For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels.
			# Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.

	[[Category:Artificial Intelligence]]		[[Category:Artificial Intelligence]]
	[[Category:Machine Learning]]		[[Category:Machine Learning]]
	[[Category:Semi-Supervised Learning]]		[[Category:Semi-Supervised Learning]]

Wordpad: BloomWiki: Semi Supervised

2026-04-23T14:20:08Z

BloomWiki: Semi Supervised

New page

{{BloomIntro}}
Semi-supervised learning sits between supervised learning (which requires labels for all training data) and unsupervised learning (which uses no labels). It leverages a small amount of labeled data alongside a large amount of unlabeled data to train better models than either approach alone. Since labeled data is expensive and time-consuming to acquire while unlabeled data is often abundantly available, semi-supervised learning is highly practical. Modern variants include pseudo-labeling, consistency regularization, and graph-based methods.

== Remembering ==
* '''Semi-supervised learning''' — Learning using a small labeled dataset and a large unlabeled dataset simultaneously.
* '''Pseudo-labeling''' — Using a model's predictions on unlabeled data as provisional labels, then retraining on those labels.
* '''Consistency regularization''' — Enforcing that model predictions remain consistent under perturbations of unlabeled inputs.
* '''Mean Teacher''' — A semi-supervised method in which the teacher model is an exponential moving average (EMA) of the student's weights; the student is trained to match the teacher's predictions on unlabeled data, which act as consistency targets.
* '''FixMatch''' — A semi-supervised image classification method that combines confidence thresholding with weak/strong augmentation consistency; it was state-of-the-art on standard benchmarks when introduced in 2020.
* '''MixMatch''' — A holistic semi-supervised approach combining pseudo-labeling, consistency regularization, and MixUp data augmentation.
* '''Self-training''' — Train on labeled data, predict labels for unlabeled data, retrain on the combination; repeat iteratively.
* '''Co-training''' — Train two models on different feature views; each provides pseudo-labels for the other.
* '''Graph-based methods''' — Propagate labels through a graph where edges represent similarity between examples (label propagation).
* '''Label propagation''' — Semi-supervised algorithm that spreads labels from labeled to unlabeled examples through a similarity graph.
* '''Manifold assumption''' — The assumption that data lies on a low-dimensional manifold; points on the same manifold should have the same label.
* '''Smoothness assumption''' — If two points are close in input space, they should have similar labels.
* '''Cluster assumption''' — Decision boundaries should lie in low-density regions between clusters.
* '''Confidence threshold''' — In pseudo-labeling, only use predictions where model confidence exceeds a threshold; avoids noisy pseudo-labels.
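
The exponential moving average mentioned in the Mean Teacher entry can be made concrete with a minimal sketch. The function name <code>ema_update</code>, the decay value of 0.99, and the use of plain Python lists as "weights" are assumptions made for illustration; a real model would apply the same update to each parameter tensor.

<syntaxhighlight lang="python">
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]

# Hypothetical two-parameter model: the teacher drifts slowly toward the student.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(3):
    teacher = ema_update(teacher, student)
</syntaxhighlight>

With a decay close to 1 the teacher changes slowly, which is what makes its predictions a more stable target than the student's own.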

== Understanding ==
Semi-supervised learning works by exploiting the '''structure of the unlabeled data distribution''' to constrain the label function. The key assumptions:

'''Smoothness''': Nearby points → similar labels. If two images of dogs are close in feature space, they should both be labeled "dog."

'''Cluster''': Classes form clusters. The decision boundary should pass through low-density regions between clusters, not through high-density regions.

'''Manifold''': Data lies on lower-dimensional manifolds. Using unlabeled data to learn the manifold structure helps place decision boundaries correctly.
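
The smoothness and cluster assumptions can be illustrated with a small label propagation sketch in NumPy. This is one simple variant (row-normalized RBF graph with clamped labels), not a reference implementation; the function name <code>propagate_labels</code>, the bandwidth <code>sigma</code>, the iteration count, and the toy data are all assumptions.

<syntaxhighlight lang="python">
import numpy as np

def propagate_labels(X, y, n_classes, sigma=1.0, n_iter=50):
    """Spread labels over an RBF similarity graph; y uses -1 for unlabeled points."""
    # Pairwise squared distances -> RBF edge weights (no self-loops)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

    idx = np.flatnonzero(y >= 0)          # indices of labeled points
    scores = np.zeros((len(X), n_classes))
    scores[idx, y[idx]] = 1.0
    clamp = scores[idx].copy()
    for _ in range(n_iter):
        scores = P @ scores               # each point averages its neighbors' scores
        scores[idx] = clamp               # known labels stay fixed
    return scores.argmax(axis=1)

# Two well-separated 1-D clusters with one labeled point each
X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = propagate_labels(X, y, n_classes=2)
</syntaxhighlight>

Because the clusters are separated by a low-density gap, almost no label mass crosses between them, and each unlabeled point takes the label of the labeled point in its own cluster.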

'''Self-training process''': (1) Train on labeled data. (2) Predict labels for unlabeled data. (3) Add high-confidence predictions to training set. (4) Retrain. (5) Repeat. Risk: confident errors propagate (confirmation bias). Mitigated by strict confidence thresholds.
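
The loop above can be sketched in a few lines of NumPy. A nearest-centroid classifier stands in for "the model" here purely to keep the example self-contained; the helper names (<code>fit_centroids</code>, <code>predict_proba</code>, <code>self_train</code>), the softmax-over-distances confidence, and the toy data are assumptions.

<syntaxhighlight lang="python">
import numpy as np

def fit_centroids(X, y, n_classes):
    """'Train' the stand-in model: one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unl, n_classes, threshold=0.95, max_rounds=5):
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unl.copy()
    centroids = fit_centroids(X_train, y_train, n_classes)      # (1) train on labeled data
    for _ in range(max_rounds):                                  # (5) repeat
        if len(remaining) == 0:
            break
        probs = predict_proba(centroids, remaining)              # (2) predict on unlabeled data
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break                                                # nothing passes the threshold
        X_train = np.vstack([X_train, remaining[confident]])     # (3) add confident predictions
        y_train = np.concatenate([y_train, probs[confident].argmax(axis=1)])
        remaining = remaining[~confident]
        centroids = fit_centroids(X_train, y_train, n_classes)   # (4) retrain
    return centroids, X_train, y_train

X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]])
centroids, X_train, y_train = self_train(X_lab, y_lab, X_unl, n_classes=2)
</syntaxhighlight>

In this toy run the two points near a class centroid are pseudo-labeled in the first round, while the ambiguous point at (2, 2) never reaches the threshold and stays out of the training set.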

'''FixMatch''': A simple and widely used baseline. For each unlabeled image: (1) Apply weak augmentation (horizontal flip, crop). (2) If the model's prediction confidence on the weak view is at least 0.95, use the predicted class as a pseudo-label. (3) Apply strong augmentation (RandAugment). (4) Train the model to predict the pseudo-label on the strongly augmented view. This enforces consistency across augmentation strengths while training only on confident pseudo-labels.
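
The FixMatch steps can also be written as a single objective. The notation below is introduced here for illustration and is an assumption of this sketch: <math>B</math> is the labeled batch size, <math>\mu B</math> the unlabeled batch size, <math>\alpha</math> the weak and <math>\mathcal{A}</math> the strong augmentation, <math>p_m</math> the model's predicted class distribution, <math>H</math> the cross-entropy, <math>\tau</math> the confidence threshold (0.95 above), and <math>\lambda_u</math> the weight of the unlabeled term.

<math display="block">
\begin{aligned}
q_b &= p_m\bigl(y \mid \alpha(u_b)\bigr), \qquad \hat{q}_b = \arg\max_y q_b \\
\ell_s &= \frac{1}{B} \sum_{b=1}^{B} H\bigl(y_b,\; p_m(y \mid \alpha(x_b))\bigr) \\
\ell_u &= \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\bigl[\max(q_b) \ge \tau\bigr]\; H\bigl(\hat{q}_b,\; p_m(y \mid \mathcal{A}(u_b))\bigr) \\
\ell &= \ell_s + \lambda_u\, \ell_u
\end{aligned}
</math>

The indicator drops every unlabeled example whose weak-view confidence is below <math>\tau</math>, but the average is still taken over the whole unlabeled batch, so the unlabeled term is small early in training when few predictions are confident.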

'''When does semi-supervised help most?''' When labeled data is very scarce (<1000 examples) and unlabeled data shares the same distribution as labeled data. When distributions differ (domain shift between labeled and unlabeled), semi-supervised can hurt — a form of negative transfer.

== Applying ==
'''FixMatch implementation:'''
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def fixmatch_loss(model, labeled_x, labels, unlabeled_x_weak, unlabeled_x_strong,
                  threshold=0.95, lambda_u=1.0):
    """Return the supervised loss plus the weighted, confidence-masked unlabeled loss."""
    # Supervised loss on labeled data
    logits_labeled = model(labeled_x)
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    # Pseudo-label on weakly augmented unlabeled data (no gradient through the targets)
    with torch.no_grad():
        logits_weak = model(unlabeled_x_weak)
        probs_weak = F.softmax(logits_weak, dim=-1)
        max_probs, pseudo_labels = probs_weak.max(dim=-1)
        # Mask: only use predictions at or above the confidence threshold
        mask = (max_probs >= threshold).float()

    # Consistency loss: predict pseudo-label on strongly augmented version.
    # The mean runs over the whole unlabeled batch, including masked-out samples.
    logits_strong = model(unlabeled_x_strong)
    loss_unsupervised = (F.cross_entropy(logits_strong, pseudo_labels, reduction='none') * mask).mean()

    return loss_supervised + lambda_u * loss_unsupervised
</syntaxhighlight>

; Semi-supervised method selection
: '''Image classification''' → FixMatch, FlexMatch, FreeMatch (confidence threshold scheduling)
: '''NLP''' → UDA (Unsupervised Data Augmentation), pre-train then fine-tune (BERT approach)
: '''Graph data''' → Label propagation, Graph Convolutional Networks (GCN)
: '''Small labeled set (<100 samples)''' → Mean Teacher, MixMatch
: '''Production setting''' → Self-training with pseudo-labels (simple, scalable)

== Analyzing ==
{| class="wikitable"
|+ Semi-Supervised Methods Comparison
! Method !! Labeled Data Needed !! Key Idea !! Best Domain
|-
| Self-training || ~10-20% || Confidence filtering || Any
|-
| FixMatch || <1% || Consistency + threshold || Vision
|-
| Mean Teacher || <5% || EMA teacher labels || Vision
|-
| Label Propagation || ~5% || Graph diffusion || Low-dim, graph
|-
| BERT fine-tuning || <1% (semi) || Large pre-training || NLP
|}

'''Failure modes''': Pseudo-label noise — incorrect confident predictions pollute training. Distribution mismatch — unlabeled data from different distribution hurts performance. Over-fitting on pseudo-labels — model memorizes spurious patterns in pseudo-labels. Confirmation bias — model fails to correct its own early confident errors.

== Evaluating ==
Evaluation must match the practical setting: hold out a labeled test set; train only on the (small labeled) + (large unlabeled) split. Report accuracy as a function of labeled data fraction (1%, 5%, 10%) to show the semi-supervised benefit curve. Compare against: (1) supervised-only (small labeled set), (2) fully supervised (all labeled), and (3) self-supervised pre-training + fine-tuning as competing baselines.
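
One way to build the labeled/unlabeled splits for such a benefit curve is sketched below. It operates on the training pool only (the test set is assumed to have been held out beforehand); the function name <code>stratified_labeled_split</code>, the fixed seed, and the rule of keeping at least one label per class are assumptions.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

def stratified_labeled_split(labels, fraction, seed=0):
    """Return (labeled_idx, unlabeled_idx), sampling the same fraction from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    labeled = []
    for idxs in by_class.values():
        k = max(1, round(fraction * len(idxs)))  # keep at least one label per class
        labeled.extend(rng.sample(idxs, k))

    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled

# Toy training pool: 200 examples of each of two classes
labels = [0] * 200 + [1] * 200
splits = {f: stratified_labeled_split(labels, f) for f in (0.01, 0.05, 0.10)}
</syntaxhighlight>

Each method under comparison would then be trained once per fraction on the same split, so that differences in accuracy reflect the method and not the choice of labeled examples.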

== Creating ==
Designing a semi-supervised pipeline: (1) Start with self-training — simple, effective, easy to implement. (2) Set high confidence threshold (0.95+) to avoid noisy pseudo-labels. (3) Apply curriculum: increase unlabeled data usage as model improves (FlexMatch adaptive threshold). (4) For vision: use FixMatch with RandAugment strong augmentation. (5) For NLP: leverage domain-adaptive pre-training on unlabeled data, then fine-tune on labels. (6) Monitor pseudo-label quality: compute accuracy of pseudo-labels on held-out labeled data as a proxy for noise level.
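
The monitoring step (6) can be sketched as a small helper that reports how many held-out examples pass the threshold (coverage) and how often those confident predictions are correct (accuracy). The function name <code>pseudo_label_quality</code> and the toy probabilities are assumptions; in practice <code>probs</code> would be the model's softmax outputs on held-out labeled data.

<syntaxhighlight lang="python">
import numpy as np

def pseudo_label_quality(probs, true_labels, threshold=0.95):
    """Return (coverage, accuracy) of thresholded pseudo-labels on held-out labeled data."""
    probs = np.asarray(probs, dtype=float)
    true_labels = np.asarray(true_labels)
    confident = probs.max(axis=1) >= threshold
    coverage = float(confident.mean())
    if not confident.any():
        return coverage, float("nan")  # no pseudo-labels to score
    predicted = probs[confident].argmax(axis=1)
    accuracy = float((predicted == true_labels[confident]).mean())
    return coverage, accuracy

# Hypothetical model outputs for four held-out examples
probs = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98], [0.96, 0.04]]
true_labels = [0, 0, 1, 1]
coverage, accuracy = pseudo_label_quality(probs, true_labels)
</syntaxhighlight>

Tracking both numbers over training helps separate two problems: low coverage means the threshold is excluding most unlabeled data, while low accuracy at high coverage points to confident errors of the kind that drive confirmation bias.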

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Semi-Supervised Learning]]