
== <span style="color: #FFFFFF;">Understanding</span> ==
In many ML applications, the bottleneck is labeling data, not the model or the compute. A radiologist might take 5 minutes to annotate a CT scan. Annotating 100,000 scans would then take about 500,000 minutes, roughly 8,300 hours or about four years of full-time expert work. Active learning addresses this directly by asking: "which 1,000 scans should we label to get the most accurate model?"

'''Why random sampling is suboptimal''': Suppose 98% of the images in a pool belong to a common class (say, cats) and 2% belong to a rare class. Random sampling then spends about 98% of the annotation budget on the common class, which the model may already handle well, and a 1,000-label budget yields only about 20 rare examples. Active learning instead concentrates the budget on examples near the decision boundary or in underrepresented regions.

'''Uncertainty sampling''': This is the simplest and most widely used strategy. After training on the current labeled set, apply the model to all unlabeled examples and select those on which it is least confident (e.g., predicted probability closest to 0.5 for binary classification). The intuition is that the model is currently "on the fence" about these examples, so their labels are expected to carry the most information about where the decision boundary should lie. The newly labeled examples join the training set, the model is retrained, and the cycle repeats.
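A minimal sketch of the selection step, assuming the model's predicted class probabilities for the unlabeled pool are already available as a NumPy array `probs` of shape (n_examples, n_classes). All function names are hypothetical. `least_confidence` implements the rule described above; `margin` and `entropy` are two common alternative scores, included for comparison.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    """1 - max class probability; higher means less confident."""
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    """Negative gap between the top two probabilities; higher means less confident."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])

def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each predictive distribution (clipped to avoid log(0))."""
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)

def select_uncertain(probs: np.ndarray, k: int, score=least_confidence) -> np.ndarray:
    """Indices of the k unlabeled examples with the highest uncertainty score."""
    scores = score(probs)
    return np.argsort(-scores)[:k]

# Four unlabeled examples from a binary classifier.
probs = np.array([[0.95, 0.05], [0.52, 0.48], [0.70, 0.30], [0.49, 0.51]])
print(select_uncertain(probs, 2))  # the two predictions closest to 0.5
```

For binary classification all three scores produce the same ranking; they diverge only with three or more classes, where `margin` looks at the top two classes and `entropy` at the whole distribution.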

'''Core-set selection''': Instead of uncertainty, select the examples that are farthest, in feature space, from the already-labeled examples. This pushes the labeled set to cover the full data distribution, and it sidesteps the cold-start problem that uncertainty sampling faces: before the model has been trained, its uncertainty estimates are uninformative.
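The greedy k-center heuristic is one common way to approximate core-set selection. The sketch below assumes each example has already been mapped to a feature vector (for instance, the model's penultimate-layer embedding, or a pretrained encoder's embedding at cold start). The function name and the choice of Euclidean distance are illustrative assumptions.

```python
import numpy as np

def k_center_greedy(features: np.ndarray, labeled_idx: list[int], budget: int) -> list[int]:
    """Repeatedly pick the point farthest from its nearest chosen center,
    where centers are the labeled points plus points selected so far."""
    n = features.shape[0]
    if labeled_idx:
        # Distance from every point to its nearest labeled point.
        diffs = features[:, None, :] - features[None, labeled_idx, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    else:
        # Cold start: nothing is labeled, so treat every point as infinitely far.
        min_dist = np.full(n, np.inf)
    selected = []
    for _ in range(budget):
        idx = int(np.argmax(min_dist))  # farthest point from all current centers
        selected.append(idx)
        # The new point becomes a center: refresh nearest-center distances.
        new_dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Five 1-D points; point 0 is already labeled.
features = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
print(k_center_greedy(features, [0], 2))
```

Labeled and already-selected points have a nearest-center distance of zero, so they are never picked again. The pairwise-distance step uses O(n × labeled × d) memory, which is fine for a sketch but would need batching for a large pool.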

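'''The exploration-exploitation tension''': Uncertainty sampling exploits the current model to pick informative examples, but it can get stuck labeling outliers or noise, since some examples are uncertain because they are anomalies rather than informative boundary cases. Core-set selection encourages exploration of the data distribution but does not use the model's confidence. BADGE (Batch Active Learning by Diverse Gradient Embeddings) combines both. It represents each unlabeled example by the gradient of the loss with respect to the final layer's parameters, computed using the model's own predicted label; the magnitude of this gradient grows with uncertainty. It then picks a batch with k-means++ seeding, so the selected gradients are both large and diverse.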
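A minimal NumPy sketch of BADGE's two stages. It assumes a softmax classifier whose last layer is linear, with predicted probabilities `probs` (n × K) and penultimate-layer activations `hidden` (n × d) for the unlabeled pool. Function names are hypothetical. Because true labels are unknown, the gradient uses the model's predicted class as a stand-in label. For softmax with cross-entropy, the gradient with respect to the last-layer weights is the outer product (p − onehot(ŷ)) ⊗ h. Starting the seeding from the largest-norm embedding is a choice made here for determinism.

```python
import numpy as np

def gradient_embeddings(probs: np.ndarray, hidden: np.ndarray) -> np.ndarray:
    """Per-example gradient of cross-entropy w.r.t. the final linear layer,
    using the predicted class as the label, flattened to K * d values."""
    n = probs.shape[0]
    y_hat = probs.argmax(axis=1)
    residual = probs.copy()
    residual[np.arange(n), y_hat] -= 1.0  # p - onehot(y_hat)
    # Outer product residual ⊗ hidden for each example, then flatten.
    return (residual[:, :, None] * hidden[:, None, :]).reshape(n, -1)

def kmeans_pp_seeding(emb: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """k-means++ seeding: begin at the largest-norm point, then sample each
    next point with probability proportional to its squared distance to
    the nearest chosen point."""
    rng = np.random.default_rng(seed)
    first = int(np.argmax(np.linalg.norm(emb, axis=1)))
    chosen = [first]
    d2 = np.sum((emb - emb[first]) ** 2, axis=1)
    while len(chosen) < k:
        if d2.sum() == 0:  # every remaining point coincides with a chosen one
            break
        idx = int(rng.choice(len(emb), p=d2 / d2.sum()))
        chosen.append(idx)
        d2 = np.minimum(d2, np.sum((emb - emb[idx]) ** 2, axis=1))
    return chosen

def badge_select(probs: np.ndarray, hidden: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """Select a batch of k unlabeled indices BADGE-style."""
    return kmeans_pp_seeding(gradient_embeddings(probs, hidden), k, seed)

probs = np.array([[0.99, 0.01], [0.50, 0.50], [0.60, 0.40]])
hidden = np.ones((3, 2))
print(badge_select(probs, hidden, 2))
```

Confident predictions have p ≈ onehot(ŷ), so their embeddings sit near the origin and are rarely chosen. The squared-distance sampling then discourages picking several near-identical uncertain examples in the same batch.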
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">