Probabilistic Machine Learning - Revision history

Wordpad: BloomWiki: Probabilistic Machine Learning

2026-04-25T01:56:12Z

← Older revision		Revision as of 01:56, 25 April 2026
Line 1:		Line 1:
			<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
	{{BloomIntro}}		{{BloomIntro}}
	Probabilistic machine learning frames prediction and inference as probability distributions rather than point estimates, enabling models to express uncertainty about their outputs. A probabilistic model doesn't just predict "this email is spam" — it predicts "this email has an 87% probability of being spam," with the uncertainty reflecting both the inherent randomness in the data and the model's knowledge limitations. Probabilistic ML encompasses Bayesian inference, probabilistic graphical models, Gaussian processes, and modern deep probabilistic models like variational autoencoders and normalizing flows.		Probabilistic machine learning frames prediction and inference as probability distributions rather than point estimates, enabling models to express uncertainty about their outputs. A probabilistic model doesn't just predict "this email is spam" — it predicts "this email has an 87% probability of being spam," with the uncertainty reflecting both the inherent randomness in the data and the model's knowledge limitations. Probabilistic ML encompasses Bayesian inference, probabilistic graphical models, Gaussian processes, and modern deep probabilistic models like variational autoencoders and normalizing flows.
			</div>

	== Remembering ==		__TOC__

			<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Remembering</span> ==
	* '''Probability distribution''' — A function assigning probabilities to possible outcomes; the fundamental object of probabilistic ML.		* '''Probability distribution''' — A function assigning probabilities to possible outcomes; the fundamental object of probabilistic ML.
	* '''Prior''' — A distribution encoding beliefs before observing data: P(θ).		* '''Prior''' — A distribution encoding beliefs before observing data: P(θ).
Line 17:		Line 22:
	* '''Conformal prediction''' — A framework providing distribution-free prediction intervals with guaranteed coverage.		* '''Conformal prediction''' — A framework providing distribution-free prediction intervals with guaranteed coverage.
	* '''Calibration''' — A probabilistic model is calibrated if its confidence scores match empirical frequencies.		* '''Calibration''' — A probabilistic model is calibrated if its confidence scores match empirical frequencies.
			</div>

	== Understanding ==		<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Understanding</span> ==
	Why probabilistic ML? Point predictions discard crucial information. When a medical AI says "positive for cancer" with 51% confidence, that's categorically different from 99% confidence — but a non-probabilistic classifier treats both identically. Probabilistic models express this uncertainty explicitly.		Why probabilistic ML? Point predictions discard crucial information. When a medical AI says "positive for cancer" with 51% confidence, that's categorically different from 99% confidence — but a non-probabilistic classifier treats both identically. Probabilistic models express this uncertainty explicitly.

Line 28:		Line 35:

	Conformal prediction provides distribution-free prediction sets with guaranteed coverage: given user-specified error rate α, the prediction set contains the true label with probability ≥ 1-α, regardless of the underlying distribution. This is a practical tool for adding rigorous uncertainty quantification to any classifier.		Conformal prediction provides distribution-free prediction sets with guaranteed coverage: given user-specified error rate α, the prediction set contains the true label with probability ≥ 1-α, regardless of the underlying distribution. This is a practical tool for adding rigorous uncertainty quantification to any classifier.
			</div>

	== Applying ==		<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Applying</span> ==
	'''Conformal prediction for guaranteed coverage:'''		'''Conformal prediction for guaranteed coverage:'''
	<syntaxhighlight lang="python">		<syntaxhighlight lang="python">
Line 70:		Line 79:
	: '''Generative modeling''' → VAE (smooth latent space), normalizing flows (exact likelihood), diffusion models		: '''Generative modeling''' → VAE (smooth latent space), normalizing flows (exact likelihood), diffusion models
	: '''Sequential inference''' → HMMs, Kalman filters, particle filters		: '''Sequential inference''' → HMMs, Kalman filters, particle filters
			</div>

	== Analyzing ==		<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Analyzing</span> ==
	{| class="wikitable"		{| class="wikitable"
	|+ Uncertainty Estimation Comparison		|+ Uncertainty Estimation Comparison
Line 90:		Line 101:

	'''Failure modes''': Overconfident point estimates causing unsafe decisions in high-stakes settings. Poor calibration — confidence scores don't match empirical frequencies. Distribution shift invalidating calibration. VAE posterior collapse — decoder ignores latent code. Conformal prediction requires exchangeable data — fails under distribution shift without adaptation.		'''Failure modes''': Overconfident point estimates causing unsafe decisions in high-stakes settings. Poor calibration — confidence scores don't match empirical frequencies. Distribution shift invalidating calibration. VAE posterior collapse — decoder ignores latent code. Conformal prediction requires exchangeable data — fails under distribution shift without adaptation.
			</div>

	== Evaluating ==		<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Evaluating</span> ==
	Probabilistic model evaluation: (1) Calibration: reliability diagrams, ECE (Expected Calibration Error) — lower is better. (2) Sharpness: prediction sets should be as small as possible while maintaining coverage; a set containing all classes is valid but useless. (3) NLL (Negative Log-Likelihood): proper scoring rule penalizing both inaccuracy and overconfidence. (4) Coverage: for conformal prediction, empirically verify that guaranteed coverage holds. (5) Entropy: high-entropy predictions on uncertain inputs, low-entropy on certain ones — the ideal pattern.		Probabilistic model evaluation: (1) Calibration: reliability diagrams, ECE (Expected Calibration Error) — lower is better. (2) Sharpness: prediction sets should be as small as possible while maintaining coverage; a set containing all classes is valid but useless. (3) NLL (Negative Log-Likelihood): proper scoring rule penalizing both inaccuracy and overconfidence. (4) Coverage: for conformal prediction, empirically verify that guaranteed coverage holds. (5) Entropy: high-entropy predictions on uncertain inputs, low-entropy on certain ones — the ideal pattern.
			</div>

	== Creating ==		<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
			== <span style="color: #FFFFFF;">Creating</span> ==
	Designing a probabilistic prediction pipeline: (1) Choose model type based on data size and uncertainty needs. (2) Train base model; add conformal calibration on held-out calibration set. (3) Set α based on acceptable error rate for the application (medical: α=0.01, recommendation: α=0.1). (4) Produce prediction sets rather than point predictions; communicate uncertainty to downstream users. (5) Monitor calibration in production: track ECE on new data; alert if calibration degrades. (6) For distribution shift: use adaptive conformal prediction (ACI) which continuously updates the quantile threshold.		Designing a probabilistic prediction pipeline: (1) Choose model type based on data size and uncertainty needs. (2) Train base model; add conformal calibration on held-out calibration set. (3) Set α based on acceptable error rate for the application (medical: α=0.01, recommendation: α=0.1). (4) Produce prediction sets rather than point predictions; communicate uncertainty to downstream users. (5) Monitor calibration in production: track ECE on new data; alert if calibration degrades. (6) For distribution shift: use adaptive conformal prediction (ACI) which continuously updates the quantile threshold.

Line 100:		Line 115:
	[[Category:Machine Learning]]		[[Category:Machine Learning]]
	[[Category:Probabilistic ML]]		[[Category:Probabilistic ML]]
			</div>

Wordpad: New BloomWiki article: Probabilistic Machine Learning

2026-04-23T08:12:56Z

New page

{{BloomIntro}}
Probabilistic machine learning frames prediction and inference as probability distributions rather than point estimates, enabling models to express uncertainty about their outputs. A probabilistic model doesn't just predict "this email is spam" — it predicts "this email has an 87% probability of being spam," with the uncertainty reflecting both the inherent randomness in the data and the model's knowledge limitations. Probabilistic ML encompasses Bayesian inference, probabilistic graphical models, Gaussian processes, and modern deep probabilistic models like variational autoencoders and normalizing flows.

== Remembering ==
* '''Probability distribution''' — A function assigning probabilities to possible outcomes; the fundamental object of probabilistic ML.
* '''Prior''' — A distribution encoding beliefs before observing data: P(θ).
* '''Posterior''' — Updated beliefs after observing data: P(θ|D).
* '''Likelihood''' — The probability of the data given model parameters: P(D|θ).
* '''MAP (Maximum A Posteriori)''' — Finding the mode of the posterior; regularized point estimate.
* '''MLE (Maximum Likelihood Estimation)''' — Finding parameters maximizing P(D|θ); no prior.
* '''Probabilistic graphical model''' — Represents joint distributions over many variables using graph structure (Bayesian networks, Markov random fields).
* '''Bayesian network''' — A directed acyclic graph encoding conditional independence relationships.
* '''Hidden Markov Model (HMM)''' — A probabilistic sequence model with hidden states; classic for speech and bioinformatics.
* '''Variational Autoencoder (VAE)''' — A generative model using variational inference to learn a probabilistic latent space.
* '''Normalizing flow''' — A generative model constructed by composing invertible transformations to transform a simple distribution into a complex one.
* '''ELBO (Evidence Lower Bound)''' — The objective maximized in variational inference: log P(D) ≥ ELBO.
* '''Conformal prediction''' — A framework providing distribution-free prediction intervals with guaranteed coverage.
* '''Calibration''' — A probabilistic model is calibrated if its confidence scores match empirical frequencies.
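
The relationship between prior, likelihood, posterior, MLE, and MAP can be made concrete with a conjugate Beta-Bernoulli model. The following is a minimal sketch: the prior parameters and the observed counts are assumptions chosen for illustration, and the helper name beta_bernoulli_update is hypothetical.

<syntaxhighlight lang="python">
# Hypothetical example: estimating a spam rate theta from 10 labeled emails.
# Prior: Beta(a, b); likelihood: Bernoulli; posterior: Beta(a + k, b + n - k).

def beta_bernoulli_update(a, b, successes, failures):
    """Return the posterior Beta parameters after observing the data."""
    return a + successes, b + failures

a_prior, b_prior = 2.0, 2.0   # assumed prior: mild belief that theta is near 0.5
k, n = 7, 10                  # assumed data: 7 spam emails out of 10

a_post, b_post = beta_bernoulli_update(a_prior, b_prior, k, n - k)

mle = k / n                                           # maximizes P(D|theta); ignores the prior
map_estimate = (a_post - 1) / (a_post + b_post - 2)   # mode of the Beta posterior
posterior_mean = a_post / (a_post + b_post)

print(f"MLE: {mle:.3f}, MAP: {map_estimate:.3f}, posterior mean: {posterior_mean:.3f}")
</syntaxhighlight>

With this prior the MAP estimate is pulled from the MLE toward 0.5, which is the sense in which MAP acts as a regularized point estimate.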

== Understanding ==
'''Why probabilistic ML?''' Point predictions discard crucial information. When a medical AI says "positive for cancer" with 51% confidence, that's categorically different from 99% confidence, but a non-probabilistic classifier treats both identically. Probabilistic models express this uncertainty explicitly.

'''Sources of uncertainty''': (1) '''Aleatoric''' (irreducible): inherent randomness in the data-generating process. Even with infinite data, some outcomes are unpredictable, e.g., quantum effects or sensor noise. (2) '''Epistemic''' (reducible): uncertainty due to limited knowledge. With more data, the model becomes more certain. Good probabilistic models distinguish these two types.
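
One common way to separate the two types in practice is to compare the predictions of several ensemble members (or MC Dropout passes): the entropy of the averaged prediction is the total uncertainty, the average entropy of the individual members approximates the aleatoric part, and the difference (the mutual information) approximates the epistemic part. The sketch below assumes this entropy-based decomposition; the function names and the toy probabilities are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; clips to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def decompose_uncertainty(member_probs):
    """member_probs: array of shape (n_members, n_classes) for one input."""
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)                # entropy of the averaged prediction
    aleatoric = entropy(member_probs).mean()   # average entropy of each member
    epistemic = total - aleatoric              # mutual information (member disagreement)
    return total, aleatoric, epistemic

# Members agree that the outcome is a coin flip: mostly aleatoric.
agree_uncertain = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Members are individually confident but disagree: mostly epistemic.
disagree = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]])

for name, probs in [("agree_uncertain", agree_uncertain), ("disagree", disagree)]:
    total, alea, epi = decompose_uncertainty(probs)
    print(f"{name}: total={total:.3f} aleatoric={alea:.3f} epistemic={epi:.3f}")
</syntaxhighlight>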

'''Probabilistic graphical models''' encode joint distributions over many variables efficiently using conditional independence assumptions. A Bayesian network for medical diagnosis might have nodes for symptoms, diseases, and test results, with edges encoding conditional dependencies. Inference algorithms (variable elimination, belief propagation) compute posterior probabilities of unobserved variables.
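
A toy version of the diagnosis network can be written out directly. The sketch below assumes a three-node network (Disease → Symptom, Disease → Test) with made-up probabilities, and computes posteriors by brute-force enumeration rather than variable elimination or belief propagation, which only pay off on larger graphs.

<syntaxhighlight lang="python">
# Hypothetical network: Disease -> Symptom, Disease -> Test (all numbers are made up).
P_DISEASE = 0.01
P_SYMPTOM_GIVEN = {True: 0.80, False: 0.10}   # P(symptom | disease)
P_TEST_GIVEN = {True: 0.95, False: 0.05}      # P(positive test | disease)

def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(disease, symptom, test):
    """Joint probability, factorized along the graph."""
    return (bernoulli(P_DISEASE, disease)
            * bernoulli(P_SYMPTOM_GIVEN[disease], symptom)
            * bernoulli(P_TEST_GIVEN[disease], test))

def posterior_disease(symptom=None, test=None):
    """P(disease | evidence) by enumeration; None means unobserved (summed out)."""
    symptom_values = [True, False] if symptom is None else [symptom]
    test_values = [True, False] if test is None else [test]
    weights = {d: sum(joint(d, s, t) for s in symptom_values for t in test_values)
               for d in (True, False)}
    return weights[True] / (weights[True] + weights[False])

print(f"P(disease)                          = {posterior_disease():.4f}")
print(f"P(disease | positive test)          = {posterior_disease(test=True):.4f}")
print(f"P(disease | symptom, positive test) = {posterior_disease(symptom=True, test=True):.4f}")
</syntaxhighlight>

Because Symptom and Test are conditionally independent given Disease, the joint needs only five numbers instead of a full eight-entry table; that saving is what grows dramatically in larger networks.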

'''Deep probabilistic models''': VAEs combine deep learning with variational inference. The encoder maps inputs to a distribution over latent codes (not a point); the decoder maps sampled latent codes back to reconstructions. This enables generation (sample from the latent space) and uncertainty quantification. Normalizing flows model complex distributions by composing simple invertible transformations with analytically tractable Jacobians.
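
The change-of-variables idea behind normalizing flows can be illustrated with the simplest possible case, a single affine transformation of a standard normal. This is only a sketch: the AffineFlow class is a hypothetical name, and real flows stack many learned, nonlinear invertible layers.

<syntaxhighlight lang="python">
import numpy as np

def standard_normal_logpdf(z):
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))

class AffineFlow:
    """x = scale * z + shift, an invertible map with a constant Jacobian."""

    def __init__(self, scale, shift):
        assert scale != 0.0, "scale must be nonzero for the map to be invertible"
        self.scale, self.shift = scale, shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        # Change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log|df/dz|
        z = self.inverse(x)
        return standard_normal_logpdf(z) - np.log(abs(self.scale))

flow = AffineFlow(scale=2.0, shift=1.0)
rng = np.random.default_rng(0)
samples = flow.forward(rng.standard_normal(5))   # sampling: push base noise through the flow

# An affine map of a standard normal is N(shift, scale^2), so the flow's
# log-density can be checked against the closed-form Gaussian log-density.
x = np.array([-1.0, 1.0, 3.0])
direct = -0.5 * ((x - 1.0) / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
print(np.allclose(flow.log_prob(x), direct))
</syntaxhighlight>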

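'''Conformal prediction''' provides distribution-free prediction sets with guaranteed coverage: given a user-specified error rate α, the prediction set contains the true label with probability ≥ 1-α, regardless of the underlying distribution, provided the calibration and test examples are exchangeable. The guarantee is marginal (it holds on average over examples), not conditional on each individual input. This is a practical tool for adding rigorous uncertainty quantification to any classifier.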

== Applying ==
'''Conformal prediction for guaranteed coverage:'''
<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Conformal prediction adds rigorous uncertainty quantification to any classifier.
# load_digits is a stand-in dataset (labels 0..9); any dataset with integer labels 0..K-1 works.
X, y = load_digits(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

# Train base classifier
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Calibration: compute nonconformity scores (1 - predicted prob of true class)
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1 - cal_probs[np.arange(len(y_cal)), y_cal]

# Set coverage level
alpha = 0.1  # 90% coverage target
n_cal = len(y_cal)
# Finite-sample corrected quantile level ceil((n + 1)(1 - alpha)) / n, capped at 1
q_level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)
threshold = np.quantile(cal_scores, q_level, method="higher")

# Prediction sets for test examples
test_probs = clf.predict_proba(X_test)

def get_prediction_set(probs, threshold):
    # Include all classes with nonconformity score <= threshold
    return np.where(1 - probs <= threshold)[0]

prediction_sets = [get_prediction_set(p, threshold) for p in test_probs]
coverage = np.mean([y_test[i] in s for i, s in enumerate(prediction_sets)])
print(f"Coverage: {coverage:.2%} (target: {1-alpha:.0%})")  # Should be close to or above 90%
avg_set_size = np.mean([len(s) for s in prediction_sets])
print(f"Average prediction set size: {avg_set_size:.2f}")  # Smaller = more efficient
</syntaxhighlight>

; Probabilistic ML method selection
: '''Regression with uncertainty''' → Gaussian processes (small data), NGBoost, CARD
: '''Classification with calibration''' → Calibrated RF/XGBoost (Platt/isotonic); temperature scaling for DNN
: '''Guaranteed coverage''' → Conformal prediction (any model, any distribution)
: '''Generative modeling''' → VAE (smooth latent space), normalizing flows (exact likelihood), diffusion models
: '''Sequential inference''' → HMMs, Kalman filters, particle filters
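
As an illustration of the calibration entry in the list above, temperature scaling divides a network's logits by a single scalar T fitted on held-out data to minimize NLL. The sketch below is a minimal version under stated assumptions: the logits are synthetic and deliberately overconfident, T is found by grid search rather than gradient descent, and the function names are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature=1.0):
    probs = softmax(logits / temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature that minimizes NLL on held-out data (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.25, 10.0, 400)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic, deliberately overconfident logits (an assumption for illustration).
rng = np.random.default_rng(0)
n, k = 2000, 3
labels = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 1.0   # weak signal toward the true class
logits *= 5.0                         # inflate logits to make the model overconfident

T = fit_temperature(logits, labels)
print(f"fitted temperature: {T:.2f}")
print(f"NLL before: {nll(logits, labels):.3f}, after: {nll(logits, labels, T):.3f}")
</syntaxhighlight>

Dividing by T does not change the argmax, so accuracy is unaffected; only the confidence attached to each prediction changes.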

== Analyzing ==
{| class="wikitable"
|+ Uncertainty Estimation Comparison
! Method !! Type of Uncertainty !! Coverage Guarantee !! Computational Cost
|-
| Point estimate + softmax || None (overconfident) || None || Very low
|-
| Temperature scaling || Calibrated confidence || Empirical only || Very low
|-
| MC Dropout || Epistemic (approx) || None || Low
|-
| Deep Ensembles || Both (approx) || None || High
|-
| Conformal prediction || Distribution-free sets || Guaranteed (1-α) || Low
|-
| Gaussian process || Epistemic (exact for GP) || Bayesian || Very high
|}

'''Failure modes''': Overconfident point estimates causing unsafe decisions in high-stakes settings. Poor calibration — confidence scores don't match empirical frequencies. Distribution shift invalidating calibration. VAE posterior collapse — decoder ignores latent code. Conformal prediction requires exchangeable data — fails under distribution shift without adaptation.

== Evaluating ==
Probabilistic model evaluation: (1) '''Calibration''': reliability diagrams, ECE (Expected Calibration Error); lower is better. (2) '''Sharpness''': prediction sets should be as small as possible while maintaining coverage; a set containing all classes is valid but useless. (3) '''NLL (Negative Log-Likelihood)''': proper scoring rule penalizing both inaccuracy and overconfidence. (4) '''Coverage''': for conformal prediction, empirically verify on held-out data that the target coverage of 1-α holds. (5) '''Entropy''': the ideal pattern is high-entropy predictions on uncertain inputs and low-entropy predictions on certain ones.
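
The calibration, NLL, and entropy metrics above can be computed in a few lines. The following sketch assumes equal-width confidence bins for ECE (one common convention among several) and uses a tiny hand-made example; the function names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins over the top-class probability."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece

def negative_log_likelihood(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def mean_entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

# Toy binary example: the model says 0.9 every time but is right only 60% of the time.
labels = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1])
probs = np.tile([0.9, 0.1], (10, 1))

print(f"ECE: {expected_calibration_error(probs, labels):.3f}")
print(f"NLL: {negative_log_likelihood(probs, labels):.3f}")
print(f"Mean entropy: {mean_entropy(probs):.3f}")
</syntaxhighlight>

Here the gap between the stated confidence (0.9) and the observed accuracy (0.6) is exactly what ECE measures.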

== Creating ==
Designing a probabilistic prediction pipeline: (1) Choose model type based on data size and uncertainty needs. (2) Train base model; add conformal calibration on held-out calibration set. (3) Set α based on acceptable error rate for the application (medical: α=0.01, recommendation: α=0.1). (4) Produce prediction sets rather than point predictions; communicate uncertainty to downstream users. (5) Monitor calibration in production: track ECE on new data; alert if calibration degrades. (6) For distribution shift: use adaptive conformal inference (ACI), which updates the working error rate, and hence the quantile threshold, after every observed outcome.
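
Step (6) can be sketched as an online loop. The version below assumes the commonly cited ACI update α_{t+1} = α_t + γ(α − err_t), where err_t is 1 when the true label fell outside the prediction set at time t; the step size γ, the synthetic score streams, and the function name are assumptions for illustration.

<syntaxhighlight lang="python">
import numpy as np

def adaptive_conformal(scores_stream, cal_scores, alpha=0.1, gamma=0.01):
    """Online ACI-style update on a stream of nonconformity scores.

    scores_stream[t] is the score of the true label at time t, so the true
    label is covered when that score is at or below the current threshold.
    """
    alpha_t = alpha
    errors = []
    for score in scores_stream:
        if alpha_t <= 0.0:
            threshold = np.inf        # include every label
        elif alpha_t >= 1.0:
            threshold = -np.inf       # include no label
        else:
            threshold = np.quantile(cal_scores, 1.0 - alpha_t)
        err = float(score > threshold)       # 1.0 if the true label was missed
        errors.append(err)
        alpha_t += gamma * (alpha - err)     # a miss lowers alpha_t, enlarging later sets
    return np.array(errors), alpha_t

# Synthetic shift (an assumption): test-time scores are larger than calibration scores.
rng = np.random.default_rng(0)
cal_scores = rng.uniform(0.0, 1.0, size=1000)
shifted_stream = rng.uniform(0.0, 1.5, size=5000)

static_threshold = np.quantile(cal_scores, 0.9)
static_miss_rate = np.mean(shifted_stream > static_threshold)
errors, final_alpha = adaptive_conformal(shifted_stream, cal_scores)

print(f"static miss rate:   {static_miss_rate:.3f}")
print(f"adaptive miss rate: {errors.mean():.3f} (target 0.100)")
</syntaxhighlight>

The fixed threshold misses far more often than the target under this shift, while the adaptive version trades larger prediction sets for a long-run miss rate close to α.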

[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Probabilistic ML]]