Probabilistic ML: Difference between revisions

From BloomWiki
[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Probabilistic ML]]

Latest revision as of 01:56, 25 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Probabilistic machine learning frames prediction and inference as probability distributions rather than point estimates, enabling models to express uncertainty about their outputs. A probabilistic model doesn't just predict "this email is spam" — it predicts "this email has an 87% probability of being spam," with the uncertainty reflecting both the inherent randomness in the data and the model's knowledge limitations. Probabilistic ML encompasses Bayesian inference, probabilistic graphical models, Gaussian processes, and modern deep probabilistic models like variational autoencoders and normalizing flows.

Remembering

  • Probability distribution — A function assigning probabilities to possible outcomes; the fundamental object of probabilistic ML.
  • Prior — A distribution encoding beliefs before observing data: P(θ).
  • Posterior — Updated beliefs after observing data: P(θ|D).
  • Likelihood — The probability of the data given model parameters: P(D|θ).
  • MAP (Maximum A Posteriori) — Finding the mode of the posterior; regularized point estimate.
  • MLE (Maximum Likelihood Estimation) — Finding parameters maximizing P(D|θ); no prior.
  • Probabilistic graphical model — Represents joint distributions over many variables using graph structure (Bayesian networks, Markov random fields).
  • Bayesian network — A directed acyclic graph encoding conditional independence relationships.
  • Hidden Markov Model (HMM) — A probabilistic sequence model with hidden states; classic for speech and bioinformatics.
  • Variational Autoencoder (VAE) — A generative model using variational inference to learn a probabilistic latent space.
  • Normalizing flow — A generative model constructed by composing invertible transformations to transform a simple distribution into a complex one.
  • ELBO (Evidence Lower Bound) — The objective maximized in variational inference: log P(D) ≥ ELBO.
  • Conformal prediction — A framework providing distribution-free prediction intervals with guaranteed coverage.
  • Calibration — A probabilistic model is calibrated if its confidence scores match empirical frequencies.
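The first four entries above are linked by Bayes' theorem, P(θ|D) = P(D|θ)·P(θ) / P(D). A minimal numeric sketch of that update for a conjugate pairing (Beta prior on a coin's heads probability with a binomial likelihood; a standard textbook example, not taken from this article):

```python
# Beta(2, 2) prior over a coin's heads probability θ
a_prior, b_prior = 2, 2

# Observe D: 8 heads in 10 flips. The Beta prior is conjugate to the
# binomial likelihood, so the posterior is Beta(a + heads, b + tails).
heads, tails = 8, 2
a_post, b_post = a_prior + heads, b_prior + tails

posterior_mean = a_post / (a_post + b_post)          # E[θ | D]
map_estimate = (a_post - 1) / (a_post + b_post - 2)  # posterior mode (MAP)
mle = heads / (heads + tails)                        # likelihood only, no prior

print(f"Posterior: Beta({a_post}, {b_post}), mean {posterior_mean:.3f}")
print(f"MAP {map_estimate:.3f} vs MLE {mle:.3f}")  # the prior shrinks MAP toward 0.5
```

Note how the MAP estimate (0.75) sits between the MLE (0.8) and the prior mean (0.5): the prior acts as a regularizer, exactly the relationship the MAP and MLE definitions above describe.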

Understanding

Why probabilistic ML? Point predictions discard crucial information. When a medical AI says "positive for cancer" with 51% confidence, that's categorically different from 99% confidence — but a non-probabilistic classifier treats both identically. Probabilistic models express this uncertainty explicitly.

Sources of uncertainty:

  1. Aleatoric (irreducible): inherent randomness in the data-generating process. Even with infinite data, some outcomes are unpredictable — e.g., quantum effects, chaotic systems.
  2. Epistemic (reducible): uncertainty due to limited knowledge. With more data, the model becomes more certain. Good probabilistic models distinguish these two types.
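One common heuristic for separating the two (an illustrative sketch, not taken from this article): fit an ensemble on bootstrap resamples and read the spread across members as a rough proxy for epistemic uncertainty, which shrinks where data is dense and grows under extrapolation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 1-D data: y = sin(x) + noise; the noise term is the aleatoric part
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(0, 0.3, size=x.shape)

# Ensemble of degree-5 polynomials fit on bootstrap resamples; disagreement
# between members approximates epistemic uncertainty
models = []
for _ in range(30):
    idx = rng.integers(0, len(x), len(x))
    models.append(np.polynomial.Polynomial.fit(x[idx], y[idx], 5))

x_query = np.array([0.0, 6.0])  # inside the data range vs far extrapolation
preds = np.stack([m(x_query) for m in models])
epistemic_std = preds.std(axis=0)

print(epistemic_std)  # far larger at x=6, where no data constrains the fit
```

The aleatoric noise (std 0.3 here) stays no matter how many models or points you add; the ensemble spread is the part more data would reduce.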

Probabilistic graphical models encode joint distributions over many variables efficiently using conditional independence assumptions. A Bayesian network for medical diagnosis might have nodes for symptoms, diseases, and test results, with edges encoding conditional dependencies. Inference algorithms (variable elimination, belief propagation) compute posterior probabilities of unobserved variables.
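A two-node network (Disease → Test) is already enough to show inference by enumeration; the probabilities below are hypothetical:

```python
# Two-node Bayesian network, Disease -> Test, with hypothetical numbers.
# The joint factorizes as P(d, t) = P(d) * P(t | d).
p_disease = {True: 0.01, False: 0.99}      # prior P(D)
p_test_given = {
    True:  {True: 0.95, False: 0.05},      # P(T | D=True): sensitivity
    False: {True: 0.08, False: 0.92},      # P(T | D=False): false-positive rate
}

def posterior_disease(test_positive):
    # Inference by enumeration: P(D | T) is proportional to P(D) * P(T | D)
    unnorm = {d: p_disease[d] * p_test_given[d][test_positive] for d in (True, False)}
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

post = posterior_disease(test_positive=True)
print(f"P(disease | positive test) = {post[True]:.3f}")  # ≈ 0.107, despite 95% sensitivity
```

Even a highly sensitive test yields a low posterior when the prior is small; the graph structure makes exactly this kind of reasoning mechanical, and variable elimination generalizes it to many nodes.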

Deep probabilistic models: VAEs combine deep learning with variational inference. The encoder maps inputs to a distribution over latent codes (not a point); the decoder maps sampled latent codes back to reconstructions. This enables generation (sample from the latent space) and uncertainty quantification. Normalizing flows model complex distributions by composing simple invertible transformations with analytically tractable Jacobians.
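The flow mechanics reduce to the change-of-variables formula, log pₓ(x) = log p_z(f⁻¹(x)) + log|det ∂f⁻¹/∂x|. A deliberately trivial single affine "flow" illustrates it (real flows stack many learned invertible layers):

```python
import numpy as np

# Base density p_z: standard normal. "Flow": x = f(z) = mu + s * z, invertible.
mu, s = 2.0, 0.5

def log_prob_base(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_prob_flow(x):
    z = (x - mu) / s                   # inverse transform f^{-1}(x)
    log_det = -np.log(s)               # log |d f^{-1} / dx|, the Jacobian term
    return log_prob_base(z) + log_det  # change-of-variables formula

# The flow density matches N(mu, s^2) exactly: flows give exact likelihoods
x = 2.3
analytic = -0.5 * (((x - mu) / s) ** 2 + np.log(2 * np.pi * s ** 2))
print(log_prob_flow(x), analytic)  # identical up to floating point
```

This is the "analytically tractable Jacobian" requirement in miniature: the whole design problem of flow architectures is keeping that log-determinant cheap as the transformations get expressive.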

Conformal prediction provides distribution-free prediction sets with guaranteed coverage: given user-specified error rate α, the prediction set contains the true label with probability ≥ 1-α, regardless of the underlying distribution. This is a practical tool for adding rigorous uncertainty quantification to any classifier.

Applying

Conformal prediction for guaranteed coverage:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Conformal prediction adds rigorous uncertainty quantification to any classifier
# (synthetic demo data in place of the original load_classification_dataset() placeholder)
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5)

# Train base classifier
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# Calibration: compute nonconformity scores (1 - predicted prob of true class)
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1 - cal_probs[np.arange(len(y_cal)), y_cal]

# Set coverage level
alpha = 0.1  # 90% coverage guarantee
threshold = np.quantile(cal_scores, (1 + 1/len(y_cal)) * (1 - alpha))

# Prediction sets for test examples
test_probs = clf.predict_proba(X_test)

def get_prediction_set(probs, threshold):
    return np.where(1 - probs <= threshold)[0]  # all classes with score <= threshold

prediction_sets = [get_prediction_set(p, threshold) for p in test_probs]
coverage = np.mean([y_test[i] in s for i, s in enumerate(prediction_sets)])
print(f"Coverage: {coverage:.2%} (target: {1-alpha:.0%})")  # should be >= 90%
avg_set_size = np.mean([len(s) for s in prediction_sets])
print(f"Average prediction set size: {avg_set_size:.2f}")  # smaller = more efficient
</syntaxhighlight>

Probabilistic ML method selection
Regression with uncertainty → Gaussian processes (small data), NGBoost, CARD
Classification with calibration → Calibrated RF/XGBoost (Platt/isotonic); temperature scaling for DNN
Guaranteed coverage → Conformal prediction (any model, any distribution)
Generative modeling → VAE (smooth latent space), normalizing flows (exact likelihood), diffusion models
Sequential inference → HMMs, Kalman filters, particle filters
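For the regression row, scikit-learn's GaussianProcessRegressor exposes the predictive standard deviation directly; a minimal sketch on toy data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

# RBF kernel for the signal plus WhiteKernel to absorb the aleatoric noise
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

X_query = np.array([[0.0], [6.0]])  # inside the training range vs far outside
mean, std = gp.predict(X_query, return_std=True)
print(std)  # predictive std is much larger at x=6, where no data constrains the GP
```

The same exact-posterior behavior is what makes GPs the reference point in the comparison below, and also what limits them to small datasets (inference is cubic in the number of training points).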

Analyzing

{| class="wikitable"
|+ Uncertainty Estimation Comparison
! Method !! Type of Uncertainty !! Coverage Guarantee !! Computational Cost
|-
| Point estimate + softmax || None (overconfident) || None || Very low
|-
| Temperature scaling || Calibrated confidence || Empirical only || Very low
|-
| MC Dropout || Epistemic (approx) || None || Low
|-
| Deep Ensembles || Both (approx) || None || High
|-
| Conformal prediction || Distribution-free sets || Guaranteed (1-α) || Low
|-
| Gaussian process || Epistemic (exact for GP) || Bayesian || Very high
|}
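Of the methods compared above, temperature scaling is the cheapest to add: fit a single scalar T on held-out logits to minimize NLL. A sketch with synthetic, deliberately overconfident logits (the logits and the grid search are illustrative, not from this article):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels under temperature-scaled probs
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Toy validation set: the true class tends to win, but margins are inflated
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, 500)
logits = rng.normal(0, 1, (500, 3))
logits[np.arange(500), labels] += 2.0  # true class gets a boost
logits *= 4.0                          # inflate confidence artificially

# Grid-search the temperature that minimizes validation NLL
temps = np.linspace(0.5, 10, 200)
best_T = temps[np.argmin([nll(logits, labels, t) for t in temps])]
print(f"Fitted temperature: {best_T:.2f}")  # T > 1 means the raw logits were overconfident
```

In practice T is fit with a few steps of gradient descent rather than a grid, but the objective is the same, and a single scalar cannot fix ranking errors, only the confidence scale.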

Failure modes: Overconfident point estimates causing unsafe decisions in high-stakes settings. Poor calibration — confidence scores don't match empirical frequencies. Distribution shift invalidating calibration. VAE posterior collapse — decoder ignores latent code. Conformal prediction requires exchangeable data — fails under distribution shift without adaptation.

Evaluating

Probabilistic model evaluation:

  1. Calibration: reliability diagrams, ECE (Expected Calibration Error) — lower is better.
  2. Sharpness: prediction sets should be as small as possible while maintaining coverage; a set containing all classes is valid but useless.
  3. NLL (Negative Log-Likelihood): proper scoring rule penalizing both inaccuracy and overconfidence.
  4. Coverage: for conformal prediction, empirically verify that guaranteed coverage holds.
  5. Entropy: high-entropy predictions on uncertain inputs, low-entropy on certain ones — the ideal pattern.
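Criterion 1 can be computed in a few lines; this is the common equal-width-bin ECE recipe, with synthetic confidences standing in for real model output:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: bin-weighted average gap between mean confidence and accuracy."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

rng = np.random.default_rng(0)

# A calibrated predictor: correctness frequency matches reported confidence
conf = rng.uniform(0.5, 1.0, 20000)
correct = rng.uniform(size=conf.shape) < conf
print(f"ECE (calibrated):    {expected_calibration_error(conf, correct):.3f}")  # near zero

# An overconfident predictor: reports 0.99, right only 70% of the time
conf_over = np.full(20000, 0.99)
correct_over = rng.uniform(size=20000) < 0.7
print(f"ECE (overconfident): {expected_calibration_error(conf_over, correct_over):.3f}")
```

The overconfident case scores roughly |0.7 − 0.99| ≈ 0.29, which is the gap a reliability diagram would show as a single bar far below the diagonal.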

Creating

Designing a probabilistic prediction pipeline:

  1. Choose model type based on data size and uncertainty needs.
  2. Train base model; add conformal calibration on held-out calibration set.
  3. Set α based on acceptable error rate for the application (medical: α=0.01, recommendation: α=0.1).
  4. Produce prediction sets rather than point predictions; communicate uncertainty to downstream users.
  5. Monitor calibration in production: track ECE on new data; alert if calibration degrades.
  6. For distribution shift: use adaptive conformal prediction (ACI) which continuously updates the quantile threshold.
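The ACI update in step 6 can be sketched as an online rule: after each prediction, nudge the working level α_t by γ(α − err_t), so sustained miscoverage lowers α_t and widens subsequent sets (the nonconformity score stream here is simulated, including a mid-stream shift):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 0.1, 0.02  # target miscoverage and ACI step size

# Simulated nonconformity scores with a distribution shift halfway through
scores = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])

alpha_t = alpha
history, errs = [], []
for t, s in enumerate(scores):
    if t < 50:  # warm-up: just collect scores
        history.append(s)
        continue
    # Threshold from past scores at the current working level alpha_t
    q = np.quantile(history, min(max(1 - alpha_t, 0.0), 1.0))
    err = float(s > q)  # 1 if this point fell outside its prediction set
    errs.append(err)
    # ACI update: miscoverage above target -> lower alpha_t -> wider sets
    alpha_t += gamma * (alpha - err)
    history.append(s)

print(f"Empirical miscoverage: {np.mean(errs):.3f} (target {alpha})")
```

A fixed split-conformal threshold would miscover badly after the shift; the feedback term is what restores long-run coverage without any distributional assumptions.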