Bayesian Statistics
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Bayesian Statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a "degree of belief" in an event. This degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from "Frequentist" statistics, which views probability as the long-term frequency of a random event. Bayesian statistics allows us to update our beliefs as new evidence comes in, making it the mathematical foundation for modern AI, medical diagnosis, and scientific forecasting.
Remembering
- Bayesian Statistics — A method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
- Prior Probability (Prior) — The initial belief about the probability of an event before new evidence is seen.
- Likelihood — The probability of the evidence given that the hypothesis is true.
- Posterior Probability (Posterior) — The updated belief about the probability of an event after seeing new evidence.
- Bayes' Theorem — The mathematical formula for updating probabilities: $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$.
- Frequentist Statistics — A framework where probability is defined as the long-run limit of a relative frequency.
- Null Hypothesis — (In Frequentist stats) The default assumption that there is no relationship or effect.
- Credible Interval — (The Bayesian analogue of a Confidence Interval) The range within which an unobserved parameter value falls with a particular probability.
- MCMC (Markov Chain Monte Carlo) — A class of algorithms used to sample from a probability distribution to find the posterior.
- P-value — (Frequentist) The probability of observing results at least as extreme as those measured, assuming the null hypothesis is true.
- Subjective Probability — Probability as a measure of an individual's personal belief.
- Conditional Probability — The probability of an event occurring given that another event has already occurred.
- Sensitivity — The probability that a test comes back positive for someone who has the condition: $P(+ \mid Disease)$.
- Specificity — The probability that a test comes back negative for someone who does not have the condition: $P(- \mid No\ Disease)$.
Understanding
The core of Bayesian thinking is the Update.
Frequentist vs. Bayesian:
- Frequentist: If I flip a coin 10 times and get 7 heads, a frequentist's point estimate (the maximum-likelihood estimate) for the probability of heads is 70%.
- Bayesian: I *know* coins are usually fair (My Prior). Even if I see 7 heads, I don't immediately believe the coin is broken. I update my belief slightly. Only if I see 1,000 heads do I abandon my prior and accept the coin is unfair.
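The coin example above can be sketched as a conjugate Beta-Binomial update; the prior strengths below are illustrative choices, not canonical values:

```python
def beta_update(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus coin-flip data
    yields a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails

# A strong "coins are usually fair" prior: Beta(50, 50), with mean 0.5
a, b = beta_update(50, 50, heads=7, tails=3)
print(a / (a + b))   # posterior mean barely moves: ~0.518

# After 1,000 straight heads, the data overwhelms the prior
a, b = beta_update(50, 50, heads=1000, tails=0)
print(a / (a + b))   # ~0.955: the "fair coin" belief is abandoned
```

Ten flips barely budge a strong prior, but a thousand flips dominate it, which is exactly the behavior described above.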
Bayes' Theorem in Plain English: The probability that your hypothesis is true given the evidence depends on: 1. How likely the evidence is if the hypothesis is true (Likelihood). 2. How likely the hypothesis was to begin with (Prior). 3. Divided by how likely the evidence is overall (Normalization).
The Base Rate Fallacy: This is a common cognitive bias that Bayesian math corrects. If a rare disease affects 1 in 10,000 people, and a test is 99% accurate, a positive test result doesn't mean you have a 99% chance of being sick. Because the disease is so rare (Low Prior), you actually only have about a 1% chance. Most people ignore the "Base Rate" and panic; Bayesians do the math.
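Assuming "99% accurate" means the test is both 99% sensitive and 99% specific, the numbers above work out as: $P(Disease \mid +) = \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.01 \times 0.9999} = \frac{0.000099}{0.010098} \approx 0.0098$, i.e. roughly a 1% chance of actually being sick.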
Applying
Calculating Posterior Probability (Medical Test):

```python
def bayes_update(prior, sensitivity, specificity):
    """
    prior:       P(Disease)
    sensitivity: P(+ | Disease)
    specificity: P(- | No Disease)
    """
    # P(Positive Test | No Disease)
    false_positive_rate = 1 - specificity
    # P(Evidence) = P(E|H)P(H) + P(E|~H)P(~H)
    p_evidence = (sensitivity * prior) + (false_positive_rate * (1 - prior))
    # Posterior P(H|E)
    return (sensitivity * prior) / p_evidence

# Rare disease: 0.1% of population
# Test: 95% sensitive, 95% specific
prior_prob = 0.001
result = bayes_update(prior_prob, 0.95, 0.95)

print(f"Prior Probability: {prior_prob*100:.2f}%")
print(f"Posterior Probability after Positive Test: {result*100:.2f}%")
# Even with a '95% accurate' test, you only have a ~1.9% chance of being sick!
```
Bayesian Applications:
- Spam Filters → Updating the probability that an email is "Spam" based on the presence of words like "Free" or "Viagra."
- Self-Driving Cars → Constantly updating the "Belief State" of where other cars are based on noisy sensor data (Kalman Filters).
- Medical Diagnosis → Doctors implicitly use Bayesian reasoning when they consider a patient's symptoms alongside their age and history.
- Scientific Discovery → Using "Bayesian Search Theory" to find missing ships or planes (like the search for MH370).
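The spam-filter idea above can be sketched as a word-by-word Bayesian update. The per-word probabilities and the independence ("naive Bayes") assumption here are purely illustrative:

```python
def spam_posterior(prior_spam, word_probs, words):
    """
    Update P(Spam) one word at a time, treating words as independent
    (the "naive Bayes" assumption).
    word_probs maps word -> (P(word | Spam), P(word | Not Spam)).
    """
    p = prior_spam
    for w in words:
        if w not in word_probs:
            continue  # unseen words carry no evidence in this sketch
        p_w_spam, p_w_ham = word_probs[w]
        evidence = p_w_spam * p + p_w_ham * (1 - p)
        p = p_w_spam * p / evidence   # Bayes' theorem, one word at a time
    return p

# Invented per-word probabilities for illustration
word_probs = {"free": (0.30, 0.02), "meeting": (0.01, 0.10)}
print(spam_posterior(0.5, word_probs, ["free"]))     # "free" pushes P(Spam) up
print(spam_posterior(0.5, word_probs, ["meeting"]))  # "meeting" pushes it down
```

Each word's posterior becomes the prior for the next word, so evidence accumulates multiplicatively across the message.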
Analyzing
| Feature | Frequentist | Bayesian |
|---|---|---|
| Data | Random Sample | Observed Fixed Evidence |
| Parameters | Fixed Unknowns | Random Variables (Distributions) |
| Prior Knowledge | Ignored | Explicitly Included |
| Conclusion | p-value (reject/fail to reject) | Posterior Probability Distribution |
| Main Tool | Maximum Likelihood Estimation | Bayes' Theorem / MCMC |
The Problem of the Prior: The biggest criticism of Bayesian stats is that the "Prior" is subjective. If two people start with different priors, they will get different results from the same data. Bayesians argue that as more data comes in, the "Likelihood" eventually overwhelms the "Prior," and both people will converge on the truth. This is called Posterior Convergence.
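Posterior convergence can be demonstrated numerically; a minimal sketch with simulated coin flips and two deliberately different (illustrative) priors:

```python
import random

def posterior_mean(alpha, beta, heads, tails):
    """Mean of the Beta posterior after a Beta(alpha, beta) prior sees the data."""
    return (alpha + heads) / (alpha + beta + heads + tails)

random.seed(0)
flips = [random.random() < 0.7 for _ in range(10_000)]  # true P(heads) = 0.7
heads = sum(flips)
tails = len(flips) - heads

skeptic = posterior_mean(50, 50, heads, tails)  # strong "fair coin" prior
believer = posterior_mean(5, 1, heads, tails)   # prior already leaning toward heads
print(skeptic, believer)  # both end up close to 0.7 and to each other
```

With only a handful of flips the two analysts would disagree noticeably; after ten thousand flips the likelihood dominates and their posteriors are nearly indistinguishable.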
Evaluating
Evaluating a Bayesian model: (1) Sensitivity to Prior: Does the result change drastically if we slightly change our initial belief? (2) Computational Cost: Is the model so complex that MCMC takes days to run? (3) Predictive Accuracy: Does the model accurately predict "out-of-sample" data? (4) Convergence: Has the Markov Chain "mixed" well, or is it stuck in a local area of the probability space?
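Point (4), convergence, is typically checked with a diagnostic such as the Gelman-Rubin statistic; a simplified sketch, run here on toy Gaussian "chains" rather than real MCMC output:

```python
import random
import statistics

def gelman_rubin(chains):
    """Simplified Gelman-Rubin R-hat: values near 1.0 mean the chains agree
    (mixed well); much larger values mean they explored different regions."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B/n: variance of the chain means
    W = statistics.fmean(statistics.variance(c) for c in chains)
    B_over_n = statistics.variance(means)
    var_plus = (n - 1) / n * W + B_over_n
    return (var_plus / W) ** 0.5

random.seed(1)
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[random.gauss(mu, 1) for _ in range(1000)] for mu in (0, 5, 0, 5)]
print(gelman_rubin(mixed))  # close to 1.0: well mixed
print(gelman_rubin(stuck))  # well above 1: chains are stuck in different regions
```

In practice one would use the rank-normalized R-hat reported by PyMC or Stan, but the intuition is the same: compare spread between chains to spread within them.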
Creating
Future Frontiers: (1) Bayesian Neural Networks: AI that doesn't just give an answer, but tells you *how certain* it is (Quantifying Uncertainty). (2) Probabilistic Programming: Languages like Pyro or Stan that allow programmers to build complex Bayesian models with a few lines of code. (3) Bayesian Brain Theory: The controversial idea that the human brain itself is a "Bayesian Prediction Machine" that is constantly updating its model of the world. (4) Automated Scientific Discovery: Using Bayesian Optimization to choose the next experiment that will provide the most information.