Probability Theory
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
Probability Theory is the branch of mathematics concerned with the analysis of random phenomena. While the outcome of a single event (like a coin flip) might be unpredictable, probability theory reveals the hidden patterns that emerge when an event is repeated many times. It is the mathematical framework for **Uncertainty**. From the insurance industry predicting risk to physicists calculating the probable position of an electron and AI models predicting the next word in a sentence, probability is the tool we use to navigate a world that is not deterministic.
Remembering
- Probability — A measure of the likelihood that an event will occur, ranging from 0 (impossible) to 1 (certain).
- Sample Space ($S$) — The set of all possible outcomes of an experiment.
- Event ($E$) — A subset of the sample space (e.g., "Rolling an even number").
- Independent Events — Events where the outcome of one does not affect the other (e.g., two coin flips).
- Dependent Events — Events where the outcome of one affects the likelihood of the other (e.g., drawing cards without replacement).
- Conditional Probability — The probability of an event occurring given that another event has already occurred ($P(A|B)$).
- Bayes' Theorem — A formula that describes how to update the probability of a hypothesis as more evidence becomes available.
- Random Variable — A variable whose value is determined by the outcome of a random experiment.
- Mean (Expected Value) — The long-run average outcome if an experiment were repeated infinitely many times.
- Variance / Standard Deviation — Measures of how "spread out" the outcomes are from the mean.
- Normal Distribution (Bell Curve) — A common probability distribution where most outcomes cluster around the center.
- Law of Large Numbers — The principle that as a sample size grows, the sample mean gets closer and closer to the true population mean.
- Central Limit Theorem — The remarkable fact that the sum of many independent random variables tends toward a normal distribution, regardless of the original distribution (both principles are demonstrated in the simulation after this list).
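A minimal simulation sketch of those last two entries, using only Python's standard library (the flip counts and the choice of 30 uniform values per sum are arbitrary illustrations): <syntaxhighlight lang="python">
import random
import statistics

def proportion_heads(n_flips):
    """Simulate n_flips fair-coin flips and return the proportion of heads."""
    return sum(random.randint(0, 1) for _ in range(n_flips)) / n_flips

# Law of Large Numbers: the sample mean closes in on the true mean (0.5)
for n in (10, 1_000, 100_000):
    print(f"{n:>7} flips -> proportion of heads = {proportion_heads(n):.4f}")

# Central Limit Theorem: sums of uniform values (which are not normal
# themselves) pile up in a bell shape around their expected value
sums = [sum(random.random() for _ in range(30)) for _ in range(10_000)]
print(f"mean of sums  ~ {statistics.mean(sums):.2f} (theory: 15.00)")
print(f"stdev of sums ~ {statistics.stdev(sums):.2f} (theory: {(30 / 12) ** 0.5:.2f})")
</syntaxhighlight>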
Understanding
Probability is understood through two complementary lenses: **Frequency** and **Belief**.
- **1. The Frequentist View**: Probability is what happens in the long run. If you flip a fair coin 1 million times, it will come up heads very close to 50% of the time. $P(Heads) = 0.5$.
- **2. The Bayesian View**: Probability is a "Degree of Belief." If I say there is a 70% chance of rain, I am expressing my confidence based on the current data. As soon as I see a dark cloud, I update my belief.
- **3. Distributions** (the first two are sketched in code after this list):
  - **Binomial**: Used for counts of "Yes/No" outcomes (e.g., How many people will click this ad?).
  - **Poisson**: Used for counts of events happening over time (e.g., How many emails will I get in an hour?).
  - **Normal**: Used for natural traits (e.g., Height, IQ, measurement errors).
- **The Gambler's Fallacy**: The mistaken belief that if something happens more frequently than normal during a given period, it will happen less frequently in the future (and vice versa). If you flip 5 heads in a row, the 6th flip is *still* 50/50. The "Universe" has no memory.
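A quick sketch of how the Binomial and Poisson distributions answer the example questions above (assumes Python 3.8+ for math.comb; the 2% click rate and 4-emails-per-hour rate are made-up illustrative numbers): <syntaxhighlight lang="python">
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n yes/no trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval that averages lam events)."""
    return (lam**k) * math.exp(-lam) / math.factorial(k)

# If 2% of viewers click an ad, how likely are exactly 3 clicks in 100 views?
print(f"Binomial: {binomial_pmf(3, 100, 0.02):.3f}")  # ~0.182

# If I average 4 emails per hour, how likely is a completely quiet hour?
print(f"Poisson:  {poisson_pmf(0, 4):.3f}")  # ~0.018
</syntaxhighlight>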
Applying
Modeling 'Bayesian Inference' (Spam Filtering): <syntaxhighlight lang="python">
def update_probability(prior_prob, prob_evidence_given_spam, prob_evidence_given_ham):
    """Bayes' Theorem: P(Spam|Word) = [P(Word|Spam) * P(Spam)] / P(Word)"""
    # Law of total probability: P(Word) = P(Word|Spam)P(Spam) + P(Word|Ham)P(Ham)
    p_ham = 1 - prior_prob
    p_word = (prob_evidence_given_spam * prior_prob) + (prob_evidence_given_ham * p_ham)
    posterior_prob = (prob_evidence_given_spam * prior_prob) / p_word
    return posterior_prob

# Prior: 10% of emails are spam.
# 'Buy Now' appears in 80% of spam but only 1% of ham.
print(f"Prob it is spam if it says 'Buy Now': {update_probability(0.1, 0.8, 0.01):.2f}")  # 0.90
</syntaxhighlight>
- This "learning" logic, where each new word's evidence turns the prior into a posterior, is how your email filter gets smarter.
- Probability Paradoxes
- The Monty Hall Problem → Why you should always "switch doors" on a game show: it raises your chance of winning from 1/3 to 2/3 (the simulation after this list confirms it).
- The Birthday Paradox → In a room of just 23 people, there is a better-than-even chance (about 50.7%) that two share a birthday.
- Simpson's Paradox → A trend appears in several different groups of data but disappears or reverses when these groups are combined.
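A Monte Carlo sketch of the Monty Hall claim (a minimal simulation; the 100,000-trial count is an arbitrary choice): <syntaxhighlight lang="python">
import random

def monty_hall_trial(switch):
    """One game: a prize hides behind one of 3 doors; the host opens a losing, unpicked door."""
    prize = random.randrange(3)
    pick = random.randrange(3)
    opened = next(d for d in range(3) if d != pick and d != prize)
    if switch:
        # Switch to the one remaining closed door
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == prize

trials = 100_000
for switch in (False, True):
    wins = sum(monty_hall_trial(switch) for _ in range(trials))
    print(f"{'switch' if switch else 'stay':>6}: win rate ~ {wins / trials:.3f}")
# stay ~ 0.333, switch ~ 0.667: switching really does double your chances
</syntaxhighlight>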
Analyzing
| Feature | Permutation | Combination |
|---|---|---|
| Order Matters? | Yes (ABC != CBA) | No (ABC == CBA) |
| Example | Entering a safe code | Picking a team of 3 people |
| Formula | $n! / (n-r)!$ | $n! / [r!(n-r)!]$ |
| Number of outcomes | Larger (each selection is counted $r!$ times) | Smaller |
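The table's two formulas in executable form (a small sketch assuming Python 3.8+ for math.perm and math.comb; n = 10, r = 3 are illustrative values): <syntaxhighlight lang="python">
import math

n, r = 10, 3

# Permutation: ordered arrangements, n! / (n-r)!
print(math.perm(n, r))  # 720 -- e.g., 3-digit safe codes drawn from 10 digits, no repeats
# Combination: unordered selections, n! / [r!(n-r)!]
print(math.comb(n, r))  # 120 -- e.g., teams of 3 chosen from 10 people
# Each combination corresponds to r! = 6 orderings, so 720 == 120 * 6
assert math.perm(n, r) == math.comb(n, r) * math.factorial(r)
</syntaxhighlight>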
- **The Concept of "Expectation"**: Insurance companies don't care about *one* person's accident; they care about the **Expected Value** across 1 million people. If they charge $100 and the average payout is $80, the **Law of Large Numbers** makes a profit of roughly $20 per policy all but certain across a pool that large (the arithmetic is sketched below). Analyzing these "Expected Values" is how casinos stay in business.
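The arithmetic behind that claim, as a sketch using the article's illustrative $100 premium and $80 average payout: <syntaxhighlight lang="python">
premium = 100.0          # what each policyholder pays (article's illustrative figure)
expected_payout = 80.0   # average claim cost per policyholder (illustrative figure)
policyholders = 1_000_000

# Expected profit per policy is simply premium minus expected payout;
# over a large pool, the Law of Large Numbers keeps the realized average
# payout extremely close to the $80 expectation.
per_policy_profit = premium - expected_payout
print(f"Expected total profit: ${per_policy_profit * policyholders:,.0f}")  # $20,000,000
</syntaxhighlight>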
Evaluating
Evaluating a probability claim: (1) **Sample Size**: Is the "100% success rate" based on 2 people or 2,000? (2) **Independence**: Are you assuming events are independent when they are actually linked (e.g., multiple stock market crashes)? (3) **Selection Bias**: Did you only look at the data that proved your point? (4) **Base Rate Fallacy**: If a test is 99% accurate for a disease that only 1 in 10,000 people have, a "Positive" result is actually more likely to be a false alarm.
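Point (4) is easy to verify with Bayes' Theorem. A sketch, assuming "99% accurate" means both 99% sensitivity (true-positive rate) and 99% specificity (true-negative rate): <syntaxhighlight lang="python">
prevalence = 1 / 10_000   # 1 in 10,000 people actually have the disease
sensitivity = 0.99        # P(positive | disease)
specificity = 0.99        # P(negative | no disease)

# Bayes' Theorem: P(disease | positive) = P(pos|disease)P(disease) / P(pos)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")  # ~0.0098: under 1%
</syntaxhighlight>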
Creating
Future Frontiers: (1) **Quantum Probability**: Dealing with the "True" randomness of the universe at the subatomic level. (2) **Algorithmic Information Theory**: Using probability to define how much "Information" or "Complexity" is in a piece of data. (3) **Stochastic AI**: Building neural networks that don't just give one answer, but a "Probability Distribution" of possible answers. (4) **The End of Randomness**: The theoretical debate over whether "Randomness" actually exists, or if we just lack the data to predict it perfectly.