Causal Inference
{{BloomIntro}}
Causal Inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. While traditional statistics is often summarized by the phrase "Correlation is not Causation," Causal Inference is the science of determining ''when'' and ''how'' we can conclude that one thing actually causes another. This field is essential for policy-making, medicine, and AI, as we need to know not just that two things happen together (e.g., ice cream sales and shark attacks), but whether changing one will change the other (e.g., if we ban ice cream, will shark attacks decrease?).
== Remembering ==
* '''Causal Inference''' — The branch of statistics concerned with identifying cause-and-effect relationships.
* '''Counterfactual''' — The "What if?" scenario; what would have happened if a different action had been taken.
* '''Confounder''' — A variable that influences both the cause and the effect, creating a "spurious" correlation (e.g., 'Heat' causes both ice cream sales and shark attacks).
* '''Randomized Controlled Trial (RCT)''' — The "Gold Standard" of causal inference, where participants are randomly assigned to groups to eliminate confounders.
* '''Observational Study''' — A study where the researcher does not control the assignment of treatment (common in economics and sociology).
* '''Selection Bias''' — When the people who choose a treatment are different from those who don't (e.g., people who take vitamins are already more health-conscious).
* '''Instrumental Variable (IV)''' — A variable that affects the treatment but has no direct effect on the outcome, used to "isolate" a causal effect in observational data.
* '''Propensity Score Matching''' — A technique that attempts to estimate the effect of a treatment by accounting for the covariates that predict receiving the treatment.
* '''Directed Acyclic Graph (DAG)''' — A visual map of causal relationships (nodes and arrows).
* '''Do-calculus''' — A mathematical framework developed by Judea Pearl for reasoning about interventions in a causal system.
* '''Average Treatment Effect (ATE)''' — The average difference in outcomes between the treated and untreated groups (see the sketch just after this list).
* '''Natural Experiment''' — An empirical study where individuals are exposed to the experimental and control conditions as determined by nature or other factors outside the control of the investigators (e.g., a change in law in one state but not another).
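The ATE is easiest to see with simulated data. Below is a minimal sketch, assuming an invented baseline, noise level, and a true effect of 2.0: because treatment is assigned by a coin flip (as in an RCT), a simple difference of group means recovers the true effect.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated RCT: treatment is assigned by a fair coin flip,
# so there are no confounders by construction.
treated = rng.integers(0, 2, n)

# Outcome = baseline + (assumed) true treatment effect of 2.0 + noise.
outcome = 5.0 + 2.0 * treated + rng.normal(0, 1, n)

# ATE estimate: mean outcome of the treated minus mean outcome of the untreated.
ate_hat = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"Estimated ATE: {ate_hat:.2f} (true effect: 2.00)")
</syntaxhighlight>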
== Understanding ==
Causal inference is the quest for the '''Counterfactual'''.
'''The Fundamental Problem of Causal Inference''': You can never observe the same person in two different states at the same time. You either took the pill or you didn't. We can never ''know'' for sure what would have happened if you hadn't taken it. Therefore, we have to find clever ways to "simulate" the counterfactual.
'''Judea Pearl's Ladder of Causation''':
# '''Association''' (Seeing): "If I see X, how likely is Y?" (Standard Machine Learning).
# '''Intervention''' (Doing): "If I ''do'' X, what will happen to Y?" (Causal Inference).
# '''Counterfactuals''' (Imagining): "If I had done X instead of Z, what ''would have'' happened?" (The highest form of human/AI reasoning).
'''The Back-Door Criterion''': If you want to know whether X causes Y, you must "close the back door"—meaning you must control for all the variables that might be causing both X and Y. If you don't, your result will be "biased" by the '''Confounder'''.
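The back-door idea can be made concrete with a toy simulation (all coefficients below are invented). Z is a confounder that opens a back-door path X ← Z → Y: the naive comparison of treated vs. untreated units is badly biased, while stratifying on Z and averaging the within-stratum differences recovers the true effect.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Confounder Z opens a back-door path: X <- Z -> Y.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)            # Z makes treatment much more likely
y = 1.0 * x + 3.0 * z + rng.normal(0, 1, n)   # the true effect of X on Y is 1.0

# "Seeing": the naive association is inflated by the confounder.
naive = y[x == 1].mean() - y[x == 0].mean()

# "Doing": back-door adjustment. Stratify on Z, take the within-stratum
# difference of means, and weight each stratum by P(Z = z).
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

print(f"Naive difference:  {naive:.2f}")     # well above 1.0
print(f"Adjusted estimate: {adjusted:.2f}")  # close to 1.0
</syntaxhighlight>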
== Applying ==
'''Simulating a Confounder (Spurious Correlation):'''
<syntaxhighlight lang="python">
import numpy as np
def simulate_spurious_correlation(n_samples):
    """
    Shows how 'Heat' causes both Ice Cream and Shark Attacks.
    Without knowing about 'Heat', we might think Ice Cream
    causes Shark Attacks.
    """
    # The Confounder (The true cause)
    heat = np.random.normal(25, 5, n_samples)

    # Effects
    ice_cream = 2 * heat + np.random.normal(0, 2, n_samples)
    sharks = 0.5 * heat + np.random.normal(0, 1, n_samples)

    # Correlation between Ice Cream and Sharks
    correlation = np.corrcoef(ice_cream, sharks)[0, 1]
    return correlation

print(f"Correlation between Ice Cream and Sharks: {simulate_spurious_correlation(1000):.3f}")
# This is a 'Spurious' correlation. Controlling for 'Heat'
# would bring this correlation to near zero.
</syntaxhighlight>
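To back up that closing comment, here is a follow-up sketch (re-simulating the same three variables with the same assumed coefficients) in which 'Heat' is regressed out of both series before correlating them; the leftover "partial" correlation is close to zero.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
heat = rng.normal(25, 5, n)
ice_cream = 2 * heat + rng.normal(0, 2, n)
sharks = 0.5 * heat + rng.normal(0, 1, n)

def residualize(y, x):
    """Remove the part of y that is linearly explained by x (plus an intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

raw = np.corrcoef(ice_cream, sharks)[0, 1]
partial = np.corrcoef(residualize(ice_cream, heat), residualize(sharks, heat))[0, 1]
print(f"Raw correlation:        {raw:.3f}")      # strongly positive
print(f"Controlling for 'Heat': {partial:.3f}")  # near zero
</syntaxhighlight>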
; Causal Tools in Action
: '''A/B Testing''' → Using RCTs to see if a specific website change "causes" more sales.
: '''Difference-in-Differences (Diff-in-Diff)''' → Comparing a "Treatment" group (e.g., a state that raised the minimum wage) to a "Control" group (a neighbor state that didn't); a numerical sketch follows this list.
: '''Regression Discontinuity''' → Comparing people just above and just below a cutoff (e.g., students who just barely passed an exam vs. those who just barely failed).
: '''Mediation Analysis''' → Exploring the "mechanism"—does X cause Y directly, or does X cause M, which then causes Y?
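The Diff-in-Diff logic reduces to simple arithmetic. The sketch below uses made-up employment figures for the minimum-wage example; the key assumption (not shown in code) is that both states would have followed "parallel trends" without the policy change.
<syntaxhighlight lang="python">
# Hypothetical average outcomes (say, employment rate in %); all four numbers are invented.
treated_before, treated_after = 60.0, 63.0   # state that raised the minimum wage
control_before, control_after = 58.0, 60.0   # neighbouring state that did not

treated_change = treated_after - treated_before   # change in the treated state: +3.0
control_change = control_after - control_before   # change we would expect anyway: +2.0

# Diff-in-Diff: subtract the control group's change from the treated group's change.
did_estimate = treated_change - control_change
print(f"Diff-in-Diff estimate: {did_estimate:+.1f} percentage points")
</syntaxhighlight>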
== Analyzing ==
{| class="wikitable"
|+ Correlation vs. Causation
! Feature !! Correlation !! Causation
|-
| Symmetry || Symmetric (If A correlates with B, B correlates with A) || Asymmetric (A causes B, but B doesn't necessarily cause A)
|-
| Prediction || Good for "What usually happens?" || Good for "What happens if I change things?"
|-
| Math || Covariance, Pearson's r || Do-calculus, DAGs, Structural Equations
|-
| Requirement || Observation || Intervention or clever identification
|}
'''Collider Bias''': This is a tricky trap. If you control for a variable that is caused by ''both'' your treatment and your outcome, you can accidentally create a fake correlation where none existed. For example, if you only study "Famous Actors," you might find that "Acting Talent" and "Physical Beauty" are negatively correlated—not because they are in real life, but because you need ''one'' of them to be famous in the first place.
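A quick simulation makes the actor example visible (the "talent" and "beauty" scores and the fame cutoff are invented). Talent and beauty are generated independently, yet once we condition on the collider, fame, a negative correlation appears.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Talent and beauty are independent in the full population.
talent = rng.normal(0, 1, n)
beauty = rng.normal(0, 1, n)

# Fame is a collider: it is caused by both talent and beauty
# (the selection rule below is an invented illustration).
famous = (talent + beauty) > 2.0

overall = np.corrcoef(talent, beauty)[0, 1]
among_famous = np.corrcoef(talent[famous], beauty[famous])[0, 1]
print(f"Correlation in the whole population: {overall:+.3f}")       # about zero
print(f"Correlation among 'famous' actors:   {among_famous:+.3f}")  # clearly negative
</syntaxhighlight>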
== Evaluating ==
Evaluating a causal claim:
# '''Exogeneity''': Was the treatment really assigned randomly (or "as-if" randomly)?
# '''SUTVA (Stable Unit Treatment Value Assumption)''': Does one person's treatment affect another person's outcome (spillover)? If so, the assumption is violated.
# '''Internal Validity''': Is the causal effect true for the group studied?
# '''External Validity (Transportability)''': Will this causal effect work in a different city or a different decade?
== Creating ==
Future Frontiers:
# '''Causal AI''': Moving beyond "Pattern Recognition" (Large Language Models) to "Causal Reasoning" (systems that can answer 'Why?' and 'What if?').
# '''Synthetic Controls''': Using AI to create a "perfect" simulated control group for situations where no real control exists.
# '''Causal Discovery''': Algorithms that can look at a dataset and "infer" the DAG (the map of arrows) automatically.
# '''Precision Policy''': Using causal models to predict which specific individual will benefit from a specific intervention (Heterogeneous Treatment Effects).
[[Category:Statistics]]
[[Category:Science]]
[[Category:Economics]]