Causal Inference

From BloomWiki
Latest revision as of 01:48, 25 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Causal Inference is the process of drawing a conclusion about a causal connection based on the conditions under which an effect occurs. While traditional statistics is often summarized by the phrase "Correlation is not Causation," Causal Inference is the science of determining when and how we can conclude that one thing actually causes another. This field is essential for policy-making, medicine, and AI, as we need to know not just that two things happen together (e.g., ice cream sales and shark attacks), but whether changing one will change the other (e.g., if we ban ice cream, will shark attacks decrease?).

Remembering

  • Causal Inference — The branch of statistics concerned with identifying cause-and-effect relationships.
  • Counterfactual — The "What if?" scenario; what would have happened if a different action had been taken.
  • Confounder — A variable that influences both the cause and the effect, creating a "spurious" correlation (e.g., 'Heat' causes both ice cream sales and shark attacks).
  • Randomized Controlled Trial (RCT) — The "Gold Standard" of causal inference, where participants are randomly assigned to groups to eliminate confounders.
  • Observational Study — A study where the researcher does not control the assignment of treatment (common in economics and sociology).
  • Selection Bias — When the people who choose a treatment are different from those who don't (e.g., people who take vitamins are already more health-conscious).
  • Instrumental Variable (IV) — A variable that affects the treatment but has no direct effect on the outcome, used to "isolate" a causal effect in observational data.
  • Propensity Score Matching — A technique that attempts to estimate the effect of a treatment by accounting for the covariates that predict receiving the treatment.
  • Directed Acyclic Graph (DAG) — A visual map of causal relationships (nodes and arrows).
  • Do-calculus — A mathematical framework developed by Judea Pearl for intervening in a causal system.
  • Average Treatment Effect (ATE) — The average difference in outcomes between the treated and untreated groups.
  • Natural Experiment — An empirical study where individuals are exposed to the experimental and control conditions as determined by nature or other factors outside the control of the investigators (e.g., a change in law in one state but not another).
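
Several of these terms can be made concrete with a small simulation. The sketch below is illustrative only (the effect size and variable names are invented): it generates a Randomized Controlled Trial and estimates the Average Treatment Effect as the simple difference in group means, which randomization makes unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Randomized Controlled Trial: treatment assigned by coin flip,
# so assignment cannot be correlated with any confounder.
treated = rng.integers(0, 2, n).astype(bool)

# Outcome: baseline noise plus a true treatment effect of 2.0
outcome = rng.normal(10, 3, n) + 2.0 * treated

# Average Treatment Effect (ATE): difference in group means
ate_hat = outcome[treated].mean() - outcome[~treated].mean()
print(f"Estimated ATE: {ate_hat:.2f}")  # close to the true effect, 2.0
```

Because assignment is random, no adjustment is needed; in an Observational Study, the same difference in means would also absorb any Selection Bias.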

Understanding

Causal inference is the quest for the Counterfactual.

The Fundamental Problem of Causal Inference: You can never observe the same person in two different states at the same time. You either took the pill or you didn't. We can never know for sure what would have happened if you hadn't taken it. Therefore, we have to find clever ways to "simulate" the counterfactual.
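
A hedged sketch of the fundamental problem (numbers and names invented): every unit has two potential outcomes, but only one of them is ever realized, so the individual-level effect is never directly measurable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Potential outcomes for five people: the result each WOULD get
# under control (y0) and under treatment (y1). Both exist in theory...
y0 = rng.normal(10, 1, n)
y1 = y0 + 2.0  # every individual's true effect is exactly 2.0

# ...but each person either took the pill or didn't,
took_pill = np.array([True, False, True, False, True])

# so only ONE potential outcome per person is ever observed.
observed = np.where(took_pill, y1, y0)

# The individual effect y1 - y0 can never be computed from data;
# with randomization we can still recover its average.
true_ate = (y1 - y0).mean()
print(f"True (unobservable) ATE: {true_ate:.1f}")
```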

Judea Pearl's Ladder of Causation:

  1. Association (Seeing): "If I see X, how likely is Y?" (Standard Machine Learning).
  2. Intervention (Doing): "If I do X, what will happen to Y?" (Causal Inference).
  3. Counterfactuals (Imagining): "If I had done X instead of Z, what would have happened?" (The highest form of human/AI reasoning).
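
Rungs 1 and 2 can be contrasted numerically. In the toy ice-cream/shark world below (all coefficients invented), seeing high sales predicts more shark attacks, but doing, i.e. setting sales ourselves independent of heat, severs the heat → ice-cream arrow and the contrast vanishes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Toy world: heat drives both ice cream sales and shark attacks
heat = rng.normal(25, 5, n)
ice_cream = 2 * heat + rng.normal(0, 2, n)
sharks = 0.5 * heat + rng.normal(0, 1, n)

# Rung 1 -- Seeing: condition on observed high sales.
# High-sales days are hot days, so shark attacks look elevated.
high_sales = ice_cream > np.median(ice_cream)
seeing = sharks[high_sales].mean() - sharks[~high_sales].mean()

# Rung 2 -- Doing: WE set sales by coin flip, independent of heat.
# Sharks still follow heat only, so the contrast is near zero.
we_set_high = rng.integers(0, 2, n).astype(bool)
doing = sharks[we_set_high].mean() - sharks[~we_set_high].mean()

print(f"Seeing high sales: attacks differ by {seeing:.2f}")
print(f"Doing high sales:  attacks differ by {doing:.2f}")
```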

The Back-Door Criterion: If you want to know if X causes Y, you must "close the back door"—meaning you must control for all the variables that might be causing both X and Y. If you don't, your result will be "biased" by the Confounder.
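
Closing the back door can be shown with a plain regression adjustment (a sketch using ordinary least squares; no causal library required, and the data-generating numbers are invented). Regressing sharks on ice cream alone yields a biased slope; adding heat as a control drives the ice-cream coefficient toward its true value of zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

heat = rng.normal(25, 5, n)
ice_cream = 2 * heat + rng.normal(0, 2, n)
sharks = 0.5 * heat + rng.normal(0, 1, n)  # ice cream has NO effect

ones = np.ones(n)

# Back door open: regress sharks on ice_cream only -> biased slope
X_open = np.column_stack([ones, ice_cream])
slope_open = np.linalg.lstsq(X_open, sharks, rcond=None)[0][1]

# Back door closed: also control for the confounder, heat
X_closed = np.column_stack([ones, ice_cream, heat])
slope_closed = np.linalg.lstsq(X_closed, sharks, rcond=None)[0][1]

print(f"Without controlling for heat: {slope_open:.3f}")
print(f"Controlling for heat:         {slope_closed:.3f}")  # near 0
```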

Applying

Simulating a Confounder (Spurious Correlation):

```python
import numpy as np

def simulate_spurious_correlation(n_samples):
    """
    Shows how 'Heat' causes both Ice Cream and Shark Attacks.
    Without knowing about 'Heat', we might think Ice Cream
    causes Shark Attacks.
    """
    # The Confounder (the true cause)
    heat = np.random.normal(25, 5, n_samples)

    # Effects: each depends only on heat
    ice_cream = 2 * heat + np.random.normal(0, 2, n_samples)
    sharks = 0.5 * heat + np.random.normal(0, 1, n_samples)

    # Correlation between Ice Cream and Sharks
    correlation = np.corrcoef(ice_cream, sharks)[0, 1]

    return correlation

print(f"Correlation between Ice Cream and Sharks: {simulate_spurious_correlation(1000):.3f}")

# This is a 'spurious' correlation: controlling for 'Heat'
# would bring it to near zero.
```

Causal Tools in Action
A/B Testing → Using RCTs to see if a specific website change "causes" more sales.
Difference-in-Differences (Diff-in-Diff) → Comparing a "Treatment" group (e.g., a state that raised the minimum wage) to a "Control" group (a neighbor state that didn't).
Regression Discontinuity → Comparing people just above and just below a cutoff (e.g., students who just barely passed an exam vs. those who just barely failed).
Mediation Analysis → Exploring the "mechanism"—does X cause Y directly, or does X cause M, which then causes Y?
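
Of these tools, Difference-in-Differences is the easiest to compute by hand. The employment figures below are invented for illustration: the treated state's change over time is compared with the control state's change, and the difference of those two differences is the causal estimate (valid only under the parallel-trends assumption).

```python
# Invented employment figures (thousands of jobs) for two neighboring states
treated_before, treated_after = 100.0, 104.0  # state that raised minimum wage
control_before, control_after = 90.0, 93.0    # neighbor state that did not

# Each state's raw change over time
treated_change = treated_after - treated_before  # 4.0
control_change = control_after - control_before  # 3.0

# Diff-in-Diff: the control's trend stands in for the treated state's
# counterfactual, so subtracting it isolates the policy's effect.
did_estimate = treated_change - control_change
print(f"Diff-in-Diff estimate: {did_estimate:.1f}")  # 1.0
```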

Analyzing

Correlation vs. Causation

| Feature     | Correlation                                             | Causation                                                  |
|-------------|---------------------------------------------------------|------------------------------------------------------------|
| Symmetry    | Symmetric (if A correlates with B, B correlates with A) | Asymmetric (A causes B, but B doesn't necessarily cause A) |
| Prediction  | Good for "What usually happens?"                        | Good for "What happens if I change things?"                |
| Math        | Covariance, Pearson's r                                 | Do-calculus, DAGs, structural equations                    |
| Requirement | Observation                                             | Intervention or clever identification                      |

Collider Bias: This is a tricky trap. If you control for a variable that is caused by both your treatment and your outcome, you can accidentally create a fake correlation where none existed. For example, if you only study "Famous Actors," you might find that "Acting Talent" and "Physical Beauty" are negatively correlated—not because they are in real life, but because you need one of them to be famous in the first place.
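
The actors example can be simulated directly (a toy sketch; the fame threshold and variable names are invented). Talent and beauty are generated independently, fame requires a high combined score, and conditioning on fame manufactures a strong negative correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Talent and beauty are independent in the full population
talent = rng.normal(0, 1, n)
beauty = rng.normal(0, 1, n)

# Fame is a collider: it is caused by BOTH talent and beauty
famous = (talent + beauty) > 2.0

r_everyone = np.corrcoef(talent, beauty)[0, 1]
r_famous = np.corrcoef(talent[famous], beauty[famous])[0, 1]

print(f"Correlation, everyone:      {r_everyone:.3f}")  # near 0
print(f"Correlation, famous actors: {r_famous:.3f}")    # clearly negative
```

Selecting on (or statistically controlling for) the collider is what creates the spurious negative relationship; nothing in the underlying population changed.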

Evaluating

Evaluating a causal claim:

  1. Exogeneity: Was the treatment really assigned randomly (or "as-if" randomly)?
  2. SUTVA: Does one person's treatment affect another person's outcome (spillover)?
  3. Internal Validity: Is the causal effect true for the group studied?
  4. External Validity (Transportability): Will this causal effect work in a different city or a different decade?

Creating

Future Frontiers:

  1. Causal AI: Moving beyond "Pattern Recognition" (Large Language Models) to "Causal Reasoning" (systems that can answer 'Why?' and 'What if?').
  2. Synthetic Controls: Using AI to create a "perfect" simulated control group for situations where no real control exists.
  3. Causal Discovery: Algorithms that can look at a dataset and "infer" the DAG (the map of arrows) automatically.
  4. Precision Policy: Using causal models to predict which specific individual will benefit from a specific intervention (Heterogeneous Treatment Effects).