Correlation vs Causation: Why Ice Cream Does Not Cause Drowning
Ice cream sales correlate with drowning deaths. Strongly. As ice cream sales rise, more people drown.
Does ice cream cause drowning?
Obviously not. Both are caused by hot weather. When it's hot, people buy more ice cream and go swimming more. That's a confounding variable.
This example is trivial. But correlation-causation confusion kills people.
- Hormone replacement therapy correlated with better heart health in observational studies. Doctors prescribed it for decades. Randomized trials later showed it increased heart disease risk. The correlation was confounded by socioeconomic status—wealthier women got HRT and also had better health habits.
- Vitamin E supplements correlated with lower cancer rates. Trials showed they had no effect (or possibly increased mortality). The correlation was confounded—people who take supplements are already health-conscious.
Correlation tells you X and Y move together. Causation tells you X makes Y happen. Confusing them isn't just sloppy thinking. It leads to bad policy, wasted money, and preventable deaths.
This article explains why correlation doesn't imply causation, how confounding works, and what tools actually can establish causation.
What Correlation Is (and Isn't)
Correlation coefficient ($r$): Measures how much two variables move together.
- $r = 1$: Perfect positive correlation. When X increases, Y always increases proportionally.
- $r = 0$: No linear correlation. X and Y may be independent, or they may be related in a purely nonlinear way; $r = 0$ does not guarantee independence.
- $r = -1$: Perfect negative correlation. When X increases, Y always decreases.
Critical: $r$ measures linear association. Variables can be perfectly related but have $r = 0$ if the relationship is nonlinear (e.g., Y = X²).
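Here's a minimal sketch (Python with NumPy/SciPy, purely illustrative) of that point: Y is perfectly determined by X, yet the Pearson correlation is essentially zero.

```python
import numpy as np
from scipy.stats import pearsonr

# X symmetric around zero; Y is a deterministic function of X
x = np.linspace(-3, 3, 1001)
y = x ** 2  # perfectly determined by X, but not linearly

r, _ = pearsonr(x, y)
print(f"Pearson r for Y = X^2: {r:.4f}")  # approximately 0
```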
Example:
- Height and weight: $r \approx 0.7$. Strongly correlated.
- Shoe size and IQ: $r \approx 0$. Uncorrelated.
But correlation says nothing about causation. Height and weight correlate largely because both are driven by genetics, nutrition, and age, not because one simply causes the other.
Three Reasons Correlation Isn't Causation
1. Confounding: A Third Variable Causes Both
X and Y correlate not because one causes the other, but because Z causes both.
Examples:
- Coffee and lung cancer correlated in early studies. Does coffee cause cancer? No. Smokers drink more coffee. Smoking causes cancer. Coffee is confounded with smoking.
- Education and income correlate. Does education cause higher income? Partially. But intelligence, family wealth, and social networks also cause both more education and higher income. The correlation overstates education's causal effect.
The problem: Observational data always has potential confounders. You can't control for what you don't measure. And you can't measure everything.
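A toy simulation makes the ice cream example concrete (Python/NumPy; the effect sizes are invented). Hot weather drives both variables, so they correlate strongly even though neither causes the other, and the correlation vanishes once you strip out the confounder.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 10_000

heat = rng.normal(size=n)                      # confounder Z: hot weather
ice_cream = 2.0 * heat + rng.normal(size=n)    # Z -> X (no arrow from X to Y)
drownings = 1.5 * heat + rng.normal(size=n)    # Z -> Y (no arrow from Y to X)

print("raw r:", round(pearsonr(ice_cream, drownings)[0], 2))   # strongly positive

def residualize(v, z):
    """Strip out the linear effect of z (simple OLS residuals)."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

r_adj = pearsonr(residualize(ice_cream, heat), residualize(drownings, heat))[0]
print("r after controlling for heat:", round(r_adj, 2))        # approximately 0
```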
2. Reverse Causation: Y Causes X, Not X Causes Y
You observe X and Y correlate. You assume X → Y. But maybe Y → X.
Examples:
- Police presence and crime correlate positively. More police, more crime. Do police cause crime? No. Crime causes police deployment. High-crime areas get more police.
- Hospital beds and deaths correlate. More hospital beds, more deaths. Do hospitals kill people? No. Sick people go to hospitals. Illness causes both hospitalization and death.
The problem: Correlation is symmetric. If X correlates with Y, then Y correlates with X. The data alone doesn't tell you the direction of causation.
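A small sketch (Python, invented numbers) makes the symmetry concrete: here crime drives police deployment, yet the correlation looks identical from either direction.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 5_000

crime = rng.normal(size=n)
police = 0.8 * crime + rng.normal(scale=0.5, size=n)  # Y -> X: crime causes deployment

r_xy, _ = pearsonr(police, crime)
r_yx, _ = pearsonr(crime, police)
print(r_xy, r_yx)  # identical; the data alone cannot reveal which way causation runs
```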
3. Spurious Correlation: Pure Coincidence
Two variables correlate by chance, with no causal relationship whatsoever.
Tyler Vigen's Spurious Correlations database is full of these:
- US spending on science correlates with suicides by hanging ($r = 0.998$).
- Per capita cheese consumption correlates with people who died tangled in bedsheets ($r = 0.947$).
- Nicolas Cage films correlate with swimming pool drownings ($r = 0.666$).
These are statistically significant (p < 0.05). They're also meaningless. With enough variables, you'll find correlations by chance.
The problem: If you test 1,000 pairs of unrelated variables at the 5% significance level, you should expect roughly 50 "significant" correlations by chance alone. Data mining guarantees spurious findings.
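You can watch this happen (Python/NumPy sketch; the sample sizes are arbitrary). Every pair below is pure noise, yet about 5% come out "significant."

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n_pairs, n_obs = 1_000, 30

false_positives = 0
for _ in range(n_pairs):
    x = rng.normal(size=n_obs)
    y = rng.normal(size=n_obs)   # completely unrelated to x
    _, p = pearsonr(x, y)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_pairs} unrelated pairs are 'significant' at p < 0.05")
# Expect roughly 50 by chance alone
```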
Causal Inference: What Actually Works
If correlation doesn't prove causation, what does?
1. Randomized Controlled Trials (RCTs): The Gold Standard
Randomly assign people to treatment vs. control. Randomization ensures the groups are identical on average—including for unmeasured confounders.
Then measure outcomes. If the treatment group differs by more than chance would explain, the treatment caused the difference.
Example:
- Drug trial. Randomly assign 500 people to drug, 500 to placebo. Measure outcomes. If drug group improves more, the drug works.
Why it works: Randomization breaks confounding. Every confounder (measured or unmeasured) is balanced across groups. The only systematic difference is the treatment.
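A toy simulation of that claim (Python; the effect sizes and the "health consciousness" confounder are invented): assignment is a coin flip, so the unmeasured confounder balances out and the naive difference in means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
true_effect = 2.0

health_conscious = rng.normal(size=n)            # unmeasured confounder
treated = rng.integers(0, 2, size=n)             # coin-flip assignment, ignores confounder
outcome = true_effect * treated + 3.0 * health_conscious + rng.normal(size=n)

estimate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"estimated effect: {estimate:.2f} (true effect: {true_effect})")
# Because assignment is random, the confounder averages out across arms.
```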
Limitations:
- Expensive and slow.
- Unethical for many questions (can't randomly assign smoking to test if it causes cancer).
- External validity concerns (trial participants ≠ real-world populations).
2. Natural Experiments: Quasi-Random Assignment
Sometimes reality creates quasi-random variation. You can exploit it.
Examples:
- Vietnam draft lottery. Draft numbers assigned randomly. Compare outcomes for drafted vs. not drafted. Causally identifies effect of military service on earnings, health, etc.
- Regression discontinuity. A program has a cutoff (e.g., scholarships for students scoring >1200 on SAT). Compare students just above vs. just below the cutoff. They're nearly identical except for program access.
- Difference-in-differences. A policy changes in one state but not another. Compare the change in outcomes before vs. after, treatment vs. control state.
Why it works: The variation in treatment is "as-if random." Confounders don't differ systematically across treatment and control.
Limitations: Requires specific circumstances. Not always available.
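As a concrete illustration of difference-in-differences, here is a minimal sketch (Python; all the state averages are fabricated for illustration). The control state's change stands in for what would have happened to the treated state without the policy.

```python
# Average outcome (e.g., employment rate) before and after a policy change.
# All numbers are invented for illustration.
treated_before, treated_after = 60.0, 66.0   # state that adopted the policy
control_before, control_after = 58.0, 61.0   # comparison state, no policy change

# Difference-in-differences: treated change minus control change.
did = (treated_after - treated_before) - (control_after - control_before)
print(f"estimated policy effect: {did:.1f} percentage points")  # 6 - 3 = 3
```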
3. Instrumental Variables (IV): Exploiting Exogenous Shocks
Find a variable (the instrument) that:
- Affects the treatment (X).
- Doesn't directly affect the outcome (Y), except through X.
Use the instrument to isolate the causal effect of X on Y.
Example:
- Effect of education on income. Education is confounded (smarter people get more education and earn more).
- Instrument: Distance to nearest college. Distance affects education (closer = more schooling) but doesn't directly affect income.
- Use distance as IV to estimate causal effect of education.
Why it works: The instrument provides variation in X that's unrelated to confounders. You're using only the "clean" part of X's variation.
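Here's a two-stage least squares sketch for the education-and-income example (Python/NumPy; the data-generating process, the instrument's strength, and the "ability" confounder are all invented). Naive OLS is biased upward by ability; 2SLS recovers the true return.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
true_return = 0.5                                  # true causal effect of education on income

ability = rng.normal(size=n)                       # unmeasured confounder
distance = rng.normal(size=n)                      # instrument: affects education only
education = 12 - 0.5 * distance + ability + rng.normal(size=n)
income = true_return * education + 2.0 * ability + rng.normal(size=n)

def ols_slope(x, y):
    # simple OLS slope: cov(x, y) / var(x)
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print("naive OLS:", round(ols_slope(education, income), 2))      # biased upward by ability

# Stage 1: predict education from the instrument.
b1 = ols_slope(distance, education)
a1 = education.mean() - b1 * distance.mean()
education_hat = a1 + b1 * distance

# Stage 2: regress income on predicted education (the "clean" variation).
print("2SLS estimate:", round(ols_slope(education_hat, income), 2))  # close to 0.5
```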
Limitations:
- Finding valid instruments is hard.
- IV estimates can be imprecise (wide confidence intervals).
- Violations of assumptions (instrument affects Y directly) break everything.
4. Causal Graphs (DAGs): Mapping the Structure
Directed Acyclic Graphs let you map out causal relationships. Nodes are variables. Arrows are causal links.
Then you use graph theory to figure out:
- What to control for (confounders).
- What not to control for (colliders, mediators).
Example:
Intelligence → Education
Intelligence → Income
Education → Income
Intelligence causes both education and income. It's a confounder. Control for it.
But if you control for a collider (a variable caused by both X and Y), you induce spurious correlation. DAGs help you avoid that.
Why it works: Makes causal assumptions explicit. Formalizes what regression can and can't identify.
Limitations: Requires correct causal model. If your DAG is wrong, your conclusions are wrong.
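A sketch of what the DAG above implies in practice (Python/NumPy; all coefficients invented): regressing income on education alone overstates the effect, while also including the confounder closes the backdoor path and recovers the causal coefficient.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50_000
true_effect = 1.0

intelligence = rng.normal(size=n)                       # confounder
education = 0.8 * intelligence + rng.normal(size=n)     # Intelligence -> Education
income = true_effect * education + 1.5 * intelligence + rng.normal(size=n)

# Naive regression: income ~ education (backdoor path open)
X_naive = np.column_stack([np.ones(n), education])
print("naive:", np.linalg.lstsq(X_naive, income, rcond=None)[0][1])

# Adjusted regression: income ~ education + intelligence (backdoor closed)
X_adj = np.column_stack([np.ones(n), education, intelligence])
print("adjusted:", np.linalg.lstsq(X_adj, income, rcond=None)[0][1])  # approximately 1.0
```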
The Bradford Hill Criteria: Evaluating Causal Claims
In the 1960s, epidemiologist Austin Bradford Hill proposed criteria for assessing causation from observational data.
1. Strength of association. Larger correlations are more likely causal (harder to explain by confounding).
2. Consistency. The association replicates across studies, populations, and methods.
3. Specificity. X is associated with Y but not with unrelated outcomes.
4. Temporality. X precedes Y in time. (Non-negotiable—causes must come before effects.)
5. Biological gradient. Dose-response relationship. More X → more Y.
6. Plausibility. There's a plausible mechanism. X could cause Y.
7. Coherence. The causal claim fits with existing knowledge.
8. Experiment. Experimental evidence supports the claim.
9. Analogy. Similar causal relationships exist.
These aren't proof. But satisfying more criteria strengthens causal inference. Smoking satisfied all nine—that's how we concluded it causes cancer despite lacking RCTs.
Common Confounding Patterns
Simpson's Paradox: Aggregation Reverses Correlation
Within every subgroup, X is positively associated with Y. But in the aggregate, the association weakens or flips sign.
Example: A drug improves outcomes in men and women. But overall, it looks harmful. How?
- Men: mostly low-risk, drug helps.
- Women: mostly high-risk, drug helps.
- Aggregate: Drug group has more women (high-risk), so worse outcomes overall.
Solution: Disaggregate. Check subgroup effects.
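A worked toy version of that drug example (Python; the counts are invented for illustration): the drug wins in both subgroups but loses in the aggregate, because the drug arm is dominated by the high-risk subgroup.

```python
# (recoveries, patients) per subgroup -- all counts invented for illustration
data = {
    ("men",   "drug"):    (81, 87),    # 93% recover
    ("men",   "placebo"): (234, 270),  # 87% recover
    ("women", "drug"):    (192, 263),  # 73% recover
    ("women", "placebo"): (55, 80),    # 69% recover
}

for group in ("men", "women"):
    d = data[(group, "drug")]
    p = data[(group, "placebo")]
    print(group, "drug:", round(d[0] / d[1], 2), "placebo:", round(p[0] / p[1], 2))

# Aggregate: the drug arm is mostly women (high-risk), so its overall
# recovery rate is lower even though the drug helps every subgroup.
drug = [sum(v[i] for k, v in data.items() if k[1] == "drug") for i in (0, 1)]
plac = [sum(v[i] for k, v in data.items() if k[1] == "placebo") for i in (0, 1)]
print("overall drug:", round(drug[0] / drug[1], 2), "placebo:", round(plac[0] / plac[1], 2))
```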
Collider Bias: Controlling for the Wrong Thing
Collider: A variable caused by both X and Y.
If you control for a collider, you induce spurious correlation between X and Y.
Example:
- Talent and Looks both cause Hollywood success.
- Among successful actors (conditioning on success), talent and looks are negatively correlated. (If you're in Hollywood but not talented, you must be good-looking. If you're there but not good-looking, you must be talented.)
- But in the general population, talent and looks are uncorrelated.
Lesson: Don't blindly "control for everything." DAGs tell you what to control for.
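A minimal collider-bias simulation (Python/NumPy; the success threshold is invented): talent and looks are independent in the population, but conditioning on success makes them negatively correlated.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(5)
n = 100_000

talent = rng.normal(size=n)
looks = rng.normal(size=n)                       # independent of talent
success = (talent + looks > 2.0)                 # collider: caused by both

print("population r:", round(pearsonr(talent, looks)[0], 3))   # approximately 0
print("among the successful:",
      round(pearsonr(talent[success], looks[success])[0], 3))  # clearly negative
```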
Granger Causality: A Weaker Notion
In time series, Granger causality asks: "Does past X help predict future Y, beyond what past Y alone predicts?"
If yes, X "Granger-causes" Y.
Example: Stock prices. If yesterday's oil prices improve predictions of today's stock prices (beyond yesterday's stock prices), oil Granger-causes stocks.
Critical: Granger causality is not true causation. It's predictive precedence. A third variable (Z) could cause both X and Y, with X leading Y in time.
But it's useful for time series where RCTs are impossible.
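Here's a bare-bones Granger-style check (Python/NumPy, one lag, simulated series in which past X feeds into Y): compare a model of Y on its own past against a model that also includes past X, and test whether the extra lag helps.

```python
import numpy as np

rng = np.random.default_rng(9)
T = 2_000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()   # past x feeds into y

def rss(X, target):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.sum((target - X @ beta) ** 2)

target = y[1:]
ones = np.ones(T - 1)
restricted = np.column_stack([ones, y[:-1]])             # y_t ~ y_{t-1}
unrestricted = np.column_stack([ones, y[:-1], x[:-1]])   # y_t ~ y_{t-1} + x_{t-1}

rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
f_stat = (rss_r - rss_u) / (rss_u / (len(target) - unrestricted.shape[1]))
print(f"F statistic for adding lagged x: {f_stat:.1f}")  # large -> x 'Granger-causes' y
```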
Practical Workflow: How to Think About Causation
1. Start with correlation. Is there an association? If not, probably no causation.
2. Consider confounders. What else could cause both X and Y? Can you measure and control for it?
3. Check temporality. Does X precede Y? If not, X can't cause Y.
4. Look for dose-response. More X → more Y? Strengthens causality.
5. Check mechanism. Is there a plausible pathway from X to Y?
6. Seek experimental or quasi-experimental evidence. RCTs, natural experiments, IV.
7. Use causal graphs. Map out the causal structure. Identify what to control for.
8. Replicate. Does the relationship hold across studies, populations, and methods?
9. Be humble. Causal inference from observational data is always tentative. New evidence can overturn conclusions.
Further Reading
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
- Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
- Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
- Hill, A. B. (1965). "The environment and disease: association or causation?" Proceedings of the Royal Society of Medicine, 58(5), 295-300.
This is Part 10 of the Statistics series, exploring how we extract knowledge from data. Next: "ANOVA Explained."
Previous: Linear Regression: Fitting Lines to Data
Next: ANOVA: Comparing Multiple Groups