Correlation vs Causation: Why One Does Not Imply the Other
What Correlation Measures
A correlation coefficient (most commonly Pearson's r) quantifies the strength and direction of a linear relationship between two continuous variables. Values range from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship). An r of 0.85 between study hours and exam scores would indicate a strong positive linear relationship: students who study more tend to score higher, and the data points cluster fairly tightly around a straight line.
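As a minimal sketch of what the coefficient computes, the Python function below implements Pearson's r directly from its definition. It sums the products of each pair's deviations from the means, then divides by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the study-hours data are illustrative inventions, not real measurements.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (proportional to the covariance)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical study-hours and exam-score data (invented for illustration)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 60, 68, 74]
print(round(pearson_r(hours, scores), 3))  # → 0.969
```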
Importantly, correlation measures association, not causation. It tells you that two variables co-vary but says nothing about why. Over the course of a year, ice cream sales and drowning deaths rise and fall together, with a correlation of approximately r = 0.80, but eating ice cream does not cause drowning. Both variables rise in summer because of a third factor: warm weather sends more people to both ice cream shops and swimming pools.
Correlation also captures only linear relationships. Two variables with a strong curved (nonlinear) relationship can show a weak or even zero Pearson correlation. The extreme case is a perfectly symmetric U-shape, where y rises as x moves away from the center in either direction. Life satisfaction and income, for example, show a strong positive relationship at low income levels that flattens at high income levels. The overall linear correlation understates the strength of this association because a straight line cannot follow the curve.
Types of Correlation
Beyond the standard Pearson correlation coefficient (r), which measures linear relationships between continuous variables, several other correlation measures serve different data types and relationship shapes. Spearman rank correlation (rho) works with ordinal data or when the relationship is monotonic but not linear. It converts values to ranks and then computes the Pearson correlation on those ranks. If income and happiness have a strong positive relationship that flattens at high incomes, Spearman may capture the association better than Pearson.
Point-biserial correlation measures the relationship between a continuous variable and a binary variable (such as test score and pass/fail status). It is numerically equal to Pearson's r computed with the binary variable coded as 0 and 1. Partial correlation measures the relationship between two variables after controlling for one or more additional variables. It adjusts only for confounders that have been measured, and only for their linear effects, so it provides a preliminary (though not conclusive) way to account for confounding. If temperature is the common cause, the partial correlation between ice cream sales and drowning deaths, controlling for temperature, should drop close to zero. That result would suggest temperature was driving the original association.
The coefficient of determination (r-squared) is the square of the correlation coefficient. It represents the proportion of variance in one variable that is explained by a linear relationship with the other; "explained" is meant statistically, not causally. An r of 0.70 gives r-squared = 0.49, so roughly 49% of the variation in one variable is associated with variation in the other. The remaining 51% is not accounted for by the linear fit and may reflect other factors, measurement noise, or nonlinear structure. Framing correlations in terms of r-squared often gives a more intuitive sense of how much explanatory power the relationship actually provides.
Why Correlation Does Not Prove Causation
Three main explanations exist for why two variables might be correlated without one causing the other:
Confounding variables (also called lurking variables or third variables) are factors, often unmeasured, that influence both variables simultaneously. They create a statistical association between the two even though neither variable affects the other. The classic example: countries with more chocolate consumption per capita tend to win more Nobel Prizes. A likely confounder is national wealth, which enables both luxury food purchases and well-funded research institutions. Chocolate consumption does not produce Nobel laureates.
Reverse causation occurs when the assumed direction of causality is backwards. A study might find that people who exercise regularly report less depression, leading to the conclusion that exercise prevents depression. But it is also plausible that depression reduces the motivation to exercise, in which case the causal arrow runs from depression to inactivity rather than from exercise to better mental health. In reality, both directions likely operate simultaneously, making the causal picture complex.
Coincidental correlation arises from random chance or data mining. With enough variables and enough time, some will correlate purely by accident. The divorce rate in Maine correlates 0.99 with per-capita margarine consumption. The number of films Nicolas Cage appeared in each year correlates with the number of swimming pool drownings. These spurious correlations have no plausible causal mechanism and would not be expected to replicate in different time periods or populations. They are statistical artifacts produced by searching through many possible variable pairs until random co-movements appear. Short time series that happen to trend in the same direction make such matches especially easy to find.
How to Establish Causation
Establishing that A causes B requires more than observing that A and B are correlated. Several approaches can support causal claims, and they differ in how strong the resulting evidence is.
Randomized controlled experiments provide the strongest evidence for causation. By randomly assigning subjects to treatment and control groups, experiments balance all other variables (known and unknown) between groups on average. Any remaining imbalance arises only by chance. Statistical tests quantify how large a difference chance alone could produce, so an outcome difference beyond that range can be attributed to the treatment rather than to confounders. This is why experimental designs are the gold standard for causal inference in medicine, psychology, and the social sciences.
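A small simulation makes the contrast concrete. It assumes a hypothetical treatment with a true effect of 2.0 on the outcome, plus an unobserved "baseline health" variable that also raises the outcome. When healthier subjects are more likely to opt into treatment, the simple difference in group means overstates the effect. When a coin flip decides, the same estimator lands near 2.0. All numbers are invented for illustration.

```python
import random
from statistics import fmean

rng = random.Random(7)
n = 20_000
TRUE_EFFECT = 2.0  # assumed effect of the hypothetical treatment

# Unobserved baseline health: raises the outcome on its own
health = [rng.gauss(0, 1) for _ in range(n)]

def outcome(h, treated):
    """Outcome = 3 * health + treatment effect + noise (all assumed)."""
    return 3.0 * h + TRUE_EFFECT * treated + rng.gauss(0, 1)

def diff_in_means(assignment):
    """Average outcome of treated subjects minus that of untreated subjects."""
    ys = [outcome(h, t) for h, t in zip(health, assignment)]
    treated = [y for y, t in zip(ys, assignment) if t == 1]
    control = [y for y, t in zip(ys, assignment) if t == 0]
    return fmean(treated) - fmean(control)

# Self-selection: healthier people are more likely to opt in
self_selected = [1 if h + rng.gauss(0, 1) > 0 else 0 for h in health]
# Randomization: a fair coin, independent of health
randomized = [rng.randint(0, 1) for _ in range(n)]

est_selected = diff_in_means(self_selected)
est_randomized = diff_in_means(randomized)
print(round(est_selected, 2), round(est_randomized, 2))
```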
Bradford Hill criteria provide a framework for evaluating causal claims from observational data when experiments are impractical or unethical. These criteria include:
- Strength of association: strong associations are harder to explain away entirely by confounding than weak ones.
- Consistency: the relationship replicates across studies and populations.
- Temporality: the cause precedes the effect.
- Biological gradient: a dose-response relationship, where more exposure produces more effect.
- Plausibility: a reasonable mechanism exists.
- Coherence: the claim does not conflict with established knowledge about the biology or natural history of the condition.

No single criterion proves causation, but satisfying many simultaneously strengthens the causal argument.
Quasi-experimental methods use natural experiments, instrumental variables, regression discontinuity designs, and difference-in-differences approaches to approximate the logic of randomized experiments with observational data. These methods rely on finding situations where some factor creates "as-if random" variation in the treatment, allowing researchers to estimate causal effects without actually conducting an experiment. Each method rests on assumptions that cannot be fully tested, so those assumptions need careful justification. Difference-in-differences, for example, assumes that treated and untreated groups would have followed parallel trends without the treatment. The resulting evidence is generally weaker than true randomization but much stronger than a simple correlation.
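Difference-in-differences reduces to simple arithmetic on four group means. The numbers below are hypothetical. The estimate is valid only under the parallel-trends assumption: without the treatment, the treated group would have changed by the same amount as the control group.

```python
# Hypothetical average outcomes for two groups, before and after a policy
means = {
    ("treated", "before"): 50.0,
    ("treated", "after"): 58.0,
    ("control", "before"): 48.0,
    ("control", "after"): 53.0,
}

def diff_in_diff(m):
    """Change in the treated group minus change in the control group."""
    treated_change = m[("treated", "after")] - m[("treated", "before")]
    control_change = m[("control", "after")] - m[("control", "before")]
    return treated_change - control_change

# Naive comparison: mixes the pre-existing gap between groups into the estimate
naive_gap = means[("treated", "after")] - means[("control", "after")]

print(naive_gap)            # → 5.0
print(diff_in_diff(means))  # → 3.0
```

The naive after-only gap of 5.0 includes the 2.0 difference that already existed before the policy. Subtracting the control group's change removes that gap, along with any time trend the two groups share.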
Real-World Consequences of Confusion
Confusing correlation with causation has real consequences in policy, medicine, and everyday life. Observational studies once suggested that hormone replacement therapy (HRT) prevented heart disease in women because women who took HRT had lower heart attack rates. When randomized trials were finally conducted, they revealed that HRT actually increased cardiovascular risk. The original correlation was confounded by the fact that women who sought HRT tended to be wealthier, healthier, and more health-conscious, all of which independently reduce heart disease risk.
Dietary research is particularly prone to causal confusion. Studies reporting that moderate alcohol consumption correlates with better health outcomes often fail to account for the "sick quitter" effect. The group of people who abstain from alcohol often includes former heavy drinkers who quit because of health problems, which makes the abstaining group artificially unhealthy. When studies account for this, for example by separating former drinkers from lifelong abstainers, the apparent health benefits of moderate drinking shrink substantially and in many analyses largely disappear.
In everyday reasoning, the temptation to infer causation from correlation is strong because our brains are wired to detect patterns and assign causes. When two events co-occur repeatedly, we instinctively assume one causes the other. Rigorous statistical thinking requires resisting this impulse and asking a few questions. What confounders might explain this relationship? Could the causal direction be reversed? Has this relationship been tested experimentally? These questions do not always yield clean answers, but asking them consistently guards against many false conclusions.
Education policy provides another instructive example. Schools that implement laptop programs often show a positive correlation between technology access and test scores. However, schools that adopt laptops early tend to be wealthier, have smaller class sizes, and attract more qualified teachers. All of these are confounders that independently improve scores. When randomized trials are conducted, the effect of laptops on learning is often much smaller than the observational correlation suggests, and sometimes negative. The lesson applies across fields: observed correlations should generate hypotheses, but establishing causation requires experimental evidence or rigorous quasi-experimental methods that explicitly address confounding.
Correlation establishes that two variables move together, but showing that one variable causes changes in another requires stronger evidence. The best evidence comes from controlled experiments with random assignment, and where those are impossible, carefully justified quasi-experimental designs. Confounding variables, reverse causation, and coincidence all produce correlations without causation.