Natural Experiments Explained: When Nature Runs the Study
What Makes a Natural Experiment
Natural experiments differ from true experiments in a crucial way: the researcher does not assign the treatment. Instead, some external factor creates a division between those who are exposed to a condition and those who are not, in a manner that approximates random assignment. The key requirement is that the assignment mechanism is unrelated to the outcome of interest, at least approximately, so that the exposed and unexposed groups are comparable on characteristics that might affect the outcome.
The term "natural experiment" can be misleading because nothing about these studies is experimental in the traditional sense. The researcher does not manipulate any variable, does not randomly assign participants, and often analyzes data that were collected for entirely different purposes. What makes them "experimental" is the quasi-random nature of the assignment process, which provides a basis for causal inference that ordinary observational studies lack.
Joshua Angrist and Guido Imbens received the 2021 Nobel Prize in Economics partly for developing statistical methods for analyzing natural experiments. Their work on instrumental variables and the local average treatment effect framework provided rigorous tools for extracting causal conclusions from situations where random assignment is impossible.
Classic Examples
John Snow's epidemiological study of the 1854 London cholera outbreak is one of the earliest natural experiments. Two water companies served overlapping neighborhoods in London, drawing their water from different points on the Thames. One company drew water upstream of the sewage outflows, the other downstream. Cholera rates were dramatically higher among customers of the downstream company. The quasi-random assignment of water companies to households (based on company contracts, not on residents' health) provided compelling evidence that contaminated water, not "bad air," transmitted cholera.
The Vietnam War draft lottery created a natural experiment for studying the long-term effects of military service. Draft numbers were assigned by birth date through a random lottery, creating groups of men who were more or less likely to serve based on a genuinely random process. Angrist (1990) used this lottery to estimate the causal effect of military service on later civilian earnings, finding that in the early 1980s white veterans earned roughly 15 percent less than comparable non-veterans. Because the lottery, not personal choice, drove the variation in service, the estimate avoids the selection effects that would bias a simple veteran/non-veteran comparison.
The introduction of television in different regions at different times has been used as a natural experiment to study the effects of media on behavior. When cable television reached certain areas of rural India, researchers found changes in gender attitudes, fertility preferences, and the reported acceptability of domestic violence compared to areas that did not yet have access. The staggered rollout created treatment and control groups based on geography and timing rather than individual choice.
Policy discontinuities create natural experiments when eligibility for a program is determined by a sharp cutoff on a continuous variable. Students who score just above a scholarship threshold receive funding while those just below do not, even though their abilities are nearly identical. Regression discontinuity designs exploit these cutoffs by comparing outcomes for individuals just above and just below the threshold, treating the cutoff as a source of quasi-random assignment.
Analytical Methods
Difference-in-differences (DiD) compares the change in outcomes over time between a group affected by the natural experiment and a group that was not. If a state raises its minimum wage while a neighboring state does not, DiD compares the change in employment in the treated state to the change in the untreated state. The assumption is that both states would have followed the same trend without the policy change (the parallel trends assumption). If this assumption holds, the difference in the differences is a valid estimate of the causal effect.
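The arithmetic behind the two-group, two-period comparison can be sketched in a few lines of Python. The employment figures below are invented for illustration, and the function name `did_estimate` is hypothetical. The sketch only shows the subtraction; it says nothing about whether parallel trends actually holds.

```python
from statistics import mean

# Hypothetical employment figures (thousands of jobs); not real data.
employment = {
    ("treated", "before"): [100.0, 102.0, 98.0],
    ("treated", "after"): [103.0, 105.0, 101.0],
    ("control", "before"): [90.0, 92.0, 88.0],
    ("control", "after"): [95.0, 97.0, 93.0],
}


def did_estimate(data):
    """Return the 2x2 difference-in-differences estimate."""
    change_treated = mean(data[("treated", "after")]) - mean(data[("treated", "before")])
    change_control = mean(data[("control", "after")]) - mean(data[("control", "before")])
    # Under parallel trends, the control group's change stands in for what
    # the treated group would have experienced without the policy.
    return change_treated - change_control


effect = did_estimate(employment)
print(effect)  # -2.0
```

Here the treated state gained 3 thousand jobs while the control state gained 5 thousand, so the estimate is a loss of 2 thousand jobs relative to the counterfactual trend. Applied work would normally use a regression with standard errors rather than raw means.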
Instrumental variables (IV) analysis uses a variable (the instrument) that affects the treatment but does not directly affect the outcome except through the treatment. The Vietnam draft lottery number is an instrument for military service: it affects the probability of serving (because men with higher lottery numbers were less likely to be drafted) but is assumed to have no direct effect on earnings except through its effect on service. IV analysis isolates the variation in the treatment that is driven by the instrument, providing an estimate of the treatment effect that is free of confounding as long as these assumptions hold.
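With a binary instrument, the simplest IV estimator is the Wald ratio: the instrument's effect on the outcome divided by its effect on the treatment. The sketch below uses simulated data loosely modeled on the draft lottery. The variable names, the effect size of -1.5, and the data-generating process are all assumptions made for illustration, not figures from Angrist (1990).

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is reproducible
TRUE_EFFECT = -1.5       # assumed effect of service on earnings (arbitrary units)

rows = []
for _ in range(50_000):
    z = 1 if rng.random() < 0.5 else 0   # instrument: draft-eligible lottery number
    u = rng.gauss(0, 1)                  # unobserved confounder
    # Service depends on the instrument and on the confounder.
    d = 1 if (u + rng.gauss(0, 1)) > (0.0 if z else 1.0) else 0
    # Earnings depend on service and on the same confounder.
    y = TRUE_EFFECT * d + 1.0 * u + rng.gauss(0, 0.5)
    rows.append((z, d, y))


def avg(values):
    values = list(values)
    return sum(values) / len(values)


# Naive comparison of those who served with those who did not (confounded by u).
naive = avg(y for _, d, y in rows if d == 1) - avg(y for _, d, y in rows if d == 0)

# Wald estimator: reduced form divided by first stage.
reduced_form = avg(y for z, _, y in rows if z == 1) - avg(y for z, _, y in rows if z == 0)
first_stage = avg(d for z, d, _ in rows if z == 1) - avg(d for z, d, _ in rows if z == 0)
wald = reduced_form / first_stage

print(f"naive: {naive:.2f}  wald: {wald:.2f}  true: {TRUE_EFFECT}")
```

In this simulation the naive comparison is biased toward zero because the confounder raises both the chance of serving and earnings, while the Wald ratio should land close to the assumed effect. Real analyses typically use two-stage least squares with covariates and report standard errors.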
Regression discontinuity designs (RDD) estimate treatment effects at sharp eligibility cutoffs. By comparing outcomes for units just above and just below the cutoff, RDD approximates a local randomized experiment around the threshold. The validity of RDD depends on the assumption that units cannot precisely manipulate their position relative to the cutoff. If students can retake the scholarship exam to score just above the threshold, the comparison is contaminated by self-selection.
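A sharp RDD estimate can be sketched by fitting a separate straight line on each side of the cutoff, using only observations within a bandwidth, and measuring the gap between the two lines at the threshold. Everything in the example below is an assumption for illustration: the simulated scores, the cutoff of 60, the bandwidth of 10, the jump of 2.0, and the helper `fit_line`.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is reproducible
CUTOFF = 60.0            # hypothetical scholarship threshold
BANDWIDTH = 10.0         # only scores within 10 points of the cutoff are used
TRUE_JUMP = 2.0          # assumed effect of the scholarship on the outcome

data = []
for _ in range(20_000):
    score = rng.uniform(0, 100)
    treated = score >= CUTOFF   # sharp design: the cutoff fully determines treatment
    outcome = 0.05 * score + (TRUE_JUMP if treated else 0.0) + rng.gauss(0, 1)
    data.append((score, outcome))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Center scores at the cutoff so each intercept is the predicted outcome at the threshold.
left = [(s - CUTOFF, y) for s, y in data if CUTOFF - BANDWIDTH <= s < CUTOFF]
right = [(s - CUTOFF, y) for s, y in data if CUTOFF <= s <= CUTOFF + BANDWIDTH]

left_intercept, _ = fit_line([x for x, _ in left], [y for _, y in left])
right_intercept, _ = fit_line([x for x, _ in right], [y for _, y in right])

jump = right_intercept - left_intercept
print(f"estimated jump at cutoff: {jump:.2f}")
```

The bandwidth is a judgment call: a narrower window makes the two groups more comparable but leaves fewer observations. Applied studies usually check that the estimate is stable across several bandwidths and test for bunching of scores just above the cutoff.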
Limitations and Threats to Validity
The assignment mechanism in natural experiments is rarely truly random. People may sort themselves in response to the natural experiment (moving to avoid a policy, opting into a program), creating selection bias. The groups compared may differ on unobserved characteristics despite appearing similar on observed ones. The treatment may not be well-defined or consistently applied, making it difficult to specify exactly what is being tested.
Natural experiments estimate local effects: the effect of the treatment on the specific subpopulation affected by the natural assignment mechanism, which may not generalize to other populations or contexts. The Vietnam draft lottery estimates the effect of military service on men who served because they were drafted, not on men who would have volunteered regardless. These local average treatment effects are valid but narrower in scope than the average treatment effects estimated by randomized controlled trials.
Strengthening Causal Claims from Natural Experiments
Because natural experiments lack the controlled conditions of true experiments, researchers must use additional strategies to support causal interpretations. Falsification tests check whether the treatment variable affects outcomes it should not affect if the causal story is correct. If a policy change that raises the minimum wage appears to affect restaurant employment, but also appears to affect employment in sectors unrelated to minimum wage workers, the causal interpretation is undermined because the apparent effect may reflect a broader economic trend rather than the specific policy change.
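A falsification (placebo) check of this kind can be sketched by running the same difference-in-differences calculation on an outcome the policy should not affect. The employment figures, the sector labels, and the helper `did` below are hypothetical, and the final ratio is only an informal yardstick, not a statistical test.

```python
def did(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period difference-in-differences on group means."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean employment (thousands) in treated and control states.
# Restaurants employ many minimum wage workers, so an effect is plausible here.
restaurant_effect = did(100.0, 97.0, 90.0, 91.0)

# Placebo sector: assumed to employ almost no minimum wage workers,
# so the policy should leave it essentially unchanged.
placebo_effect = did(200.0, 201.0, 180.0, 181.5)

print(restaurant_effect)  # -4.0
print(placebo_effect)     # -0.5

# Informal comparison: a placebo estimate nearly as large as the main estimate
# would suggest a broader trend rather than the policy itself.
placebo_share = abs(placebo_effect) / abs(restaurant_effect)
print(placebo_share)      # 0.125
```

In these made-up numbers the placebo estimate is small relative to the main estimate, which is the pattern a sound causal story predicts. In practice the comparison would rest on standard errors rather than a raw ratio.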
Dose-response relationships strengthen causal claims by showing that larger exposures to the treatment produce larger effects. If regions with bigger minimum wage increases show bigger employment effects than regions with smaller increases, the evidence for a causal relationship is stronger than if all regions showed the same effect regardless of the size of the increase. Dose-response patterns are difficult to explain through confounding, because the confounding variable would need to correlate not just with the treatment but with the magnitude of the treatment.
Replication across multiple natural experiments is one of the strongest forms of evidence. If the same causal relationship is observed across different countries, time periods, and policy contexts, the likelihood that confounding explains all of the results decreases substantially. Each natural experiment has different potential confounders, so a finding that replicates across diverse settings provides convergent evidence that is more convincing than any single study.
Triangulation, combining evidence from natural experiments with evidence from randomized experiments, laboratory studies, and theoretical models, provides the strongest foundation for causal conclusions. When multiple research methods that have different strengths and different vulnerabilities to bias all point to the same conclusion, the overall case for causality is much stronger than the evidence from any single method alone.
Natural experiments exploit external events that approximate random assignment to study causal relationships that cannot be experimentally manipulated. They require specialized analytical methods and careful attention to the validity of the assignment mechanism, but they provide some of the strongest evidence available when true experiments are impossible.