How to Do a Chi-Square Test: Testing Relationships in Categorical Data
Two main types of chi-square tests exist. The chi-square test of independence examines whether two categorical variables are related. The chi-square goodness-of-fit test checks whether observed frequencies match a theoretical distribution.
Step 1: State Hypotheses and Create a Contingency Table
For a test of independence, the null hypothesis states that the two variables are independent (not associated). The alternative hypothesis states that they are associated. Organize your data into a contingency table (cross-tabulation) showing the frequency of observations in each combination of categories.
For example, to test whether smoking status (smoker, non-smoker) is associated with lung disease (present, absent), you would create a 2x2 table with the count of people in each combination: smokers with disease, smokers without disease, non-smokers with disease, and non-smokers without disease.
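As a minimal sketch of the cross-tabulation step, the following Python builds a 2x2 table from raw records using only the standard library. The records and the variable names (records, rows, cols, table) are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical raw data: one (smoking status, disease status) pair per person.
records = (
    [("smoker", "present")] * 14
    + [("smoker", "absent")] * 66
    + [("non-smoker", "present")] * 6
    + [("non-smoker", "absent")] * 114
)

# Count how many people fall into each combination of categories.
counts = Counter(records)
rows = ["smoker", "non-smoker"]
cols = ["present", "absent"]

# table[i][j] is the number of people in row category i and column category j.
table = [[counts[(r, c)] for c in cols] for r in rows]

for label, row in zip(rows, table):
    print(label, row)
```

Each person appears in exactly one cell, which is what the independence-of-observations assumption requires.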
Step 2: Calculate Expected Frequencies
The expected frequency for each cell represents what you would observe if the variables were truly independent. It is calculated as:
E = (row total x column total) / grand total
If 40% of your sample smokes and 10% has lung disease, independence would predict that 4% (0.40 x 0.10) fall in the smoker-with-disease cell. The expected frequency is that proportion times the grand total. A common rule of thumb is that every expected frequency should be at least 5 for the chi-square approximation to be reliable. When cells have expected counts below 5, consider Fisher's exact test instead.
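The calculation can be sketched in a few lines of Python. The counts are hypothetical and were chosen so that 40% of the sample smokes and 10% has the disease, matching the example above; the function name expected_frequencies is an illustrative choice, not a library API.

```python
def expected_frequencies(table):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total, cell by cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts. Rows: smoker, non-smoker. Columns: disease present, absent.
observed = [[14, 66], [6, 114]]
expected = expected_frequencies(observed)
print(expected)

# Flag cells that fall below the rule-of-thumb minimum of 5.
small_cells = [e for row in expected for e in row if e < 5]
print("cells with expected count below 5:", len(small_cells))
```

With 200 people in total, the smoker-with-disease cell has an expected count of 8, which is 4% of 200.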
Step 3: Compute the Chi-Square Statistic
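For each cell in the table, calculate (observed - expected)^2 / expected. Then sum across all cells:
chi-square = sum over all cells of (O - E)^2 / E
where O is the observed frequency and E is the expected frequency for that cell.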
Large differences between observed and expected frequencies produce a large chi-square value, indicating that the data departs substantially from what independence would predict. A chi-square of zero would mean the data perfectly matches the independence assumption.
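A short sketch of the statistic, assuming the table is stored as a list of rows. The counts are hypothetical, and chi_square_statistic is an illustrative name rather than a library function.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat


# Hypothetical counts. Rows: smoker, non-smoker. Columns: disease present, absent.
statistic = chi_square_statistic([[14, 66], [6, 114]])
print(round(statistic, 3))

# A table whose counts already equal the expected counts gives a statistic of zero.
independent = chi_square_statistic([[8, 72], [12, 108]])
print(independent)
```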
Step 4: Determine Degrees of Freedom and P-Value
For a test of independence, degrees of freedom = (number of rows - 1) x (number of columns - 1). A 2x2 table has df = 1. A 3x4 table has df = 6. Compare your calculated chi-square statistic to the chi-square distribution with the appropriate degrees of freedom to obtain the p-value. Larger chi-square values with the same degrees of freedom yield smaller p-values.
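The sketch below computes the degrees of freedom and an upper-tail p-value using only the Python standard library. It relies on a closed-form recurrence that holds for integer degrees of freedom; the function names are illustrative, and in practice a statistics library such as SciPy (scipy.stats.chi2.sf) would normally be used instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    return (n_rows - 1) * (n_cols - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # exact tail for df = 1
    # Each pass raises the degrees of freedom by 2 until the target is reached.
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


print(degrees_of_freedom(2, 2), degrees_of_freedom(3, 4))
print(chi2_sf(25 / 3, 1))               # hypothetical 2x2 statistic of about 8.33
print(chi2_sf(5.0, 1), chi2_sf(10.0, 1))  # larger statistic, smaller p-value
```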
Step 5: Interpret Results
If the p-value is less than your significance level (typically 0.05), reject the null hypothesis and conclude that the variables are associated. But "associated" does not mean "one causes the other": the same distinction between correlation and causation applies to categorical data.
To understand the pattern of association, examine which cells make the largest contributions to the chi-square statistic (the largest (O-E)^2/E values). These cells show where the data departs most from independence. Also report an effect size measure such as Cramer's V, which ranges from 0 (no association) to 1 (perfect association) and, unlike the chi-square statistic itself, does not grow simply because the sample is larger.
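One way to inspect the per-cell contributions is sketched here. The counts and labels are hypothetical, and the dictionary layout is just one convenient choice.

```python
# Hypothetical counts. Rows: smoker, non-smoker. Columns: disease present, absent.
observed = [[14, 66], [6, 114]]
row_labels = ["smoker", "non-smoker"]
col_labels = ["disease present", "disease absent"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell to the chi-square statistic: (O - E)^2 / E.
contributions = {}
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        contributions[(row_labels[i], col_labels[j])] = (o - e) ** 2 / e

# List cells from largest to smallest contribution.
for cell, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(cell, round(value, 3))

largest_cell = max(contributions, key=contributions.get)
```

In this made-up table the smoker-with-disease cell contributes the most, so that is where the counts depart furthest from independence.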
Real-World Applications
Chi-square tests appear throughout research wherever categorical outcomes are analyzed. In medicine, researchers test whether the incidence of side effects differs between drug and placebo groups. In marketing, analysts evaluate whether purchase decisions vary by age bracket or geographic region. In genetics, the chi-square test validates whether observed phenotype ratios match predicted Mendelian ratios, exactly the type of question that motivated much early development of the method.
In survey research, chi-square tests of independence help identify demographic patterns. A political poll might test whether party preference is associated with education level by cross-tabulating the two variables and testing for independence. The result tells you whether the distribution of party preference changes across education categories. If the overall test is significant with more than two categories per variable, standardized residuals for each cell reveal where the departures from independence are strongest.
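The residual check can be sketched as follows. Definitions of "standardized residual" vary between textbooks and software; this sketch assumes the adjusted form, (O - E) / sqrt(E x (1 - row proportion) x (1 - column proportion)). The poll counts are invented for illustration.

```python
import math


def adjusted_residuals(observed):
    """Adjusted standardized residual for every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((o - e) / se)
        result.append(out_row)
    return result


# Hypothetical poll. Rows: three education levels. Columns: parties A, B, C.
poll = [[40, 30, 30], [35, 40, 25], [20, 45, 35]]
residuals = adjusted_residuals(poll)
for row in residuals:
    print([round(r, 2) for r in row])
```

A frequently used rule of thumb treats residuals beyond roughly plus or minus 2 as cells worth a closer look, although that threshold ignores the number of cells being examined.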
Chi-Square Goodness-of-Fit Test
The goodness-of-fit version tests whether a single categorical variable follows a specified distribution. For example, you could test whether a die is fair by rolling it 120 times and checking whether each face appears approximately 20 times. The null hypothesis specifies the expected proportions (1/6 each for a fair die), and the test compares observed frequencies to these expectations. Degrees of freedom equal the number of categories minus 1.
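The die example can be sketched as below. The observed counts are made up, and chi2_sf is a standard-library stand-in for a library p-value routine (valid for integer degrees of freedom), not an established API.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical results of 120 rolls, one count per face.
observed = [25, 17, 15, 23, 24, 16]
proportions = [1 / 6] * 6  # null hypothesis: the die is fair

total = sum(observed)
expected = [total * p for p in proportions]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(statistic, df)

print(round(statistic, 3), df, round(p_value, 3))
```

For these invented counts the p-value is well above 0.05, so the data give no evidence against a fair die.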
Goodness-of-fit tests also validate distributional assumptions in other analyses. Before applying a model that assumes a specific distribution, you can bin continuous data into categories and test whether the observed frequencies match those predicted by the theoretical distribution. For example, you might test whether residuals from a regression model follow a normal distribution by comparing the frequencies in each bin to what a normal distribution would predict, though dedicated tests like Shapiro-Wilk are usually more powerful for this specific purpose.
Assumptions and Limitations
The chi-square test requires independent observations (each person or unit counted only once), adequate sample size (expected frequency of at least 5 in each cell for reliable results), and random sampling from the population of interest. The test statistic by itself does not indicate the direction or strength of an association (an effect size such as Cramer's V is needed for strength), and it cannot establish causation. For 2x2 tables with small samples, Fisher's exact test provides exact p-values without relying on the chi-square approximation.
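For illustration, here is a standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. It assumes one common two-sided convention, summing the probabilities of all tables with the same margins that are no more probable than the observed one. The function name and the small table are hypothetical; statistical libraries provide tested implementations (for example scipy.stats.fisher_exact).

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample where expected counts fall below 5.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```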
When multiple chi-square tests are performed simultaneously (for example, testing associations between several demographic variables and an outcome), the risk of false positives increases. Apply corrections for multiple comparisons such as Bonferroni adjustment or control the false discovery rate to maintain overall Type I error at acceptable levels.
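Both corrections can be sketched briefly. The p-values are invented, and the false discovery rate procedure shown is assumed to be Benjamini-Hochberg, which is one common choice rather than the only one.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[i] = True
    return reject


# Hypothetical p-values from four separate chi-square tests.
p_values = [0.001, 0.020, 0.049, 0.300]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

Bonferroni is the stricter of the two here: it keeps only the first result, while Benjamini-Hochberg keeps the first two.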
Yates' continuity correction is sometimes applied to 2x2 tables, subtracting 0.5 from each |O-E| before squaring. This makes the test slightly more conservative and was important historically when computation was manual. With modern software, the uncorrected chi-square test or Fisher's exact test is generally preferred.
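The effect of the correction can be seen in a small sketch. The counts are hypothetical, and clamping the corrected difference at zero is an assumption added here as a guard for cells where |O-E| is below 0.5.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if yates:
                # Subtract 0.5 before squaring; never let the difference go negative.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / e
    return stat


# Hypothetical counts. Rows: smoker, non-smoker. Columns: disease present, absent.
observed = [[14, 66], [6, 114]]
uncorrected = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, yates=True)
print(round(uncorrected, 3), round(corrected, 3))
```

The corrected statistic is smaller, which yields a larger p-value and therefore a more conservative test.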
Effect Size and Reporting
Cramer's V is the standard effect size measure for chi-square tests of independence. It is calculated as V = sqrt(chi-square / (N x k)), where N is the sample size and k is the smaller of (rows - 1) and (columns - 1). Values range from 0 to 1, with conventional benchmarks of 0.10 (small), 0.30 (medium), and 0.50 (large) for tables where the smaller dimension is 2. For larger tables, the benchmarks adjust downward because the same strength of association produces smaller V values as table dimensions grow.
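A minimal sketch of the calculation, using V = sqrt(chi-square / (N x min(rows - 1, columns - 1))). The function name cramers_v is illustrative, and the 2x3 table shape in the first call is an assumption consistent with 2 degrees of freedom.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V from a chi-square statistic, sample size, and table dimensions."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * k))


# A statistic of 14.87 with N = 350, assuming a 2x3 table (df = 2).
v_example = cramers_v(14.87, 350, 2, 3)
print(round(v_example, 2))

# Hypothetical 2x2 table with chi-square of 25/3 (about 8.33) and N = 200.
v_2x2 = cramers_v(25 / 3, 200, 2, 2)
print(round(v_2x2, 3))
```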
The odds ratio is an alternative effect size for 2x2 tables that describes how the odds of the outcome in one group compare with the odds in another. An odds ratio of 3.5 for smoking and lung disease means that the odds of lung disease are 3.5 times higher in smokers than in non-smokers. Note that this is a statement about odds, not probabilities, so it should not be read as "3.5 times more likely." Odds ratios are widely used in epidemiology and medical research because they have a clear interpretation and remain approximately constant across different baseline prevalence rates.
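The odds ratio for a 2x2 table can be sketched as the cross-product ratio. The counts are hypothetical and the function name is illustrative.

```python
def odds_ratio(table):
    """Odds ratio (a x d) / (b x c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts. Rows: smoker, non-smoker. Columns: disease present, absent.
observed = [[14, 66], [6, 114]]

odds_smokers = 14 / 66        # odds of disease among smokers
odds_non_smokers = 6 / 114    # odds of disease among non-smokers
ratio = odds_ratio(observed)

print(round(odds_smokers, 3), round(odds_non_smokers, 3), round(ratio, 2))
```

This simple form breaks down when b or c is zero; handling that case (for example with a small-count adjustment) is left out of the sketch.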
A complete chi-square report includes the chi-square statistic, degrees of freedom, exact p-value, sample size, and an effect size measure. For example: chi-square(2, N = 350) = 14.87, p = 0.001, Cramer's V = 0.21. This tells the reader not just that the association exists, but how strong it is and how confident you can be in the result. Without effect size reporting, readers cannot evaluate whether a statistically significant association is large enough to be practically meaningful, especially since chi-square tests in large samples will detect even trivially small associations.
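As a sketch, the report line above can be assembled programmatically. The helper name format_report is hypothetical; the p-value uses the closed form exp(-x/2), which is exact only for 2 degrees of freedom, and the Cramer's V calculation assumes the smaller of (rows - 1, columns - 1) is 1.

```python
import math


def format_report(chi_square, df, n, p_value, v):
    """Format test results in the reporting style used in this article."""
    return (
        f"chi-square({df}, N = {n}) = {chi_square:.2f}, "
        f"p = {p_value:.3f}, Cramer's V = {v:.2f}"
    )


chi_square, df, n = 14.87, 2, 350
p_value = math.exp(-chi_square / 2)   # upper tail, valid for df = 2 only
v = math.sqrt(chi_square / (n * 1))   # assumes min(rows - 1, columns - 1) = 1

report = format_report(chi_square, df, n, p_value, v)
print(report)
```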
The chi-square test evaluates whether categorical variables are associated by comparing observed frequencies to what independence would predict. Ensure expected cell counts are at least 5, use Fisher's exact test for small samples, and report Cramer's V as an effect size measure alongside the p-value.