What Is AI Bias?
Where AI Bias Comes From
AI systems learn patterns from data, and when that data reflects historical inequalities, the system learns to reproduce those inequalities. This is not a bug in the algorithm. It is the algorithm working exactly as designed, finding patterns in the data and using them to make predictions. The problem is that the patterns in historical data include patterns of discrimination, exclusion, and unequal access that society has spent decades trying to correct. A machine learning model has no concept of justice, history, or social context. It sees statistical correlations and optimizes for prediction accuracy, treating discriminatory patterns and legitimate patterns with equal weight.
Training data bias is the most intuitive source of algorithmic bias. If a facial recognition system is trained on a dataset that is 80% lighter-skinned faces, it will likely be less accurate on darker-skinned faces because it has seen fewer examples and learned fewer features relevant to that population. If a natural language processing model is trained on text from the internet, it will absorb the stereotypes, prejudices, and associations present in that text. Word embedding models trained on large text corpora have been shown to associate "man" with "computer programmer" and "woman" with "homemaker," reflecting the statistical distribution of these words in the training data rather than any ground truth about human capabilities.
Measurement bias occurs when the variables available to the model are imperfect proxies for the true quantity of interest. A widely studied example comes from healthcare: a commercial algorithm that major health systems used to allocate care management resources relied on healthcare spending as a proxy for healthcare need. Because Black patients in the United States face systemic barriers to healthcare access, less is spent on their care than on white patients with the same level of illness, so the algorithm systematically underestimated their health needs. A 2019 study in Science found that correcting this single proxy variable would increase the percentage of Black patients receiving additional care from 17.7% to 46.5%. The algorithm did not use race as an input, but the proxy it did use, spending, encoded racial disparities in access.
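To make the proxy problem concrete, here is a toy Python sketch. The patients, the group labels, and the assumed 0.6 spending factor for group B are all hypothetical, chosen only to show how ranking by a proxy (spending) instead of the target (need) changes who gets selected; it does not reproduce the algorithm or the data from the study.

```python
# Hypothetical patients as (group, true_need). Need is the quantity we actually care about.
PATIENTS = [("A", 3), ("A", 5), ("A", 7), ("B", 4), ("B", 6), ("B", 8)]

# Assumed access factor: group B generates less spending for the same need.
ACCESS = {"A": 1.0, "B": 0.6}


def spending(patient):
    """Observed proxy: spending = need scaled by (hypothetical) access to care."""
    group, need = patient
    return need * ACCESS[group]


def top_k(patients, key, k):
    """Select the k patients ranked highest by the given score."""
    return sorted(patients, key=key, reverse=True)[:k]


def share(selected, group):
    """Fraction of selected patients belonging to `group`."""
    return sum(1 for g, _ in selected if g == group) / len(selected)


by_need = top_k(PATIENTS, key=lambda p: p[1], k=3)
by_spending = top_k(PATIENTS, key=spending, k=3)

print("group B share, ranked by need:    ", round(share(by_need, "B"), 2))      # 0.67
print("group B share, ranked by spending:", round(share(by_spending, "B"), 2))  # 0.33
```

Group B contains the neediest patient in this toy data, yet ranking by spending halves its share of the three available slots.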
Selection bias and survivorship bias affect which data points make it into the training set at all. Criminal justice data reflects policing patterns as much as crime patterns: neighborhoods with heavier police presence generate more arrests, which trains predictive policing models to direct even more police to those neighborhoods, creating a feedback loop. Hiring data reflects who was previously hired, not who would have been the best candidate. Medical data reflects who had access to healthcare, not who needed it. In each case, the available data is a biased sample of the underlying reality, and models trained on biased samples produce biased predictions.
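The feedback loop can be illustrated with a deliberately simplified simulation. It assumes, hypothetically, that each day all patrols go to whichever area has the most recorded arrests and that crime is recorded only where police are present. The rates and starting counts are invented, and real predictive policing systems allocate patrols in more complex ways.

```python
def simulate_patrols(true_rates, initial_arrests, days, patrols_per_day=10):
    """Toy feedback loop: patrols follow past arrests, and arrests follow patrols.

    Each day every patrol is sent to the area with the most recorded arrests
    (ties go to the lower index). New arrests are recorded only in that area,
    at that area's true crime rate per patrol.
    """
    recorded = [float(x) for x in initial_arrests]
    for _ in range(days):
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        recorded[target] += patrols_per_day * true_rates[target]
    return recorded


# Hypothetical: area 1 actually has MORE crime, but area 0 starts with more arrests.
true_rates = [0.2, 0.3]
final = simulate_patrols(true_rates, initial_arrests=[12, 10], days=100)
print(final)  # [212.0, 10.0]: area 0 absorbs every new arrest
```

Even though area 1 has the higher true rate in this setup, its recorded arrests never change, so a model trained on these records would conclude that crime is concentrated in area 0.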
The Different Types of AI Bias
Researchers have identified over 20 distinct types of bias that can affect AI systems, but they cluster into several major categories. Historical bias exists when the world itself contains inequality that is accurately captured in the data. Even a perfectly representative dataset will contain historical bias if the world it represents is inequitable. A model trained to predict who becomes a CEO will learn that men are more likely to become CEOs, because historically that is true, even though the historical pattern reflects discrimination rather than inherent capability differences.
Representation bias occurs when certain groups are underrepresented in the training data. ImageNet, the dataset that powered the deep learning revolution in computer vision, was constructed primarily from images sourced from English-language websites and labeled by workers recruited through Amazon Mechanical Turk, a platform with a predominantly U.S. user base. The resulting dataset overrepresented certain demographics, geographies, and cultural contexts while underrepresenting others. Models trained on ImageNet inherited these representation gaps, performing better on objects, scenes, and people from well-represented contexts and worse on everything else.
Aggregation bias occurs when a model is built for an entire population but distinct subgroups within that population have different relationships between inputs and outputs. A diabetes prediction model trained on a combined population may fail to capture that HbA1c levels, a key diagnostic marker, are distributed differently across racial groups, a difference that has been linked in part to variation in red blood cell lifespan. Using a single threshold for all groups then produces systematic misdiagnosis in some populations: the model performs well "on average" while performing poorly on specific subgroups.
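A small numeric sketch shows how one pooled threshold can work for one group and fail for another. Everything here is hypothetical: the underlying values, the 6.5 cutoff, and the assumed +0.5 marker offset for group B are invented for illustration and are not clinical figures. The group-specific threshold also assumes the offset is known exactly; in practice it would have to be estimated from data.

```python
# Hypothetical underlying values for seven patients; the same set is used for both groups.
UNDERLYING = [5.5, 6.1, 6.2, 6.4, 6.7, 7.0, 7.5]
CUTOFF = 6.5                     # condition present when underlying value >= CUTOFF
OFFSET = {"A": 0.0, "B": 0.5}    # assumed shift in the measured marker, by group


def error_rate(group, threshold):
    """Fraction of patients in `group` misclassified by the rule `marker >= threshold`."""
    errors = 0
    for value in UNDERLYING:
        has_condition = value >= CUTOFF
        marker = value + OFFSET[group]          # what the model actually observes
        predicted = marker >= threshold
        errors += predicted != has_condition
    return errors / len(UNDERLYING)


# One threshold for everyone vs. a threshold adjusted per group.
pooled = {g: error_rate(g, CUTOFF) for g in OFFSET}
per_group = {g: error_rate(g, CUTOFF + OFFSET[g]) for g in OFFSET}

print("pooled threshold:", pooled)
print("average error with pooled threshold:", sum(pooled.values()) / len(pooled))
print("group-specific thresholds:", per_group)
```

The pooled rule's average error (about 0.21) looks tolerable, but all of it falls on group B, where three of seven patients are misclassified.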
Evaluation bias occurs when the benchmark used to assess a model's performance is itself unrepresentative. If a facial recognition system is evaluated on a test set that underrepresents darker-skinned faces, its reported accuracy will not reflect its real-world performance on diverse populations. The model passes its evaluations with flying colors and then fails in deployment on exactly the populations the evaluation missed. This is why disaggregated evaluation, reporting performance separately for different demographic groups rather than as a single aggregate metric, is essential for detecting bias.
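Disaggregated evaluation takes only a few lines. The sketch below uses a hypothetical test set in which group B is underrepresented (4 of 20 examples); the labels and injected errors are invented to show how a single aggregate number can hide a much weaker result for the smaller group.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Single aggregate accuracy over all examples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical test set: 16 examples from group A, only 4 from group B.
groups = ["A"] * 16 + ["B"] * 4
y_true = [1, 0] * 10
y_pred = list(y_true)
for i in (3, 16, 17):          # inject one error in group A and two in group B
    y_pred[i] = 1 - y_pred[i]

print("aggregate:", accuracy(y_true, y_pred))                        # 0.85
print("by group: ", disaggregated_accuracy(y_true, y_pred, groups))  # {'A': 0.9375, 'B': 0.5}
```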
Real-World Consequences of AI Bias
The consequences of AI bias are not abstract. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a recidivism prediction tool used by courts across the United States, was found by ProPublica in 2016 to produce racially disparate error rates. Black defendants who did not go on to reoffend were nearly twice as likely as white defendants who did not reoffend to be incorrectly flagged as high risk. Conversely, white defendants who did reoffend were nearly twice as likely as Black defendants who reoffended to be incorrectly labeled low risk. Because such scores inform bail decisions, sentencing recommendations, and parole determinations, these errors bore directly on people's freedom.
In hiring, Amazon built an experimental AI recruiting tool trained on resumes submitted to the company over a 10-year period, most of which came from men. The system learned to penalize resumes that contained indicators of being female, including the word "women's" (as in "women's chess club captain") and the names of women's colleges. Engineers edited the system to be neutral to those particular terms, but because there was no guarantee it would not find other discriminatory ways to sort candidates, Amazon abandoned the project, as Reuters reported in 2018. The case illustrated that simply removing protected characteristics like gender from the input features is insufficient, because models can reconstruct protected characteristics from correlated features like college name, extracurricular activities, and writing style.
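One quick screen for this kind of leakage is to check how well a single "neutral" feature predicts the protected attribute on its own. The applicants and college names below are made up. This is a crude in-sample check, not a full proxy analysis, since real models can combine many weakly correlated features.

```python
from collections import Counter, defaultdict


def proxy_accuracy(feature_values, protected):
    """How well one feature alone 'predicts' a protected attribute.

    For each feature value, guess the most common protected value seen with it.
    This is in-sample accuracy, so it overstates what a held-out test would show.
    """
    by_value = defaultdict(Counter)
    for value, attr in zip(feature_values, protected):
        by_value[value][attr] += 1
    hits = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return hits / len(protected)


def majority_baseline(protected):
    """Accuracy of always guessing the single most common protected value."""
    return Counter(protected).most_common(1)[0][1] / len(protected)


# Hypothetical applicants: the 'college' column never mentions gender directly.
college = ["College X"] * 4 + ["College Y"] * 4 + ["State U"] * 4
gender = ["F"] * 4 + ["M"] * 4 + ["F", "F", "M", "M"]

print("baseline:", majority_baseline(gender))                         # 0.5
print("from college alone:", round(proxy_accuracy(college, gender), 2))  # 0.83
```

A large gap between the two numbers suggests that dropping the gender column would not stop a model from recovering much of the same information.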
Facial recognition bias has produced wrongful arrests. Robert Williams was wrongfully arrested in Detroit in 2020 after a facial recognition system incorrectly matched his driver's license photo to surveillance footage of a shoplifter. He was held for 30 hours before the error was identified. At least three other documented wrongful arrests in the United States have been attributed to facial recognition errors, all involving Black individuals. These incidents are consistent with a 2019 evaluation by the U.S. National Institute of Standards and Technology (NIST), which found that for many algorithms, false positive rates in one-to-one matching were 10 to 100 times higher for African American and Asian faces than for white faces.
Credit and lending algorithms affect financial access for millions. Studies have found that algorithmic lending platforms charge Black and Hispanic borrowers higher interest rates than white borrowers with equivalent credit profiles, even when race is not an explicit input to the model. The algorithms use features like zip code, education history, employment type, and spending patterns that correlate with race due to systemic inequalities. The result is that AI-driven lending, which was expected to reduce human prejudice in credit decisions, can instead automate and scale discriminatory patterns that human loan officers might have been trained to recognize and avoid.
Detecting and Measuring Bias
Bias detection requires defining what fairness means in mathematical terms, and this turns out to be surprisingly difficult. Several competing definitions exist, and they are provably incompatible with each other in most real-world scenarios. Demographic parity requires that the model's positive prediction rate be equal across groups: if 50% of male applicants are approved, 50% of female applicants must be approved. Equalized odds requires that the model's true positive rate and false positive rate be equal across groups. Predictive parity requires that the model's precision (the fraction of positive predictions that are actually positive) be equal across groups.
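The three definitions map onto rates computed from each group's confusion matrix. The sketch below computes them for hypothetical predictions in which both groups are approved at the same rate, so demographic parity holds, while the error rates and precision differ sharply.

```python
from collections import defaultdict


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero (e.g. no positives in a group)."""
    return num / den if den else float("nan")


def group_fairness_rates(y_true, y_pred, groups):
    """Per-group rates behind three common fairness definitions (binary labels)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        cell = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][cell] += 1

    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            "selection_rate": _ratio(c["tp"] + c["fp"], n),   # demographic parity
            "tpr": _ratio(c["tp"], c["tp"] + c["fn"]),        # equalized odds, part 1
            "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),        # equalized odds, part 2
            "precision": _ratio(c["tp"], c["tp"] + c["fp"]),  # predictive parity
        }
    return rates


# Hypothetical predictions for two groups of four people each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

for g, r in group_fairness_rates(y_true, y_pred, groups).items():
    print(g, r)
```

Both groups have a selection rate of 0.5, but group A's true positive rate, false positive rate, and precision are all 0.5, while group B's are 1.0, 0.0, and 1.0.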
Chouldechova proved in 2017 that when base rates differ between groups (which they almost always do in practice), a classifier that satisfies predictive parity cannot also have equal false positive and false negative rates across groups, except in degenerate cases such as a classifier that makes no errors. Demographic parity likewise conflicts with equalized odds when base rates differ, unless the classifier's predictions carry no information about the outcome. These impossibility results mean that any deployment of a consequential AI system requires an explicit choice about which definition of fairness to prioritize, and that choice is a value judgment, not a technical decision. A criminal justice system might prioritize equal false positive rates (one component of equalized odds) so that defendants from all groups who would not reoffend face an equal risk of being wrongly flagged as high risk. A hiring system might prioritize demographic parity to ensure equal selection rates across groups. These are legitimate but incompatible goals.
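The arithmetic behind Chouldechova's result can be checked directly. For a binary classifier, a group's false positive rate is fixed by its base rate p, its precision (PPV), and its false negative rate: `FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)`. The sketch below plugs in hypothetical numbers for two groups with identical precision and FNR but different base rates.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate implied by base rate, precision (PPV) and FNR.

    Follows from the confusion-matrix identities TP = (1 - FNR) * p * N,
    FP = TP * (1 - PPV) / PPV and FP + TN = (1 - p) * N.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical: both groups share precision 0.6 and FNR 0.3 (predictive parity
# plus equal FNR), but their base rates differ.
PPV, FNR = 0.6, 0.3
fpr_a = implied_fpr(0.3, PPV, FNR)
fpr_b = implied_fpr(0.5, PPV, FNR)
print(f"group A FPR: {fpr_a:.3f}")   # 0.200
print(f"group B FPR: {fpr_b:.3f}")   # 0.467
```

Equalizing the two false positive rates here would require giving up equal precision or equal false negative rates, which is exactly the tradeoff described above.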
Practical bias auditing involves disaggregating model performance across demographic groups and testing for statistically significant differences. This requires knowing (or inferring) the demographic characteristics of the people affected by the model, which itself raises privacy concerns. Techniques include comparing confusion matrices across groups, computing fairness metrics at multiple decision thresholds, testing the model on synthetic data designed to isolate demographic effects, and using counterfactual analysis to ask, "Would this individual have received a different prediction if they belonged to a different group?" The Aequitas toolkit, AI Fairness 360, and Fairlearn provide open-source implementations of these techniques.
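A minimal version of the counterfactual question is a "flip test": swap only the protected attribute in each record and check whether the prediction changes. The model, field names, and records below are hypothetical. This naive swap leaves correlated features such as income untouched, so it misses bias that flows through proxies; fuller counterfactual analyses also model how those features would change.

```python
def flip_test(model, records, attr, values):
    """Return records whose prediction changes when `attr` is swapped between two values."""
    first, second = values
    changed = []
    for record in records:
        swapped = dict(record)
        swapped[attr] = second if record[attr] == first else first
        if model(record) != model(swapped):
            changed.append(record)
    return changed


def toy_model(record):
    """Hypothetical approval rule that (improperly) adds a bonus for group A."""
    score = record["income"] / 10_000 + (1 if record["group"] == "A" else 0)
    return score >= 5


applicants = [
    {"income": 45_000, "group": "A"},
    {"income": 60_000, "group": "B"},
    {"income": 30_000, "group": "B"},
    {"income": 42_000, "group": "B"},
]
flagged = flip_test(toy_model, applicants, attr="group", values=("A", "B"))
print(flagged)  # the 45,000 and 42,000 applicants: their decisions depend on group alone
```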
Reducing AI Bias
Bias mitigation strategies operate at three stages: pre-processing (modifying the training data), in-processing (modifying the training algorithm), and post-processing (modifying the model's outputs). Pre-processing approaches include resampling to balance representation across groups, relabeling data points that reflect historical discrimination, and learning fair representations that encode useful information while removing demographic signals. In-processing approaches add fairness constraints or penalty terms to the optimization objective, so the model minimizes prediction error while limiting disparities across groups. Post-processing approaches adjust decision thresholds for different groups to equalize selected fairness metrics.
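Post-processing is the simplest of the three stages to sketch. The example below picks a separate score threshold for each group so that each group's selection rate matches a target, which equalizes the demographic parity metric. The scores and the 30% target are hypothetical. Two caveats: tied scores can push a group slightly above the target, and the approach requires knowing each person's group at decision time.

```python
def selection_rate(scores, threshold):
    """Fraction of scores at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate):
    """Score threshold that selects about `target_rate` of this group.

    Uses the k-th highest score, with k = round(target_rate * n) and at least 1.
    Ties at the k-th score may select slightly more than k people.
    """
    k = max(1, round(target_rate * len(scores)))
    return sorted(scores, reverse=True)[k - 1]


# Hypothetical model scores; group B's scores are shifted lower overall.
scores = {
    "A": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
    "B": [0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1],
}

# One shared threshold: selection rates diverge.
shared = {g: selection_rate(s, 0.5) for g, s in scores.items()}

# Group-specific thresholds chosen to hit a 30% selection rate in each group.
thresholds = {g: threshold_for_rate(s, 0.3) for g, s in scores.items()}
adjusted = {g: selection_rate(scores[g], t) for g, t in thresholds.items()}

print("shared threshold 0.5:", shared)       # {'A': 0.5, 'B': 0.2}
print("per-group thresholds:", thresholds)   # {'A': 0.7, 'B': 0.45}
print("adjusted rates:", adjusted)           # {'A': 0.3, 'B': 0.3}
```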
Each approach has tradeoffs. Pre-processing methods are model-agnostic but may discard useful information along with discriminatory patterns. In-processing methods can precisely target specific fairness criteria but require changes to the training procedure, which often ties them to particular model types or training setups. Post-processing methods are simple to implement but can feel ad hoc and may reduce the model's overall utility. In practice, the most effective debiasing strategies combine approaches at multiple stages and include ongoing monitoring after deployment, because bias can drift as the population and its characteristics change over time.
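Ongoing monitoring can start as simply as recomputing a fairness metric on each new batch of decisions and alerting when it drifts. The sketch below tracks the gap in selection rates between groups for hypothetical weekly batches. The 0.1 tolerance is an arbitrary assumption, and a real deployment would also want confidence intervals, since small batches produce noisy rates.

```python
def selection_gap(batch):
    """Largest difference in selection rate between any two groups in one batch."""
    rates = [sum(decisions) / len(decisions) for decisions in batch.values()]
    return max(rates) - min(rates)


def monitor(batches, max_gap=0.1):
    """Return (batch_index, gap) for every batch whose gap exceeds `max_gap`."""
    alerts = []
    for i, batch in enumerate(batches):
        gap = selection_gap(batch)
        if gap > max_gap:
            alerts.append((i, round(gap, 3)))
    return alerts


# Hypothetical weekly batches of binary decisions (1 = approved), keyed by group.
weeks = [
    {"A": [1, 0, 1, 0], "B": [1, 0, 0, 1]},   # rates 0.50 vs 0.50
    {"A": [1, 1, 0, 0], "B": [1, 0, 0, 0]},   # rates 0.50 vs 0.25
    {"A": [1, 1, 0, 0], "B": [1, 0, 1, 0]},   # rates 0.50 vs 0.50
]
print(monitor(weeks))  # [(1, 0.25)]
```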
The most important intervention may be organizational rather than technical. Diverse development teams are more likely to identify bias risks that homogeneous teams miss, because team members from different backgrounds bring different assumptions about what "normal" looks like and who the system serves. Including affected communities in the design process, conducting impact assessments before deployment, establishing clear accountability for bias outcomes, and creating feedback channels for people who experience biased outcomes are all organizational practices that reduce bias more reliably than any single technical fix.
AI bias is a systemic problem that can enter through data, design, and evaluation at every stage of the machine learning pipeline. Because competing definitions of fairness are mathematically incompatible when base rates differ, bias mitigation is a value judgment as much as a technical challenge, and it requires explicit decisions about which groups' interests to prioritize in each application context.