AI in Criminal Justice
Predictive Policing
Predictive policing systems use historical crime data to forecast where crimes are likely to occur and, in some cases, who is likely to commit them. Place-based systems like PredPol (now Geolitica) divide cities into grid cells and predict which cells will experience the most crime in the next shift, directing patrol officers to those areas. The algorithms use time, location, and crime type from historical incident reports to identify statistical hotspots. Person-based systems like the Chicago Strategic Subject List assign risk scores to individuals based on their criminal history, social network connections, and other factors, identifying people the algorithm considers most likely to be involved in future violence.
The fundamental problem with predictive policing is that the data it trains on reflects policing patterns, not crime patterns. Neighborhoods with heavier police presence generate more arrests, more incident reports, and more recorded crimes. This does not necessarily mean that more crime occurs there; it means that more crime is detected and documented. When a predictive policing algorithm learns from this data, it directs more police to neighborhoods that are already heavily policed, which generates more arrests, which in turn feed back into the model as evidence that those neighborhoods are high-crime areas. This feedback loop amplifies the original policing disparity rather than correcting it.
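The feedback loop can be illustrated with a toy simulation. The sketch below does not model PredPol or any deployed system. It assumes two hypothetical neighborhoods with the same true crime rate, a single patrol sent each day to whichever neighborhood has the most recorded crime, and a higher detection rate wherever the patrol is present. The function name, the detection rates, and the starting counts are all illustrative assumptions.

```python
def simulate_hotspot_feedback(initial_records, true_rate=10.0, days=50,
                              base_detection=0.2, patrol_detection=0.6):
    """Toy hotspot loop with identical true crime in every neighborhood.

    Each day the patrol goes to the neighborhood with the most recorded
    crime so far, and a larger fraction of that neighborhood's crime is
    then recorded. All parameters are hypothetical.
    """
    records = list(initial_records)
    for _ in range(days):
        # The "prediction": the neighborhood with the most recorded crime.
        hotspot = records.index(max(records))
        for i in range(len(records)):
            detection = patrol_detection if i == hotspot else base_detection
            # True crime is the same everywhere; only detection differs.
            records[i] += true_rate * detection
    return records


# A small initial gap in recorded crime (hypothetical numbers).
start = [55.0, 45.0]
end = simulate_hotspot_feedback(start)

share_start = start[0] / sum(start)
share_end = end[0] / sum(end)
print(f"Share of recorded crime in neighborhood 0: {share_start:.2f} -> {share_end:.2f}")
```

Under these assumptions, neighborhood 0's share of recorded crime grows from 0.55 to about 0.71 even though both neighborhoods experience the same amount of crime. The widening gap comes entirely from where the patrol is sent.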
A 2019 study by the RAND Corporation evaluated the effectiveness of predictive policing and found limited evidence that it reduced crime. Several cities that adopted predictive policing, including Los Angeles and New Orleans, subsequently discontinued their programs due to concerns about racial bias, lack of effectiveness, and community opposition. The LAPD's use of PredPol was found to disproportionately direct police to Black and Latino neighborhoods, and an audit revealed that the system relied on data that included minor offenses and drug possession arrests, categories where enforcement is heavily influenced by policing decisions rather than actual crime rates.
The Chicago Strategic Subject List was particularly controversial because it assigned risk scores to specific individuals. The list included over 398,000 people at its peak, nearly all of whom had been arrested at some point, overwhelmingly drawn from Black and Latino communities on the city's South and West Sides. An investigation found that inclusion on the list correlated with receiving increased police attention, including home visits from officers, regardless of whether the individual was engaged in criminal activity. The city discontinued the program in 2019 after audits found that it had not demonstrably reduced violence but had generated widespread community distrust.
Risk Assessment in Bail and Sentencing
Risk assessment instruments (RAIs) are used in jurisdictions across the United States to inform decisions about pretrial detention, sentencing, and parole. These tools predict the likelihood that a defendant will fail to appear for court, reoffend, or commit a violent offense if released. They range from simple actuarial checklists (like the Virginia Pretrial Risk Assessment Instrument, which scores six factors) to proprietary algorithmic tools (like COMPAS, which uses 137 questions to generate risk scores). The policy motivation is to replace subjective judicial discretion, which is demonstrably inconsistent and biased, with standardized, evidence-based predictions.
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), developed by Equivant (formerly Northpointe), became the most scrutinized criminal justice algorithm after a 2016 ProPublica investigation. ProPublica analyzed COMPAS predictions for over 7,000 defendants in Broward County, Florida, and compared them to actual recidivism outcomes over two years. The investigation found that Black defendants who did not go on to reoffend were nearly twice as likely as white non-reoffenders to be classified as high risk (a false positive rate of 44.9% for Black defendants versus 23.5% for white defendants). White defendants who did reoffend were nearly twice as likely as Black reoffenders to be classified as low risk (a false negative rate of 47.7% for white defendants versus 28.0% for Black defendants).
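The error rates in this kind of analysis are computed separately for each group from a table of predictions versus outcomes. The following is a minimal sketch of those definitions, not ProPublica's code. The class name and the counts are hypothetical and are not the Broward County data.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcomes:
    """Counts of predictions versus outcomes for one group (hypothetical)."""
    true_pos: int   # scored high risk, did reoffend
    false_pos: int  # scored high risk, did not reoffend
    true_neg: int   # scored low risk, did not reoffend
    false_neg: int  # scored low risk, did reoffend

    def false_positive_rate(self) -> float:
        # Among people who did NOT reoffend, the share labeled high risk.
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        # Among people who DID reoffend, the share labeled low risk.
        return self.false_neg / (self.false_neg + self.true_pos)


# Illustrative counts only.
group_a = GroupOutcomes(true_pos=300, false_pos=200, true_neg=300, false_neg=200)
group_b = GroupOutcomes(true_pos=150, false_pos=100, true_neg=500, false_neg=250)

for name, group in [("A", group_a), ("B", group_b)]:
    print(name,
          f"FPR={group.false_positive_rate():.3f}",
          f"FNR={group.false_negative_rate():.3f}")
```

The denominators are what matter here: the false positive rate is taken over people who did not reoffend, and the false negative rate over people who did. Both differ from the share of all defendants who were misclassified.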
Equivant responded that COMPAS satisfied a different fairness criterion: predictive parity. Among defendants scored as high risk, the actual recidivism rate was similar across racial groups (roughly 63% for Black defendants and 59% for white defendants). A score of "high risk" meant approximately the same thing regardless of the defendant's race. Both ProPublica's and Equivant's analyses were statistically correct. The disagreement reflects the mathematical impossibility of simultaneously satisfying multiple fairness criteria when base rates differ between groups, which they do in the criminal justice context because of differential policing, prosecution, and socioeconomic factors.
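The impossibility follows from an algebraic identity that links these quantities. If p is a group's base rate of reoffending, PPV is the share of high-risk scores that prove correct, and FNR is the false negative rate, then the false positive rate is fixed at FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR). The sketch below evaluates that identity for two hypothetical groups. The function name and all numeric values are illustrative assumptions, not figures from either analysis.

```python
def implied_false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by base rate, PPV, and FNR.

    Derived from the definitions of the rates:
    FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Two hypothetical groups with the SAME predictive value and the SAME
# false negative rate, but different base rates.
shared_ppv = 0.6
shared_fnr = 0.3

fpr_high_base = implied_false_positive_rate(0.5, shared_ppv, shared_fnr)
fpr_low_base = implied_false_positive_rate(0.3, shared_ppv, shared_fnr)

print(f"base rate 0.5 -> FPR {fpr_high_base:.3f}")
print(f"base rate 0.3 -> FPR {fpr_low_base:.3f}")
```

With predictive value and false negative rate held equal, the group with the higher base rate necessarily has the higher false positive rate (about 0.467 versus 0.200 in this example). Equalizing false positive rates as well would require giving up one of the other two criteria.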
The practical impact of risk assessment scores depends on how they are used. In some jurisdictions, RAI scores are advisory, provided to judges as one input among many. In others, they trigger automatic consequences: a high-risk score may result in mandatory pretrial detention or denial of parole. Studies on whether judges actually follow RAI recommendations find mixed results. Some research shows that judges override recommendations frequently and in racially patterned ways, sentencing white defendants below what the tool recommends and Black defendants above it, which may amplify rather than reduce racial disparities.
Facial Recognition in Law Enforcement
Law enforcement agencies around the world use facial recognition for suspect identification, surveillance, and investigation. The technology compares faces captured from security cameras, body cameras, social media, or submitted photographs against databases of known individuals, such as mugshots, driver's license photos, or watchlists. The FBI's Next Generation Identification system contains over 150 million facial images. Clearview AI's database contains over 30 billion images scraped from the internet. State and local agencies maintain their own databases of varying sizes.
The accuracy disparities in facial recognition technology have produced documented wrongful arrests. Robert Williams was arrested in Detroit in January 2020 after a facial recognition match incorrectly identified him as a shoplifting suspect. He was detained for 30 hours, interrogated, and held in a crowded cell before the error was identified. Nijeer Parks was wrongfully arrested in New Jersey in 2019 based on a facial recognition match and spent 10 days in jail before the charges were dropped. Michael Oliver was wrongfully accused in Detroit in 2019, also based on a faulty facial recognition match. All three men are Black, consistent with the documented higher error rates on darker-skinned faces.
The accuracy gap has narrowed since the 2018 Gender Shades study but has not been eliminated. The National Institute of Standards and Technology (NIST) Face Recognition Vendor Test, the most comprehensive independent evaluation, found in its 2023 assessment that the best commercial algorithms had false positive rates 5 to 10 times higher on Black faces than on white faces. The gap varies by vendor: some algorithms show minimal demographic differences, while others show large ones. Even a "small" disparity in false positive rates produces significant numbers of wrongful identifications when applied to large databases. At a false positive rate of 0.1% per comparison, a single search against a database of 150 million faces would be expected to produce about 150,000 false matches.
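The arithmetic behind that last figure is simple expected-value multiplication. The sketch below assumes that a search compares one probe image against every face in the database and that each comparison has the same independent chance of a false match, which is a simplification of how real systems threshold and rank candidates. The function name is hypothetical, and the five-fold higher rate is an illustrative value rather than a measured one.

```python
def expected_false_matches(false_positive_rate: float, database_size: int) -> float:
    """Expected number of false matches for one search of the whole database."""
    return false_positive_rate * database_size


DATABASE_SIZE = 150_000_000  # database size used in the text
BASELINE_RATE = 0.001        # 0.1% false positive rate used in the text

baseline = expected_false_matches(BASELINE_RATE, DATABASE_SIZE)
# Hypothetical: the same search at a false positive rate five times higher.
elevated = expected_false_matches(5 * BASELINE_RATE, DATABASE_SIZE)

print(f"baseline rate: {baseline:,.0f} expected false matches")
print(f"5x rate:       {elevated:,.0f} expected false matches")
```

Because the expected count scales linearly with the rate, any multiplicative disparity in false positive rates between groups carries over directly to the number of people wrongly flagged.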
Multiple jurisdictions have restricted or banned law enforcement use of facial recognition. San Francisco, Oakland, Boston, and several other U.S. cities have banned government use of facial recognition. The EU AI Act prohibits real-time biometric identification in public spaces with narrow law enforcement exceptions. Several U.S. states require warrants or other legal authorization before law enforcement can use facial recognition. These restrictions reflect growing recognition that the technology's error rates, combined with the severity of consequences in criminal justice, create unacceptable risks of harm, particularly for communities that already experience disproportionate policing.
The Deeper Problem: Data Reflects the System
Every AI application in criminal justice faces the same foundational challenge: the data available to train these systems is generated by a criminal justice system that has well-documented racial disparities at every stage. Black Americans are more likely to be stopped by police, more likely to be searched during a stop, more likely to be arrested, more likely to be charged, more likely to receive a plea offer that includes incarceration, more likely to be convicted, and more likely to receive a longer sentence than white Americans in comparable circumstances. These disparities are extensively documented by the Department of Justice, academic researchers, and advocacy organizations.
An AI system trained on this data learns these patterns. It does not distinguish between patterns that reflect genuine differences in behavior and patterns that reflect discriminatory enforcement. A recidivism prediction model that uses arrest history as a feature encodes differential policing: a person who lives in a heavily policed neighborhood accumulates more arrests for the same behavior than a person in a lightly policed neighborhood. A model that uses employment history encodes employment discrimination. A model that uses neighborhood characteristics encodes residential segregation. The model is not biased because it makes errors; it is biased because it accurately reflects a biased system.
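The arrest-history point can be made concrete with a small calculation. This sketch assumes that a person's expected number of recorded arrests equals the number of offenses multiplied by the probability that each one leads to an arrest. The function name, offense rate, and detection probabilities are hypothetical values chosen for illustration.

```python
def expected_recorded_arrests(offenses_per_year: float,
                              detection_probability: float,
                              years: int) -> float:
    """Expected arrest count given behavior and local enforcement intensity."""
    return offenses_per_year * detection_probability * years


# Two people with IDENTICAL behavior (hypothetical values).
OFFENSES_PER_YEAR = 2.0
YEARS = 5

heavily_policed = expected_recorded_arrests(OFFENSES_PER_YEAR, 0.3, YEARS)
lightly_policed = expected_recorded_arrests(OFFENSES_PER_YEAR, 0.1, YEARS)

print(f"heavily policed neighborhood: {heavily_policed:.1f} expected arrests")
print(f"lightly policed neighborhood: {lightly_policed:.1f} expected arrests")
```

A model that takes prior arrests as an input would see these two people as different, although the only thing that differs in the example is the detection probability.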
This creates a fundamental question about the role of AI in criminal justice: should these systems aim to predict outcomes within the existing system (who will be rearrested given current policing patterns) or should they aim to predict some notion of actual risk independent of system biases? The first is technically easier but perpetuates existing disparities. The second is normatively desirable but requires defining "actual risk" in a way that separates behavior from surveillance intensity, a task that may be impossible with available data.
AI in criminal justice automates patterns from a system with well-documented racial disparities at every stage. Predictive policing creates feedback loops that amplify existing enforcement patterns, risk assessment tools face mathematically irreconcilable fairness criteria, and facial recognition produces error rates that disproportionately harm the communities already most affected by the justice system.