Ethics of AI in Healthcare
Bias in Medical AI
Medical AI systems inherit biases from the healthcare system that generated their training data. Dermatology AI trained predominantly on lighter-skinned patients performs significantly worse on darker-skinned patients, missing melanomas and other skin cancers at higher rates for the populations already least served by dermatology. A 2021 study in The Lancet Digital Health found that only 16% of datasets used in dermatology AI studies reported the racial or ethnic composition of their images. Among those that did, most overrepresented lighter-skinned patients. This data gap translates directly into diagnostic accuracy gaps that could cost lives.
The healthcare spending proxy bias discovered in 2019 illustrates how bias can hide in seemingly neutral variables. A commercial algorithm used by major health systems to identify patients needing extra care used healthcare spending as a proxy for healthcare need. Because Black patients in the United States face systemic barriers to accessing care, they historically generate less spending than white patients at the same level of illness. The algorithm therefore systematically underestimated the health needs of Black patients. At the threshold used for intervention, the algorithm reduced the number of Black patients identified for additional care by more than half compared to an algorithm using direct health measures. Algorithms of this type are applied to an estimated 200 million people in the United States each year.
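The mechanism can be illustrated with a toy simulation. The sketch below is not the commercial algorithm or the study's data: the cohort, the group labels "A" and "B", the 0.65 access factor, and the cutoff of six patients are all invented for illustration. Both groups have identical illness burden, yet ranking by spending selects far fewer patients from the group that faces access barriers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    group: str  # hypothetical group label, "A" or "B"
    need: int   # illness burden, e.g. number of active chronic conditions


# Assumption for illustration only: at equal need, group B generates
# 65% of the spending of group A because of barriers to accessing care.
ACCESS_FACTOR = {"A": 1.0, "B": 0.65}
COST_PER_UNIT_OF_NEED = 1000.0


def spending(p: Patient) -> float:
    """Observed spending: true need scaled by how much care is actually accessed."""
    return p.need * COST_PER_UNIT_OF_NEED * ACCESS_FACTOR[p.group]


def select_top(patients, score, k):
    """Return the k patients with the highest score (the 'extra care' list)."""
    return sorted(patients, key=score, reverse=True)[:k]


def share_of_group(selected, group):
    """Fraction of the selected patients who belong to the given group."""
    return sum(p.group == group for p in selected) / len(selected)


# Both groups have exactly the same distribution of need (1 through 10).
cohort = [Patient(g, n) for g in ("A", "B") for n in range(1, 11)]

by_spending = select_top(cohort, spending, k=6)          # proxy label
by_need = select_top(cohort, lambda p: p.need, k=6)      # direct health measure

print("Group B share, ranked by spending:", share_of_group(by_spending, "B"))
print("Group B share, ranked by need:    ", share_of_group(by_need, "B"))
```

In this toy cohort, group B makes up half of the patients selected by need but only one of the six selected by spending, even though no variable in the ranking refers to group membership.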
Diagnostic AI can also encode gender bias. Heart disease manifests differently in women than men, with women more likely to experience atypical symptoms. AI models trained on data that overrepresents typical (male-pattern) presentation may be less accurate at detecting heart disease in women. Similar sex-based differences in disease presentation exist for autoimmune disorders, pain conditions, and mental health diagnoses. Unless training data explicitly accounts for these differences and evaluation metrics are disaggregated by sex and gender, medical AI risks perpetuating the same diagnostic disparities that human medicine has long struggled with.
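Disaggregated evaluation can be sketched in a few lines. The example below assumes a hypothetical record format of (group, true label, predicted label) and uses made-up counts, not results from any real model. It computes sensitivity (the share of true cases the model detects) overall and per group, which is one way an aggregate figure can hide a subgroup gap.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity = TP / (TP + FN) separately for each group.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    Groups with no positive cases map to None.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    groups = set()
    for group, y_true, y_pred in records:
        groups.add(group)
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    result = {}
    for group in sorted(groups):
        positives = true_pos[group] + false_neg[group]
        result[group] = true_pos[group] / positives if positives else None
    return result


# Made-up evaluation set: 10 true cases and 10 non-cases per group.
records = (
    [("men", 1, 1)] * 9 + [("men", 1, 0)] * 1       # 9 of 10 cases detected
    + [("women", 1, 1)] * 6 + [("women", 1, 0)] * 4  # 6 of 10 cases detected
    + [("men", 0, 0)] * 10 + [("women", 0, 0)] * 10
)

per_group = sensitivity_by_group(records)
# Relabel every record with one group name to get the aggregate figure.
overall = sensitivity_by_group(("all", t, p) for _, t, p in records)["all"]

print("Overall sensitivity:", overall)
print("Per-group sensitivity:", per_group)
```

With these invented counts the overall sensitivity is 0.75, which masks the difference between 0.9 for one group and 0.6 for the other. The same pattern extends to specificity or any other metric, and to any attribute the evaluation data records.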
Clinical Validation and Regulatory Gaps
The FDA has authorized over 1,000 AI-enabled medical devices as of 2025, the vast majority through the 510(k) pathway, which requires demonstrating that the device is "substantially equivalent" to a legally marketed predicate device. This pathway was designed for incremental improvements to medical hardware, not for machine learning systems that evolve with new data. A significant concern is that many AI devices are authorized based on retrospective studies using curated datasets rather than prospective clinical trials demonstrating benefit in real clinical settings. The performance gap between curated datasets and real-world clinical data can be substantial.
Locked versus adaptive algorithms present a regulatory challenge. A locked algorithm is frozen at the time of authorization and does not change. An adaptive algorithm continues to learn from new data after deployment, potentially improving but also potentially drifting in ways that were not evaluated during the authorization process. The FDA has proposed a framework for regulating adaptive algorithms through "predetermined change control plans" that specify how the algorithm may change and what monitoring is required, but the framework is still developing. In the meantime, some AI devices authorized as locked algorithms are effectively adaptive because they are periodically retrained on new data and resubmitted for authorization.
External validation (testing a model on data from institutions, patient populations, and equipment different from those it was trained on) is essential but inconsistently performed. A radiology AI that achieves 95% accuracy on chest X-rays from the hospital where it was developed may achieve only 80% accuracy at a different hospital with different imaging equipment, different patient demographics, and different prevalence of disease. Studies have found that most published medical AI models achieve lower accuracy in external validation than in internal validation, and many published models are never externally validated at all. Without rigorous external validation, the reported performance of medical AI may overstate its real-world reliability.
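Disease prevalence alone can change how a model's output should be read, even if the model itself behaves identically at both sites. The sketch below covers only that one factor (it does not model differences in equipment or demographics) and uses illustrative numbers, not figures from any study. It applies Bayes' rule to compute positive predictive value (PPV), the probability that a positive flag is a true case, at two assumed prevalence levels.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule.

    PPV = P(disease | positive test)
        = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Illustrative assumption: the same model (95% sensitivity, 95% specificity)
# is used at a referral center with 10% prevalence and at a screening site
# with 1% prevalence.
referral_ppv = ppv(sensitivity=0.95, specificity=0.95, prevalence=0.10)
screening_ppv = ppv(sensitivity=0.95, specificity=0.95, prevalence=0.01)

print(f"PPV at 10% prevalence: {referral_ppv:.3f}")
print(f"PPV at 1% prevalence:  {screening_ppv:.3f}")
```

Under these assumptions roughly two in three positive flags are true cases at the higher-prevalence site, but only about one in six at the lower-prevalence site. Sensitivity and specificity measured at one institution therefore do not by themselves indicate how trustworthy a positive result will be elsewhere.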
Patient Consent and the Physician-Patient Relationship
When an AI system contributes to a medical decision, patients should be informed. But the current informed consent framework does not clearly cover AI involvement. If a radiologist uses an AI tool to flag potential findings on a chest X-ray, must the patient be told that AI contributed to the interpretation? If a treatment recommendation algorithm suggests a medication, must the patient know that a computer participated in the decision? Currently, there is no consistent standard. Some institutions disclose AI use in patient communications, while others treat AI tools like any other clinical decision support software and do not specifically inform patients.
The American Medical Association's position is that physicians should inform patients about the role of AI in their care and that AI should augment, not replace, physician judgment. This position preserves the physician-patient relationship as the locus of medical decision-making, with AI serving as a tool that the physician uses and is responsible for. The practical challenge is that physicians may not fully understand the AI tools they use. A doctor who follows an AI recommendation without the ability to evaluate its reasoning is not exercising independent medical judgment, even if they formally make the final decision.
Automation bias, the tendency of humans to defer to automated systems even when the systems are wrong, is a documented risk in clinical settings. Studies have shown that physicians presented with AI recommendations tend to accept them rather than reach independent conclusions, even when the recommendation is demonstrably wrong. The effect is particularly pronounced for less experienced physicians and in time-pressured clinical environments. If the physician's role is reduced to rubber-stamping AI recommendations, the safeguard of human oversight becomes illusory, and the practical decision-maker is the AI system rather than the physician.
Liability and Accountability
When an AI-assisted medical decision leads to patient harm, the liability chain is unclear. If a diagnostic AI misses a cancer that a human radiologist would have caught, is the AI developer liable for a defective product, the hospital liable for deploying an inadequate tool, or the physician liable for relying on a faulty recommendation? Current law does not provide clear answers because the legal frameworks for medical malpractice and product liability were designed before AI-assisted medicine existed.
Medical malpractice requires demonstrating that a physician deviated from the standard of care. If the standard of care increasingly includes using AI tools, a physician who does not use available AI might be liable for failing to employ the best available technology. Conversely, a physician who relies on AI that produces an error might argue that they followed the standard of care by using an approved tool. These competing arguments will be resolved through litigation, but the uncertainty creates practical problems: physicians may avoid using AI to limit liability risk, or they may over-rely on AI because using it seems legally safer than not using it.
Product liability could hold AI developers responsible for defective systems, but the medical device regulatory framework complicates this. A device that has been cleared by the FDA is presumed to meet minimum safety standards, which may shield manufacturers from some liability claims. The distinction between a medical device (which the developer is responsible for) and a clinical decision support tool (which the physician is responsible for using appropriately) is legally significant but practically blurry when the tool strongly influences clinical decisions.
Access, Equity, and the Digital Divide
AI has genuine potential to expand access to medical expertise. A dermatology AI can provide screening in communities without dermatologists. A radiology AI can read imaging studies in hospitals without radiologists, which is the reality for many hospitals in developing countries and rural areas. A mental health chatbot can provide support between therapy sessions or in communities without mental health providers. These applications can reduce health disparities by bringing specialist-level analysis to populations that currently lack access.
The risk is that AI creates a two-tier healthcare system: AI-augmented care for well-resourced institutions and AI-only care for under-resourced ones. If AI is deployed as a supplement in wealthy hospitals (helping radiologists catch findings they might miss) but as a replacement in poor ones (substituting for radiologists entirely), the technology widens rather than narrows the quality gap. The ethical imperative is to deploy medical AI in ways that raise the floor of care quality rather than creating new forms of inequality based on who receives human attention and who receives algorithmic processing.
Data representation determines which populations benefit from medical AI. If training data predominantly represents patients from academic medical centers in wealthy countries, the resulting models will work best for those populations and may fail for patients in community health centers, rural hospitals, and developing world settings where disease prevalence, patient demographics, and clinical contexts differ. Ensuring that medical AI benefits all populations requires deliberate investment in diverse, representative training data and validation across the full range of clinical settings where the technology will be deployed.
AI in healthcare offers real benefits in diagnostic accuracy, access expansion, and personalized treatment, but raises distinct ethical challenges around bias in training data, inadequate clinical validation, unclear patient consent, unresolved liability, and the risk of creating a two-tier system where AI supplements care for the privileged and substitutes for care for the underserved.