Keimyung Med J, Volume 44(2); 2025
Kim: How to Prevent Hallucination in Artificial Intelligence-Assisted Clinical Practice

Abstract

The integration of artificial intelligence (AI) into clinical practice has ushered in new frontiers in diagnostic accuracy, operational efficiency, and healthcare accessibility. However, an emerging concern in AI-assisted healthcare is the phenomenon of “hallucination,” the generation of incorrect, fabricated, or unverifiable information, which can mislead clinical decision-making. This review examines the causes and implications of hallucinations in AI-generated clinical data and proposes practical mitigation strategies. Hallucinations can be minimized through enhanced model training, validation using high-quality medical datasets, robust human oversight, adherence to ethical design principles, and the implementation of comprehensive regulatory frameworks, thereby ensuring the safe, ethical, and effective deployment of AI in clinical settings. Interdisciplinary collaboration is critical to improve model transparency and reliability.

Introduction

Artificial intelligence (AI), particularly large language models and deep learning algorithms, is increasingly being integrated into clinical workflows. Applications range from image-based diagnostics and predictive analytics to patient triage and treatment recommendations [1-3]. Despite its significant potential, AI remains susceptible to hallucination: the generation of plausible-sounding but incorrect or misleading outputs, often in the absence of supporting data or clinical context [4]. Given their potential to compromise patient care, understanding and mitigating hallucinations in clinical settings is both urgent and essential. As AI systems grow increasingly complex and widely adopted, a multidisciplinary approach involving clinicians, data scientists, ethicists, and policymakers is required to ensure the responsible and ethical implementation of AI in healthcare.

Understanding artificial intelligence hallucination

Hallucinations in AI originate from several interrelated factors: heterogeneous training data quality, lack of contextual understanding, sensitivity to ambiguous or poorly framed prompts, and model overfitting with under-generalization (Table 1).
Many AI systems are trained on heterogeneous datasets, which may contain outdated, inaccurate, or non-clinical content [5]. Unlike human clinicians, AI lacks true comprehension and may fill information gaps with statistically plausible yet clinically incorrect statements [6]. In generative AI tools, including ChatGPT and Gemini, ambiguous or poorly framed user prompts can trigger misleading outputs, particularly in high-stakes medical contexts [7]. Excessive reliance on limited or biased data can lead models to overfit, generating confident yet incorrect responses in unfamiliar contexts, thereby resulting in poor generalization [8].
Hallucinations in AI can undermine trust, propagate misinformation, and contribute to detrimental clinical decisions. For instance, fabricated laboratory values, incorrect imaging interpretations, and non-evidence-based treatment suggestions can compromise patient safety. In medical education, reliance on hallucinatory AI-generated explanations may distort trainees’ understanding and reinforce misconceptions. Erroneous outputs may also complicate medico-legal responsibility, raising questions about accountability when AI is integrated into decision-making processes [8]. In clinical settings, such inaccuracies can lead to delayed diagnoses, increased healthcare costs, and potential legal repercussions resulting from misdiagnosis. Furthermore, repeated exposure to erroneous AI outputs may desensitize clinicians and undermine the rigorous standards typically applied in evidence-based practice.

Strategies for prevention

A multifaceted approach is required to mitigate hallucinations in clinical AI systems. Previous studies have suggested the use of curated medical datasets, human-in-the-loop oversight, algorithm transparency and interpretability, prompt engineering and input validation, multistep verification systems, regulatory standards and audits, interdisciplinary collaboration, and continuous education and training (Table 2).
Training AI models on peer-reviewed, high-quality clinical data, such as PubMed articles, electronic medical records, and validated imaging datasets, can significantly reduce the occurrence of hallucination [9]. Ensuring diversity and representation within training datasets enhances the model’s ability to generalize effectively across varied patient populations. The use of data from real-world settings, including multi-institutional datasets, ensures the diversity and robustness of the training material. Outputs generated by clinical AI systems must be subjected to expert review, particularly in diagnostic and therapeutic applications. Hybrid models that integrate machine-generated outputs with human interpretation demonstrate promise in improving reliability [10]. Routine validation of AI outputs by medical professionals constitutes a critical safety measure in clinical practice.
Developers should prioritize explainable AI techniques that allow clinicians to understand how and why a particular output was generated [11]. Such transparency increases trust and enables clinicians to identify errors early. Well-designed queries and structured input formats can significantly reduce the likelihood of misinterpretation by generative models. The adoption of tools that flag ambiguous responses or request clarification before generating outputs should be encouraged [12]. Educational initiatives can also play a pivotal role in training users to formulate clear, precise, and contextually appropriate inputs.
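An input-validation layer of the kind described above can be sketched in a few lines. The following is an illustrative example only; the vague-term list, the word-count threshold, and the warning messages are assumptions for demonstration, not rules drawn from any published system.

```python
# Minimal sketch of pre-submission prompt validation for a clinical
# generative-AI tool: flag ambiguous or underspecified queries before
# they reach the model, prompting the user to clarify.
import re

# Illustrative phrases that refer to unnamed entities and invite hallucination.
VAGUE_TERMS = {"this drug", "the patient", "that dose"}

def validate_prompt(prompt: str) -> list[str]:
    """Return warnings for an ambiguous or underspecified clinical query."""
    warnings = []
    if len(prompt.split()) < 8:
        warnings.append("Prompt is very short; add clinical context (age, history, setting).")
    lowered = prompt.lower()
    for term in VAGUE_TERMS:
        if term in lowered:
            warnings.append(f"Ambiguous reference '{term}'; name the drug or patient explicitly.")
    if "dose" in lowered and not re.search(r"\d", prompt):
        warnings.append("Dosing question lacks numeric details (weight, current dose).")
    return warnings

# A query is forwarded to the model only when no warnings remain.
issues = validate_prompt("What dose should the patient get?")
```

In this sketch, the underspecified query triggers all three checks, so the tool would request clarification rather than generate an answer.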
Multistep verification systems, whether implemented as layered AI models or as combinations of diverse algorithmic approaches, can serve as a failsafe against erroneous outputs. Some institutions are exploring consensus-based AI systems, wherein multiple models must reach agreement before a result is generated or released [13]. Automated cross-referencing with established clinical guidelines further enhances the reliability of AI-generated outputs.
National and international health authorities must establish auditing systems and certification protocols to assess the accuracy and safety of AI systems employed across clinical settings [14]. Regulatory standards and audits, together with ethical guidelines and liability frameworks, should be continuously updated to keep pace with technological advances. Integrating AI ethics, clinical reasoning, and data literacy into health professional education can empower future clinicians to utilize AI judiciously. Cross-disciplinary dialogue ensures that AI systems align with real-world clinical needs and ethical norms. Ongoing education and training that equip healthcare professionals with the skills to interpret AI outputs and identify hallucinations are critical. Training programs in digital literacy, AI ethics, and computational methods should be integrated into medical curricula and continuing professional development [15].

Conclusion

Although AI has transformative potential in healthcare, the risk of hallucination presents a formidable challenge. The development of trustworthy AI is particularly critical in medicine because of its profound implications for patient health and safety. AI trustworthiness encompasses various aspects, including ethics, transparency, and safety requirements. Through rigorous data governance, human oversight, and ongoing regulatory vigilance, hallucinations can be minimized to manageable levels. Ensuring the accuracy and integrity of AI in clinical decision-making is not solely a technical issue but a fundamental requirement for safeguarding patient safety and upholding ethical standards in medicine. Building resilient AI systems necessitates interdisciplinary collaboration, ongoing research, and adaptive policies that evolve alongside technological advancements.

Acknowledgements

None.

Ethics approval

Not applicable.

Conflict of interest

The author has nothing to disclose.

Funding

None.

Table 1.
Factors related to artificial intelligence hallucination
1. Heterogeneous training data quality
2. Lack of contextual understanding
3. Sensitivity to poorly framed user prompts
4. Model overfitting and under-generalization
Table 2.
Strategies for prevention of artificial intelligence hallucination
1. Use of curated medical datasets
2. Human-in-the-loop oversight
3. Algorithm transparency and interpretability
4. Prompt engineering and input validation
5. Multi-step verification systems
6. Regulatory standards and audits
7. Interdisciplinary collaboration
8. Ongoing education and training

References

1. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–8.
2. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.
3. Pedrosa J, Aresta G, Ferreira C, Carvalho C, Silva J, Sousa P, et al. Assessing clinical applicability of COVID-19 detection in chest radiography with deep learning. Sci Rep. 2022;12:6596.
4. Howell MD, Corrado GS, DeSalvo KB. Three epochs of artificial intelligence in health care. JAMA. 2024;331:242–4.
5. Duarte-Rojo A, Sejdic E. Artificial intelligence and the risk for intuition decline in clinical medicine. Am J Gastroenterol. 2022;117:401–2.
6. Kaplan AD, Kessler TT, Brill JC, Hancock PA. Trust in artificial intelligence: meta-analytic findings. Hum Factors. 2023;65:337–59.
7. Simon E, Swanson K, Zou J. Language models for biological research: a primer. Nat Methods. 2024;21:1422–9.
8. Dilsizian SE, Siegel EL. Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr Cardiol Rep. 2014;16:441.
9. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.
10. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195.
11. El Arab RA, Almoosa Z, Alkhunaizi M, Abuadas FH, Somerville J. Artificial intelligence in hospital infection prevention: an integrative review. Front Public Health. 2025;13:1547450.
12. Kantor J. Best practices for implementing ChatGPT, large language models, and artificial intelligence in qualitative and survey-based research. JAAD Int. 2024;14:22–3.
13. Char DS, Shah NH, Magnus D. Implementing machine learning in health care-addressing ethical challenges. N Engl J Med. 2018;378:981–3.
14. European Commission. Artificial intelligence. [cited 2021 May 02]. Available from: https://digital-strategy.ec.europa.eu/en/policies/artificial-intelligence.
15. Schwabe D, Becker K, Seyferth M, Klaß A, Schaeffter T. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. NPJ Digit Med. 2024;7:203.
Copyright © 2026 by Keimyung University School of Medicine.