Assessing the Accuracy and Agreement of ChatGPT Integrated with Voice Commands in Emergency Severity Index (ESI) Triage for Emergency Patients at Warin Chamrap Hospital
Keywords:
Triage for Emergency, Assessment of discriminative ability, ChatGPT
Abstract
Background
Emergency department (ED) triage is critical for patient safety and optimal resource use. This study assessed the accuracy and agreement of Thai voice–enabled ChatGPT in Emergency Severity Index (ESI) triage compared with emergency physicians and triage nurses.
Methods
A cross-sectional study was conducted in the ED of Warin Chamrap Hospital, Thailand, from 1 April to 31 May 2025. Consecutive ED patients (n = 387) were triaged by an expert emergency physician panel (reference standard), by triage nurses, and by ChatGPT-4o, which received data from the standardized triage form entered via Thai voice commands and/or typing. Agreement across the five ESI levels was assessed with weighted κ (kappa). Binary classification performance for critical (ESI 1–2) versus non-critical (ESI 3–5) cases was reported as sensitivity, specificity, PPV, NPV, and area under the receiver operating characteristic curve (AuROC). Directional misclassification (over-/under-triage) was tested with the McNemar test.
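The agreement and classification measures named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the toy ratings, and the choice of linear kappa weights are all hypothetical (the abstract does not say whether linear or quadratic weights were used). The AuROC line uses the fact that a single binary cut-off gives one ROC operating point, whose trapezoidal area equals (sensitivity + specificity) / 2; the abstract does not state how AuROC was actually computed.

```python
from collections import Counter


def weighted_kappa(reference, rater, levels=5, power=1):
    """Cohen's weighted kappa for ordinal ratings coded 1..levels.

    power=1 gives linear weights, power=2 quadratic weights.
    The default is an assumption; the source does not specify it.
    """
    n = len(reference)
    observed = Counter(zip(reference, rater))
    row = Counter(reference)
    col = Counter(rater)
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(1, levels + 1):
        for j in range(1, levels + 1):
            weight = (abs(i - j) / (levels - 1)) ** power
            disagree_obs += weight * observed[(i, j)]
            disagree_exp += weight * row[i] * col[j] / n
    if disagree_exp == 0:
        # Degenerate case: both raters used one identical level throughout.
        return 1.0
    return 1.0 - disagree_obs / disagree_exp


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(reference, rater, critical_max=2):
    """Dichotomize ESI levels (critical = ESI <= critical_max) and score."""
    tp = fp = fn = tn = 0
    for ref, got in zip(reference, rater):
        ref_pos = ref <= critical_max
        got_pos = got <= critical_max
        if ref_pos and got_pos:
            tp += 1
        elif ref_pos:
            fn += 1  # critical case rated non-critical (under-triage)
        elif got_pos:
            fp += 1  # non-critical case rated critical (over-triage)
        else:
            tn += 1
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        # One operating point: trapezoidal ROC area = balanced accuracy.
        "auroc": (sens + spec) / 2,
    }


# Hypothetical toy ratings, NOT study data.
physician = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
model = [1, 2, 3, 3, 3, 2, 4, 4, 5, 4]
print(weighted_kappa(physician, model))
print(binary_metrics(physician, model))
```

In practice an established statistics package would normally be preferred over hand-rolled code; the sketch is only meant to make the definitions concrete.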
Results
For classifying critical cases (ESI 1–2), ChatGPT showed sensitivity 97.6%, specificity 95.9%, PPV 94.8%, NPV 98.1%, and AuROC 97% (95% CI 95.0–99.0), whereas triage nurses showed sensitivity 92.3%, specificity 75.7%, PPV 74.6%, NPV 92.7%, and AuROC 84% (95% CI 81.0–87.0). Agreement with physicians, measured by weighted kappa, was 0.915 (95% CI 0.844–0.986; almost perfect agreement) for ChatGPT and 0.607 (95% CI 0.536–0.678; substantial agreement) for triage nurses. ChatGPT over-triaged 2.6% (10 cases) and under-triaged 3.4% (13 cases), for a total of 5.9% (23 cases) misclassified (McNemar p = 0.270). Triage nurses over-triaged 23.3% (90 cases) and under-triaged 5.2% (20 cases), for a total of 28.4% (110 cases) misclassified (McNemar p < 0.001).
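A McNemar-type test of directional misclassification compares two discordant counts, here taken to be the number of over-triaged and under-triaged cases. The sketch below is an illustration only: the function names and the counts are hypothetical, and because the abstract does not state which paired table or which variant of the test (exact, asymptotic, with or without continuity correction) was used, it should not be expected to reproduce the reported p-values.

```python
from math import comb, erfc, sqrt


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant counts b and c.

    Under the null hypothesis each discordant case falls in either
    direction with probability 0.5, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, nothing to test
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(b, c):
    """Asymptotic McNemar test without continuity correction.

    Returns (statistic, p_value); the statistic has 1 degree of freedom,
    whose upper-tail probability is erfc(sqrt(statistic / 2)).
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    return stat, erfc(sqrt(stat / 2.0))


# Hypothetical discordant counts, NOT study data:
# 15 cases misclassified in one direction, 5 in the other.
over_triage, under_triage = 15, 5
print(mcnemar_exact(over_triage, under_triage))
print(mcnemar_chi2(over_triage, under_triage))
```

The exact form is usually preferred when the number of discordant cases is small; with large counts the two versions give similar answers.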
Conclusions
ChatGPT with Thai voice commands achieved an AuROC of 97% for discriminating critical from non-critical cases and almost perfect agreement with physicians. It could be applied as a decision-support tool to reduce triage errors, particularly where staffing is limited.
License
Copyright (c) 2026 Thai College of Emergency Physicians

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in the Thai Journal of Emergency Medicine are the copyright of the Thai College of Emergency Physicians.
Once an article has been published in the Thai Journal of Emergency Medicine, it appears in electronic form only; no printed copies are produced. After publication, authors may not present or publish the article elsewhere in any form without permission from the Thai Journal of Emergency Medicine.
