Clinical Validation of AI-Assisted Mammography Analysis: Generalizability and Performance Evaluation in Breast Cancer Detection
Abstract
OBJECTIVES: To evaluate the generalizability and clinical performance of the Inspectra Mammography Model (MMG), an artificial intelligence (AI) system for breast cancer detection in mammography, across different clinical settings, and to assess its utility as a second reader in reducing inter-radiologist variability.
MATERIALS AND METHODS: The Inspectra MMG model, developed on a modified EfficientNetV2 architecture, was evaluated using two out-of-domain datasets from Bangkok Dusit Medical Services (BDMS) hospitals: a Radiologist-validated Set (172 cases) and a Biopsy-confirmed Set (181 cases). Model performance was assessed by area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Clinical usability was further examined through concordance analysis with radiologists’ reports, expert acceptance ratings, and system usability scale (SUS) scores. To assess the potential second-reader benefit, the model’s impact was analyzed in 119 cases with documented inter-radiologist disagreements.
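The performance measures named above can be illustrated with a minimal sketch. It assumes binary ground-truth labels (1 = cancer) and a per-case model score; the function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the study. AUROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which is one standard way to obtain it and not necessarily the procedure the authors used.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from binary labels (1 = cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
threshold = 0.5  # assumed operating point
predictions = [1 if s >= threshold else 0 for s in model_scores]

metrics = confusion_metrics(labels, predictions)
area = auroc(labels, model_scores)
```

Sensitivity, specificity, PPV, and NPV depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds.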
RESULTS: The model demonstrated robust performance, achieving AUROCs of 0.915 and 0.907 for cancer detection in the Radiologist-validated and Biopsy-confirmed sets, respectively. Lesion localization showed high accuracy with a lesion localization fraction (LLF) of 74.5 and a non-lesion localization fraction (NLF) of 0.53. Clinical usability assessment indicated strong concordance with radiologists’ reports (76.2% for classification, 77.9% for localization). Expert radiologists reported high acceptance of AI-generated results (94.5% and 94.0% acceptance rates). The system achieved a SUS score of 69.33, reflecting good usability. In the second-reader benefit analysis, the AI aligned with final radiological assessments in 71.4% of cases with inter-radiologist disagreement and identified additional potential findings in 16.8% of cases.
CONCLUSION: AI-powered mammography analysis maintained reliable performance across different clinical environments and effectively supported radiologists in breast cancer screening workflows. The system demonstrated potential to reduce inter-reader variability while enhancing detection sensitivity, supporting its role as a clinically valuable second reader.
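Two of the summary figures above follow simple formulas, sketched here under stated assumptions. LLF and NLF use the usual free-response definitions: correctly localized lesions divided by all lesions, and non-lesion marks divided by the number of images. The rule for deciding whether a mark counts as a hit is study-specific, so the sketch takes counts as inputs. The SUS function follows Brooke's standard scoring for the ten-item questionnaire on a 1 to 5 scale. Function names and example values are hypothetical.

```python
from typing import Sequence, Tuple


def froc_point(lesion_hits: int, total_lesions: int,
               false_marks: int, total_images: int) -> Tuple[float, float]:
    """Return (LLF, NLF) from pre-counted localization outcomes."""
    if total_lesions <= 0 or total_images <= 0:
        raise ValueError("total_lesions and total_images must be positive")
    llf = lesion_hits / total_lesions   # fraction of lesions correctly marked
    nlf = false_marks / total_images    # non-lesion marks per image
    return llf, nlf


def sus_score(responses: Sequence[int]) -> float:
    """System Usability Scale score (0-100) for one respondent.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical inputs, for illustration only.
llf, nlf = froc_point(lesion_hits=3, total_lesions=4, false_marks=2, total_images=4)
one_respondent = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
```

A study-level SUS figure is typically the mean of such per-respondent scores.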
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Bangkok Medical Journal Vol. 21, No. 2; September 2025. ISSN 2287-0237 (online) / 2228-9674 (print)