Comparison of Machine Learning With Logistic Regression for Prediction of Chronic Kidney Disease in the Thai Adult Population

Ratchainant Thammasudjarit; Punnathorn Ingsathit; Sigit Ari Saputro; Atiporn Ingsathit; Ammarin Thakkinstian

doi:10.33165/rmj.2021.44.4.250334

Authors

Ratchainant Thammasudjarit, Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
Punnathorn Ingsathit, Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand / Triam Udom Suksa School, Bangkok, Thailand
Sigit Ari Saputro, Division of Biostatistics and Health Informatics, Faculty of Public Health, Airlangga University, Surabaya, Indonesia
Atiporn Ingsathit, Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
Ammarin Thakkinstian, Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand

DOI:

https://doi.org/10.33165/rmj.2021.44.4.250334

Keywords:

Chronic kidney disease, Machine learning, Clinical prediction model

Abstract

Background: Chronic kidney disease (CKD) consumes substantial health care resources for treatment. Early detection with a risk prediction model should be useful for identifying at-risk patients and providing early treatment.

Objective: To compare the performance of traditional logistic regression with that of machine learning (ML) in predicting the risk of CKD in the Thai population.

Methods: This study used data from the Thai Screening and Early Evaluation of Kidney Disease (SEEK) study. Seventeen features were initially considered for constructing prediction models using logistic regression and 4 ML algorithms (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into training and test sets at a ratio of 70:30. Model performance was assessed by estimating recall, C statistic, accuracy, F1 score, and precision.

Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated CKD from non-CKD patients well, with C statistics of 0.79 and 0.78 in the training and test data, respectively. Among the ML models, the Neural Network performed best, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data. In the test data, the performance of these models decreased by about 5%, 3%, 1%, and 2%, respectively, compared with a decrease of about 2% for the logistic regression model.

Conclusions: Risk prediction models of CKD constructed with logistic regression, Neural Network, and Random Forest have comparable discrimination performance, but logistic regression tends to overfit less than Neural Network and Random Forest.
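
The evaluation steps named in the Methods (a 70:30 split and the recall, C statistic, accuracy, F1, and precision metrics) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the function names, the 0.5 classification threshold, and the random seed are assumptions made for illustration only, and any real analysis would use the study's own data and software.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) at a 70:30 ratio.

    The seed is an illustrative assumption, used only for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def c_statistic(y_true, scores):
    """C statistic: probability that a random case receives a higher score
    than a random non-case, with tied scores counted as one half."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def classification_metrics(y_true, scores, threshold=0.5):
    """Recall, precision, F1, and accuracy at an assumed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}
```

Computing the same metrics on the training set and on the test set, and comparing the two, gives the kind of train-to-test drop that the Results use as an indicator of overfitting.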
