Comparison of Machine Learning With Logistic Regression for Prediction of Chronic Kidney Disease in the Thai Adult Population
DOI:
https://doi.org/10.33165/rmj.2021.44.4.250334
Keywords:
Chronic kidney disease, Machine learning, Clinical prediction model
Abstract
Background: Chronic kidney disease (CKD) consumes substantial resources for treatment. Early detection with a risk prediction model should help identify at-risk patients and allow early treatment.
Objective: To compare the performance of traditional logistic regression with machine learning (ML) in predicting the risk of CKD in the Thai population.
Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were initially considered for constructing prediction models using logistic regression and 4 ML algorithms (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into training and test sets at a ratio of 70:30. Model performance was assessed by estimating recall, C statistic, accuracy, F1 score, and precision.
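The evaluation steps named in the Methods can be illustrated with a minimal sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, and the C statistic is computed here as the proportion of concordant case/non-case pairs, with ties counted as one half.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and split them 70:30 into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def classification_metrics(y_true, y_pred):
    """Recall, precision, F1, and accuracy for binary labels (1 = CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}


def c_statistic(y_true, scores):
    """Share of (case, non-case) pairs in which the case scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(pos) * len(neg))


# Toy data (hypothetical, not from the SEEK study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = c_statistic(y_true, scores)
print(metrics)  # every metric is 0.75 on this toy data
print(auc)      # 0.875
```

The pairwise C statistic is quadratic in sample size, which is fine for a small illustration; a rank-based computation would be preferable for a data set the size of a population survey.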
Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated well between CKD and non-CKD patients, with C statistics of 0.79 and 0.78 in the training and test data, respectively. The Neural Network performed best among the ML models, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data. Performance of these models in the test data decreased by about 5%, 3%, 1%, and 2%, respectively, compared with a decrease of about 2% for the logistic model.
Conclusions: Risk prediction models for CKD constructed by logistic regression, Neural Network, and Random Forest have comparable discrimination performance, but logistic regression tends to overfit less than Neural Network and Random Forest.
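Overfitting in this sense can be read off the gap between training and test C statistics. The sketch below computes that gap for the logistic regression values reported in the Results (0.79 and 0.78). The function name is hypothetical, and the abstract does not say whether its percentages are absolute or relative differences, so both are shown. Because the reported C statistics are rounded, these figures need not match the percentages quoted in the Results.

```python
def train_test_gap(c_train, c_test):
    """Return the absolute and relative drop in C statistic from train to test."""
    absolute = c_train - c_test
    relative = absolute / c_train
    return absolute, relative


# Logistic regression C statistics as reported in the abstract.
abs_drop, rel_drop = train_test_gap(0.79, 0.78)
print(f"absolute drop: {abs_drop:.2f}")  # absolute drop: 0.01
print(f"relative drop: {rel_drop:.1%}")  # relative drop: 1.3%
```

A smaller gap suggests that the model's apparent performance carries over better to unseen data.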