SURPY Python Toolkit for Data Analysis
Keywords:
Data analysis, Python program
Abstract
Objective: SURPY is a Python-based package for statistical analysis available from the PyPI repository. The present study aimed to evaluate the performance of the SURPY package in basic data analysis compared with a standard statistical package, Stata v.14 (StataCorp, College Station, TX, USA).
Methods: Datasets from previously published studies were retrieved for analysis. The data were converted to the .DTA format for analysis in Stata v.14 and were imported as a dataframe into the Python 3.0 environment for analysis with the 'soap' (surgical outcome analysis program) package of SURPY 1.1.7. The results of the analyses from the 2 programs were compared.
Results: The soap package of SURPY was able to import data stored in Microsoft Excel format and calculate basic descriptive statistics. It correctly performed t-tests and Mann-Whitney U tests. It also produced Kaplan-Meier survival curves and performed log-rank tests, with outputs similar to those from Stata.
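The analyses named in the results are standard procedures. As a hedged illustration of what each one computes, the sketch below implements them in plain Python using only the standard library. It is not the SURPY or soap API, whose function names are not given in this abstract; `describe`, `welch_t`, `mann_whitney_u`, `kaplan_meier` and `log_rank` are hypothetical names chosen for illustration. Two further assumptions: the t-test is taken to be Welch's unequal-variance form (the abstract does not say which variant was used) and returns only the statistic and degrees of freedom, because a p-value requires the t distribution; the Mann-Whitney p-value uses the normal approximation without tie or continuity correction.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one group of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": statistics.median(values),
    }


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df.
    (Assumed variant; p-value not computed here.)"""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for group a and a two-sided p-value from the normal
    approximation (no tie or continuity correction)."""
    # Count pairs in which a value from a exceeds one from b; ties count 0.5.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.
    events: 1 = event observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group all subjects sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # everyone observed at t leaves the risk set after t
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 df) and its p-value."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        n = sum(1 for s, _, _ in pooled if s >= t)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group a
        if n > 1:  # hypergeometric variance of d_a
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For real analyses, tested implementations with exact p-values and tie handling are available in `scipy.stats.ttest_ind` (with `equal_var=False` for Welch), `scipy.stats.mannwhitneyu`, and the `lifelines` package (`KaplanMeierFitter`, `lifelines.statistics.logrank_test`).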
Conclusion: SURPY can be used for simple data analysis, which could be useful for surgeons who are not familiar with typing commands in commonly used statistical programs. SURPY could be further developed to incorporate a graphical user interface.
License
Articles must be contributed solely to The Thai Journal of Surgery and, when published, become the property of the Royal College of Surgeons of Thailand. The Royal College of Surgeons of Thailand reserves copyright on all published materials, and such materials may not be reproduced in any form without written permission.