The Grammar of Science: Do Clusters Really Matter?

Authors

  • Jaranit Kaewkungwal, Mahidol University, Thailand

DOI:

https://doi.org/10.59096/osir.v18i3.277904

References

Columbia University Mailman School of Public Health. Multi-Level Modeling [Internet]. New York: Columbia University Mailman School of Public Health; [cited 2025 Aug 15]. <https://www.publichealth.columbia.edu/research/population-health-methods/multi-level-modeling>

Ntani G, Inskip H, Osmond C, Coggon D. Consequences of ignoring clustering in linear regression. BMC Med Res Methodol. 2021 Jul 7;21(1):139. doi:10.1186/s12874-021-01333-7.

Bellemare MF. Metrics Monday: when (not) to cluster? [Internet]. Saint Paul: Marc F. Bellemare; [updated 2017 Nov 13; cited 2025 Aug 15]. <https://marcfbellemare.com/wordpress/12712>

Hoffman L. Introduction to multilevel models (MLMs) for clustered data [Internet]. Iowa City: Lesa Hoffman; [cited 2025 Aug 15]. 21 p. <https://www.lesahoffman.com/PSQF7375_Clustered/PSQF7375_Clustered_Lecture1_Intro_MLM.pdf>

Zyzanski SJ, Flocke SA, Dickinson LM. On the nature and analysis of clustered data. Ann Fam Med. 2004 May-Jun;2(3):199–200. doi:10.1370/afm.197.

Adam NS, Twabi HS, Manda SOM. A simulation study for evaluating the performance of clustering measures in multilevel logistic regression. BMC Med Res Methodol. 2021 Nov 13;21(1):245. doi:10.1186/s12874-021-01417-4.

McNeish DM. Analyzing clustered data with OLS regression: the effect of a hierarchical data structure. Multiple Linear Regression Viewpoints. 2014;40(1):1–16.

Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017 Sep 10;36(20):3257–77. doi:10.1002/sim.7336.

King J. Clustered data: data analysis for psychology in R 3 [Internet]. Edinburgh: Department of Psychology, University of Edinburgh; [cited 2025 Aug 15]. 34 p. <https://uoepsy.github.io/dapr3/2324/lectures/dapr3_2324_01b_clusters.html#1>

Miles J. Methods for dealing with clustered data [Internet]. Southampton: National Centre for Research Methods Social Sciences; [cited 2025 Aug 15]. 57 p. <https://eprints.ncrm.ac.uk/id/eprint/4725/1/Methods%20for%20Dealing%20with%20Clustered%20Data.pdf>

Barratt H, Kirwan M, Shantikumar S. Clustered data - effects on sample size and approaches to analysis [Internet]. London: Faculty of Public Health; c2018 [cited 2025 Aug 15]. <https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/clustered-data>

Hosmer DW, Lemeshow S, Sturdivant RX. Applied logistic regression. 3rd ed. Hoboken: John Wiley & Sons, Inc; 2013. 510 p. doi:10.1002/9781118548387.

Agresti A. An introduction to categorical data analysis. 3rd ed. Hoboken: John Wiley & Sons, Inc; 2006. 372 p. doi:10.1002/0470114754.

Taboga M. Logistic classification model (logit or logistic regression) [Internet]. North Charleston: Kindle Direct Publishing; 2021 [cited 2025 Aug 15]. <https://www.statlect.com/fundamentals-of-statistics/logistic-classification-model>

Sommet N, Morselli D. Keep calm and learn multilevel logistic modeling: a simplified three-step procedure using Stata, R, Mplus, and SPSS. International Review of Social Psychology. 2017;30(1):203–18. doi:10.5334/irsp.90.

Galbraith S, Daniel JA, Vissel B. A study of clustered data and approaches to its analysis. J Neurosci. 2010 Aug 11;30(32):10601–8. doi:10.1523/JNEUROSCI.0362-10.2010.

Lee S. ROC & AUC in logistic regression: a primer [Internet]. New York: Number Analytics LLC; [updated 2025 May 16; cited 2025 Aug 15]. <https://www.numberanalytics.com/blog/roc-auc-logistic-regression-primer>

Oberauer K. The importance of random slopes in mixed models for Bayesian hypothesis testing. Psychol Sci. 2022 Apr;33(4):648–65. doi:10.1177/09567976211046884.

Al Amin M, Qin Y. Multilevel analysis in Stata: a step-by-step guide [Internet]. Princeton: Princeton University Library; [updated 2024 Aug 14; cited 2025 Aug 15]. <https://libguides.princeton.edu/multilevel>

Sparks CS. DEM 7473 - week 3: basic hierarchical models - random intercepts and slopes [Internet]. Boston: RStudio; 2018 Sep 17 [cited 2025 Aug 15]. <https://rpubs.com/corey_sparks/420770>

Centre for Multilevel Modelling, University of Bristol. Random slope models [Internet]. Bristol: University of Bristol; [cited 2025 Aug 15]. <https://www.bristol.ac.uk/cmm/learning/videos/random-slopes.html>

College of Public Health & Health Professions, University of Florida. Random slope models [Internet]. Gainesville: University of Florida Health; [cited 2025 Aug 15]. <https://users.phhp.ufl.edu/rlp176/Courses/SurveyBiostat/LMM/RSmodels.html>

Long R. What are the arguments in favor and against using random slopes? [Internet]. New York: Stack Exchange Inc; 2021 May 17 [cited 2025 Aug 15]. <https://stats.stackexchange.com/questions/524599/what-are-the-arguments-in-favor-and-against-using-random-slopes>

Long R. Is it a must to include a random slope in a mixed model? [Internet]. New York: Stack Exchange Inc; 2020 Aug 28 [cited 2025 Aug 15]. <https://stats.stackexchange.com/questions/485048/is-it-a-must-to-include-a-random-slope-in-a-mixed-model>

Heisig JP, Schaeffer M. Why you should always include a random slope for the lower-level variable involved in a cross-level interaction. European Sociological Review. 2019;35(2):258–79. doi:10.1093/esr/jcy053.

Snijders TAB, Bosker RJ. Multilevel analysis: an introduction to basic and advanced multilevel modeling. 2nd ed. London: SAGE Publications; 2004. 368 p.

Jani Data Diaries. Choosing the best model: a friendly guide to AIC and BIC [Internet]. San Francisco: A Medium Corporation; 2024 Nov 7 [cited 2025 Aug 15]. <https://medium.com/@jshaik2452/choosing-the-best-model-a-friendly-guide-to-aic-and-bic-af220b33255f>

Banerjee S. Model magic with AIC & BIC: navigating fit and elegance [Internet]. San Francisco: A Medium Corporation; 2023 Oct 16 [cited 2025 Aug 15]. <https://shekhar-banerjee96.medium.com/model-magic-aic-bic-mdl-navigating-fit-and-elegance-726c784edf9b>

Kumar A. AIC in logistic regression: formula, example [Internet]. New York: Analytics Yogi; 2023 Nov 30 [cited 2025 Aug 15]. <https://vitalflux.com/aic-in-logistic-regression-formula-example/>

Faculty of Medicine and Health Sciences. Goodness of fit in logistic regression [Internet]. Montreal: McGill University; [cited 2025 Aug 15]. 17 p. <https://www.medicine.mcgill.ca/epidemiology/joseph/courses/epib-621/logfit.pdf>

Arya N. Classification metrics walkthrough: logistic regression with accuracy, precision, recall, and ROC [Internet]. San Juan: KDnuggets; 2022 Oct 13 [cited 2025 Aug 15]. <https://www.kdnuggets.com/2022/10/classification-metrics-walkthrough-logistic-regression-accuracy-precision-recall-roc.html>

LaMorte WW. Screening for disease: test validity [Internet]. Boston: School of Public Health, Boston University; [cited 2025 Aug 15]. <https://sphweb.bumc.bu.edu/otlt/mph-modules/ep/ep713_screening/EP713_Screening3.html>

Florkowski CM. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev. 2008;29 Suppl 1:S83–7.

Fan J, Upadhye S, Worster A. Understanding receiver operating characteristic (ROC) curves. CJEM. 2006;8(1):19–20. doi:10.1017/s1481803

OpenAI. ChatGPT [Internet]. 2025 [cited 2025 May 6]. <https://chat.openai.com>

Published

2025-09-30

How to Cite

Kaewkungwal, J. (2025). The Grammar of Science: Do Clusters Really Matter? Outbreak, Surveillance, Investigation & Response (OSIR) Journal, 18(3), 183–191. https://doi.org/10.59096/osir.v18i3.277904

Section

Invited article