Statistical Inefficiencies in the Development of a Prediction Model

Collins, Gary S. PhD; Le Manach, Yannick PhD

doi: 10.1213/ANE.0000000000001838
Letter to the Editor

Centre for Statistics in Medicine, Botnar Research Centre, University of Oxford, Oxford, United Kingdom,

Departments of Anesthesia & Clinical Epidemiology and Biostatistics, Michael DeGroote School of Medicine, Faculty of Health Sciences, McMaster University and the Perioperative Research Group, Population Health Research Institute, Hamilton, Canada


To the Editor

The recent article by Zhang et al1 exhibits a number of statistical inefficiencies and flaws in the development of a clinical prediction model for postcardiac surgery atrial fibrillation in an Asian population. The premise of their study is that existing models are focused on Western populations and are not necessarily valid in Asian populations. They then develop another prediction model focused on the Asian population. Our first point is that far too many prediction models are being developed when existing models could be used. Our question to Dr. Zhang and his colleagues is why they did not evaluate existing models to see whether they showed any promise, and possibly recalibrate or update them.

Our next point relates to study design. The sample size calculations the authors present bear no resemblance to recommended approaches. Model development should be constrained by the rule of thumb of at least 10 outcome events per candidate predictor variable, and model validation should comprise a minimum of 100 outcome events.2
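These two rules of thumb are simple to check before any modeling begins. The sketch below uses illustrative numbers, not figures from Zhang et al:

```python
# Rule-of-thumb sample-size checks for prediction model studies.
# The event counts used below are hypothetical.

def max_candidate_predictors(n_events, epv=10):
    """Maximum candidate predictor parameters under an events-per-variable rule."""
    return n_events // epv

def validation_events_ok(n_events, minimum=100):
    """Minimum-events check for a validation cohort."""
    return n_events >= minimum

# A development cohort with 85 outcome events supports at most 8
# candidate predictor parameters under EPV = 10, and 85 events fall
# short of the recommended minimum of 100 for validation.
print(max_candidate_predictors(85))  # 8
print(validation_events_ok(85))      # False
```

Note that the constraint is on *candidate* predictors considered, not only those retained in the final model.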

The authors selected variables for inclusion in the model based on their univariable association with the outcome. This approach has been shown to be flawed, as important predictors can be overlooked owing to idiosyncrasies in the data. The authors also dichotomized continuous variables, a practice long regarded as unnecessary and wasteful because it throws away prognostic information.3 Even if an easy-to-use model is required, it is preferable to retain all continuous predictors during development and simplify the model afterward, to preserve as much predictive ability as possible.4
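The loss of prognostic information from dichotomization can be demonstrated directly. The following sketch simulates a hypothetical continuous predictor (age) whose effect on risk is linear, then compares the discrimination (area under the receiver operating characteristic curve) of the continuous predictor with that of the same predictor dichotomized at an arbitrary cut point; the data, cut point, and risk function are all invented for illustration:

```python
import random

random.seed(0)
# Simulate a cohort in which event risk rises linearly with age (hypothetical).
ages = [random.uniform(40, 90) for _ in range(500)]
outcomes = [1 if random.random() < (a - 40) / 100 else 0 for a in ages]

def auc(scores, ys):
    """Concordance (AUC): probability a random event outranks a random non-event."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

continuous_auc = auc(ages, outcomes)
dichotomized_auc = auc([1 if a >= 65 else 0 for a in ages], outcomes)
print(f"continuous: {continuous_auc:.3f}, dichotomized: {dichotomized_auc:.3f}")
```

The dichotomized predictor cannot distinguish between patients on the same side of the cut point, so its concordance is lower.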

Model validation, an important part of model development, was performed on a separate cohort collected from the same institutions during the same time period. This test of model performance is weak: it is nothing but a variation of random split-sample validation, a practice that has been shown to be unnecessary and inefficient.2 Both the development and validation data sets are random samples drawn from the same population (time and place), so the performance of the model (area under the receiver operating characteristic curve) in the validation data set is expected to be similar, barring chance differences in patient characteristics between the 2 samples. A more efficient approach is to develop the model on data from 1 time period and evaluate its performance on data from a separate time period, or to develop the model on data from 1 center and validate it on data from another center. Subsequent validations should also seek to evaluate the model in different centers.
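The distinction is simply how the split is made: by a covariate that changes the population (time or place), not at random. A minimal sketch, assuming each record carries an admission year (records and field names are hypothetical):

```python
# Temporal (nonrandom) split: develop on earlier years, validate on later
# years. A split by center would use a "center" field in the same way.
records = [
    {"year": 2012, "af": 0}, {"year": 2012, "af": 1},
    {"year": 2013, "af": 0}, {"year": 2014, "af": 1},
    {"year": 2015, "af": 0}, {"year": 2015, "af": 1},
]

development = [r for r in records if r["year"] <= 2013]
validation = [r for r in records if r["year"] >= 2014]

print(len(development), len(validation))  # 3 3
```

Because the validation sample is drawn from a different period, good performance there is evidence of transportability rather than a near-certain replay of the development results.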

The authors evaluated calibration with the Hosmer-Lemeshow test, which is widely known to be affected by sample size and, more importantly, is uninformative in that it provides no indication of the magnitude or direction of (mis)calibration. In accordance with recent recommendations on the reporting of prediction model studies, calibration should be assessed graphically by plotting predicted outcome probabilities (x-axis) against observed outcomes (y-axis) using a high-resolution smoothed (loess) line.2 It is also important to highlight that the performance characteristics reported are those of the logistic regression model (which is never published in full; the intercept is missing) and not of the simplified scoring system that the authors are promoting for use. Model performance will therefore be substantially worse than reported in the article.
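The data behind such a plot are just predicted probabilities paired with observed outcomes. The sketch below, on simulated (perfectly calibrated) data, computes mean predicted probability against observed event rate in quantile groups; in a real assessment a smoothed (loess) curve over the individual predictions would be drawn rather than, or in addition to, grouped points:

```python
import random

random.seed(1)
# Simulated predictions and perfectly calibrated outcomes:
# each event occurs with probability equal to its prediction.
preds = [random.random() for _ in range(1000)]
outcomes = [1 if random.random() < p else 0 for p in preds]

pairs = sorted(zip(preds, outcomes))
group_size = len(pairs) // 10
mean_preds, obs_rates = [], []
for g in range(10):
    chunk = pairs[g * group_size:(g + 1) * group_size]
    mean_preds.append(sum(p for p, _ in chunk) / len(chunk))
    obs_rates.append(sum(y for _, y in chunk) / len(chunk))
    print(f"group {g}: mean predicted {mean_preds[-1]:.2f}, "
          f"observed {obs_rates[-1]:.2f}")
```

For a well-calibrated model the plotted points (and the smoothed curve) lie close to the 45-degree line; systematic departures show the direction and magnitude of miscalibration, which a Hosmer-Lemeshow P value cannot.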

Our final point relates to the presentation of the simplified model. Although a simplified model may increase uptake, it is important to indicate what a score of 1, 2, …, 9 corresponds to in terms of predicted risk. Creating arbitrary risk groups (low, moderate, and high) without quantifying what they actually mean is uninformative. For example, it would be useful to state that low risk corresponds to a predicted risk between x% and y%.
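Such a mapping follows directly from the fitted logistic model. A minimal sketch, with a hypothetical intercept and per-point log-odds coefficient (not values reported by the authors, whose intercept is unpublished):

```python
import math

INTERCEPT = -4.0  # hypothetical model intercept (log-odds at score 0)
PER_POINT = 0.5   # hypothetical log-odds increase per score point

def predicted_risk(score):
    """Convert an integer score to a predicted probability via the logistic function."""
    linear_predictor = INTERCEPT + PER_POINT * score
    return 1 / (1 + math.exp(-linear_predictor))

# Tabulate what each score means as a predicted risk.
for s in range(10):
    print(f"score {s}: predicted risk {100 * predicted_risk(s):.1f}%")
```

Publishing such a table alongside the score would let readers see exactly which predicted risks the "low", "moderate", and "high" labels span.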

Gary S. Collins, PhD
Centre for Statistics in Medicine
Botnar Research Centre
University of Oxford
Oxford, United Kingdom

Yannick Le Manach, PhD
Departments of Anesthesia & Clinical Epidemiology and Biostatistics
Michael DeGroote School of Medicine
Faculty of Health Sciences
McMaster University and the Perioperative Research Group
Population Health Research Institute
Hamilton, Canada



1. Zhang W, Liu W, Chew ST, Shen L, Ti LK. A clinical prediction model for postcardiac surgery atrial fibrillation in an Asian population. Anesth Analg. 2016;123:283–289.
2. Moons KG, Altman DG, Reitsma JB, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162:W1–W73.
3. Collins GS, Ogundimu EO, Cook JA, Manach YL, Altman DG. Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med. 2016;35:4124–4135.
4. Sullivan LM, Massaro JM, D’Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham Study risk score functions. Stat Med. 2004;23:1631–1660.
Copyright © 2016 International Anesthesia Research Society