Multivariable regression modeling is a statistical method used for defining the relation between an outcome and a set of surrogate observations. The procedure is useful in identifying observations that provide independent information with respect to likelihood of the outcome in a particular data set. Failure to perform multivariable regression appropriately can lead to misleading inferences.1,2 Therefore, the need for reporting guidelines for complex multivariable regression models in clinical research is well recognized.1
Most physicians do not have adequate training in multivariable methods of statistical analysis. Despite this inadequacy, they are frequently faced with the results of such analyses in the medical literature, and the application of multivariable analytic methods in medical research has increased in recent years.1,2 Appraisal of the quality of these analytic methods has been limited to the general medical literature, and it shows a paucity of methodologic rigor.1,2 Similar deficiencies are likely to be present in the obstetrics and gynecology literature. Therefore, we generated guidelines for assessing multivariable models and investigated the quality of reporting by applying these guidelines to articles with multivariable logistic regression analyses in the obstetrics and gynecology literature.
Materials and Methods
The four main multivariable methods used in the biomedical literature are linear regression, logistic regression, discriminant function analysis, and proportional hazards (Cox) regression.1 The choice of method for any given study depends on the nature of the outcome (dependent) variable used for the model.1,3 Binary outcomes are ubiquitous in health science research, and logistic regression is used for analyzing them. The format of this type of analysis relates a set of predictor variables (X1, X2, …) to a binary outcome variable (Y) through a mathematical model:
G = β0 + β1X1 + β2X2 + …
In the logistic regression model, the term G is the log odds of the outcome variable Y; β0 is the intercept term; and β1, β2, … are the regression coefficients, indicating the impact of the independent variables X1, X2, … on the dependent variable Y. Each coefficient is interpreted as the change in the log odds of the outcome associated with a one-unit change in the corresponding independent variable, with the other variables held constant.
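The relation between the log odds G, the probability of the outcome, and the odds ratio can be illustrated with a minimal sketch. It assumes the usual linear form G = β0 + β1X1 + β2X2 + …; the function names and the coefficient values are hypothetical and are not taken from any article in this study.

```python
import math

def log_odds(intercept, coefficients, values):
    """Return G = b0 + b1*x1 + b2*x2 + ... for one subject."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def probability(g):
    """Convert the log odds G into the probability of the outcome."""
    return 1.0 / (1.0 + math.exp(-g))

def odds_ratio(coefficient):
    """Odds ratio associated with a one-unit increase in one variable."""
    return math.exp(coefficient)

# Hypothetical coefficients, for illustration only.
b0 = -2.0
betas = [0.5, 1.2]

g_low = log_odds(b0, betas, [1.0, 0.0])
g_high = log_odds(b0, betas, [2.0, 0.0])  # X1 raised by one unit, X2 unchanged

print("change in log odds:", g_high - g_low)
print("odds ratio for X1:", odds_ratio(betas[0]))
print("probabilities:", probability(g_low), probability(g_high))
```

The change in log odds equals the coefficient of X1, and exponentiating it gives the odds ratio, which is why the coefficients of a logistic model convert so readily into this measure of association.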
Logistic regression is the most frequently used approach to multivariable modeling in the medical literature.1 It has the advantage that its coefficients can be easily transformed into odds ratios, which are a commonly used measure of association in medical research. Because of these clinically useful properties, we assessed the quality of multivariable analysis in articles that used logistic regression. We assembled a database of articles with multivariable logistic regression analyses by a combination of manual and electronic searches of four general obstetrics and gynecology journals (Acta Obstetricia et Gynecologica Scandinavica, British Journal of Obstetrics and Gynaecology, American Journal of Obstetrics and Gynecology, and Obstetrics & Gynecology) for the years 1975, 1980, 1985, 1990, and 1995. One of the authors manually searched all of the issues of the above journals and years, supplementing this search with an electronic MEDLINE search for the same journals and years. The electronic search was conducted using the search term “logistic model.” The electronic search identified two additional articles that were missed by the manual search.
The journals in this study were selected because they were available in our library and were indexed in the MEDLINE database. They were published in English, and three of them had the highest impact factor for general obstetrics and gynecology journals in the 1995 Journal Citation Reports.4 The 5-year intervals enabled us to show any trends over time. All relevant articles identified by our search were retrieved for analysis. For each article, we extracted information pertaining to the methodologic features highlighted below. In a subgroup of articles, data extraction was done in duplicate.
Several statistics textbooks provide guidelines for the conduct and interpretation of multivariable logistic regression analyses3,5–8; however, there are no criteria based on consensus among experts. Therefore, we generated quality criteria for the evaluation of multivariable logistic regression models (Table 1) based on published methodologic literature. Our criteria were divided into three parts: 1) Does the model appear to be correct? 2) How well does the model describe the data? and 3) If the overall model appears correct and works well, how important is each independent variable?
The issues covered under “Does the model appear to be correct?” were important because regression procedures are simply the means for analysis, not the end of an investigation. The final model depends on the selection of independent variables,1,5,8 which should be guided by the research question. It follows that any multivariable analysis that is not preceded by a clear research hypothesis should be viewed with skepticism. The specification of variables in multivariable models is not a simple task. Although variables can be chosen with automated algorithms, these are only a tool for exploring the data; clinical judgment, biologic sensibility, and previous research results are not considered in this process.
In each article, we sought a clear research hypothesis and a description of the variable-selection process. Reporting of variable selection was considered adequate if it depended on consideration of the clinical or biologic importance, with forcing of such variables in the model if appropriate. The threshold of statistical significance for inclusion or deletion of variables in an automated algorithm (eg, stepwise forward or backward regression) had to be specified. In the absence of any information on the methods and criteria for selection, the variable-selection process was considered unreported.
An additional issue when assessing the form of the logistic regression model is the conformity of independent variables to linear gradient. This issue is pertinent to ranked independent variables (continuous or ordinal scale) because the value of the regression coefficient is assumed to be accurate as the average effect of the variable as it moves through different zones of measurement from low to high. If this condition is not met, the actual coefficient value may vary in different measurement zones, invalidating the estimated coefficient. In the articles with ranked independent variables, conformity to linear gradient was considered to be reported if an attempt was made to detect this problem (eg, by comparison of observed and predicted values for the outcome over the ordinal zones,6 or by an alternative analysis using cross-stratification9).
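One simple way to look for departures from a linear gradient is to compute the observed log odds of the outcome in each ordinal zone and compare the steps between adjacent zones. The sketch below is an illustration under that assumption only; it is not the procedure of references 6 or 9, and the counts are hypothetical.

```python
import math

def zone_log_odds(events, totals):
    """Observed log odds of the outcome in each ordinal zone."""
    values = []
    for r, n in zip(events, totals):
        if r <= 0 or r >= n:
            raise ValueError("each zone needs at least one event and one non-event")
        values.append(math.log(r / (n - r)))
    return values

def log_odds_steps(events, totals):
    """Change in observed log odds between adjacent zones (low to high)."""
    values = zone_log_odds(events, totals)
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical counts in three zones of 100 subjects each.
linear_steps = log_odds_steps([10, 20, 36], [100, 100, 100])
uneven_steps = log_odds_steps([10, 12, 50], [100, 100, 100])

print("roughly constant steps:", linear_steps)
print("unequal steps:", uneven_steps)
```

When the steps are roughly equal, a single coefficient is a fair summary of the effect across zones. Markedly unequal steps suggest that the coefficient averages over zones in which the effect differs.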
The second part of our quality checklist (“How well does the overall model work?”) focused on the goodness of fit, or the accuracy with which the final regression model described the data. This evaluation allowed us to determine whether knowing the values of all the independent variables collectively would predict the dependent variable any better than if we had no information on any of the independent variables. The validity of the results and conclusions from multivariable regression models rests on goodness of fit.1,6,8 Coefficients of logistic regression can be unreliable if there is overfitting or underfitting, and it is the relative paucity of outcome events that leads to these problems.10 If the ratio of outcome events to independent variables is less than 10:1, then the extent to which the independent variables, as a group, explain the dependent variable is of questionable accuracy.1,8 We searched in each article for a test assessing goodness of fit.
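The 10:1 rule of thumb described above amounts to a single division. The sketch below shows it with hypothetical function names and counts; it counts only outcome events, as in the text.

```python
def events_per_variable(n_events, n_variables):
    """Ratio of outcome events to independent variables in the model."""
    if n_variables <= 0:
        raise ValueError("the model must contain at least one independent variable")
    return n_events / n_variables

def accuracy_questionable(n_events, n_variables, threshold=10.0):
    """True when the ratio is less than the 10:1 rule of thumb."""
    return events_per_variable(n_events, n_variables) < threshold

# Hypothetical study: 50 outcome events and 8 independent variables.
print(events_per_variable(50, 8))      # 6.25
print(accuracy_questionable(50, 8))    # True
print(accuracy_questionable(100, 10))  # False: exactly 10:1 is not below the threshold
```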
If the multivariable logistic regression model appears to be correct and it seems to describe the data well, then the clinical importance of the regression results can be assessed by using our criteria under “How important is each independent variable?” The contribution of each independent variable is initially evaluated by testing for its statistical significance, ie, its P value. The substantive significance of the independent variables is determined by examination of the magnitude of effect, ie, the change in dependent variable associated with a unit change in the independent variable.8 For example, a variable such as blood pressure (BP) measured in mmHg can be coded in 1-mmHg increments, 10-mmHg intervals, or dichotomously as less than 90 mmHg versus 90 mmHg or greater. If the odds ratio for seizure (outcome variable) at different 10-mmHg categories of diastolic BP (independent variable) were found to be 1.20, then the odds of developing a seizure would be interpreted as increasing by 20% for each 10-mmHg rise in diastolic BP. This interpretation would be different if the coding of BP were different. Many reports do not include enough information on the coding of independent variables to allow this interpretation.1 We reviewed each article to see whether the authors reported the units of measurement for the variables and whether they were on an interval, ordinal, or binary scale. In the absence of such information, the coding of independent variables was classified as unreported.
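The dependence of the odds ratio on coding can be made concrete with the diastolic BP example. The sketch below assumes that the effect is linear on the log odds scale, so that an odds ratio per 10 mmHg can be re-expressed per 1 mmHg or per 20 mmHg; the function name is hypothetical.

```python
import math

def rescale_odds_ratio(odds_ratio, from_units, to_units):
    """Re-express an odds ratio per `from_units` as an odds ratio per `to_units`.

    Assumes a linear effect on the log odds scale.
    """
    beta_per_unit = math.log(odds_ratio) / from_units
    return math.exp(beta_per_unit * to_units)

or_per_10 = 1.20  # odds ratio per 10-mmHg rise, as in the example above
or_per_1 = rescale_odds_ratio(or_per_10, 10, 1)
or_per_20 = rescale_odds_ratio(or_per_10, 10, 20)

print(f"per 1 mmHg:  {or_per_1:.4f}")   # about 1.018
print(f"per 20 mmHg: {or_per_20:.2f}")  # 1.44
```

The same association therefore appears as an odds ratio of about 1.02, 1.20, or 1.44 depending only on the unit of coding, which is why a reported odds ratio cannot be interpreted without that information.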
Testing for interactions is important when the impact of one variable on the outcome is dependent on the level of another variable.1 Interactions between independent variables were considered reported if a statement in the article mentioned that they were suspected on clinical grounds and were evaluated. Interactions were classified as unreported if neither of the above was mentioned in the text.
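An interaction is commonly represented by adding a product term to the model, so that the odds ratio for one variable changes with the level of the other. The sketch below assumes that form for two variables; the coefficient values are hypothetical.

```python
import math

def log_odds(b0, b1, b2, b12, x1, x2):
    """Log odds with a product (interaction) term: b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

def odds_ratio_for_x1(b1, b12, x2):
    """Odds ratio for a one-unit rise in x1 at a given level of x2."""
    return math.exp(b1 + b12 * x2)

# Hypothetical coefficients, for illustration only.
b0, b1, b2, b12 = -1.0, 0.4, 0.3, 0.6

or_when_x2_absent = odds_ratio_for_x1(b1, b12, 0)
or_when_x2_present = odds_ratio_for_x1(b1, b12, 1)

print(f"odds ratio for x1 when x2 = 0: {or_when_x2_absent:.2f}")
print(f"odds ratio for x1 when x2 = 1: {or_when_x2_present:.2f}")
```

If the interaction coefficient were zero, the two odds ratios would be identical and a single value would describe the effect of the first variable. When it is not zero, a single reported odds ratio hides the dependence on the second variable.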
Our main analysis was based on the assessment of compliance of the articles with our criteria for quality of multivariable logistic regression analyses. We compared compliance rates among the journals for each of the quality criteria using the χ2 test. We also performed analyses to identify any trends in the publications and the quality of articles using logistic regression over time. The quality trends were evaluated separately for the three parts of our checklist. To assess the trends in reporting of criteria concerning “Did the model appear to be correct?” we classified articles as adequate if they reported at least one of the following three criteria concerning independent-variable selection: a clear research hypothesis, use of biologic sensibility, and description of the statistical method used for selection. For trends analysis of “How important was each of the independent variables?” we classified articles as adequate if they reported at least one of three criteria concerning interpretation of the substantive significance of independent variables: reporting of units, coding, and interaction of variables. Variations in publication and the quality of articles over time were analyzed statistically using the χ2 test for trend.
Results
Our search identified a total of 193 articles that used multivariable logistic regression analysis. There were no relevant articles in the years 1975 and 1980. Table 2 shows the distribution of the relevant articles identified in the years 1985, 1990, and 1995 according to the journal of publication. Overall, articles with logistic regression analysis as a proportion of all articles increased over time: 1.7% (26 of 1570) in 1985, 2.8% (53 of 1915) in 1990, and 6.6% (114 of 1728) in 1995 (P < .001 for trend). This trend was apparent in all journals except the British Journal of Obstetrics and Gynaecology.
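The time trend in these proportions can be examined with a χ2 test for trend. The sketch below assumes the Cochran-Armitage form of the test with equally spaced scores for the three years; the text does not state which variant or scoring was used, so this is an illustration and not a reproduction of the original analysis. The counts are those given above.

```python
import math

def chi_square_trend(events, totals, scores=None):
    """Cochran-Armitage chi-square test for trend (1 degree of freedom).

    Assumes equally spaced scores (0, 1, 2, ...) when none are given.
    Returns the chi-square statistic and its two-sided P value.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # overall proportion of events
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    chi2 = t_stat ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Articles with logistic regression and all articles in 1985, 1990, and 1995.
events = [26, 53, 114]
totals = [1570, 1915, 1728]

chi2, p_value = chi_square_trend(events, totals)
for year, r, n in zip((1985, 1990, 1995), events, totals):
    print(f"{year}: {100 * r / n:.1f}% ({r} of {n})")
print(f"chi-square for trend = {chi2:.1f}; P < .001: {p_value < 0.001}")
```

Under these assumptions the P value falls well below .001. The statistic does not change if the calendar years themselves are used as scores, because the test is unaffected by a linear rescaling of equally spaced scores.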
Table 3 shows the percentage of articles that complied with our quality criteria for reporting of multivariable logistic regression models. When assessing the degree to which the models appeared to be correct, it was apparent that 62 of 193 articles (32.1%) did not state the research questions clearly in terms of dependent and independent variables. In 91 of 193 articles (47.1%), biologic and scientific reasons for choosing the independent variables were not reported. The process of variable selection was inadequately described in 100 of 193 articles (51.8%). Among these articles, automated algorithms were reported in 59, but the threshold of statistical significance was not reported in 39 of these (66.1%). There were 93 articles with ranked independent variables, and 79 of these (84.9%) did not report assessment of conformity to linear gradient.
The reporting of the accuracy with which the logistic models described the data was very poor; tests for goodness of fit were not reported in 180 of 193 articles (93.2%). The contribution of the independent variables could not be evaluated in a large number of articles because units of measurement of the variables were not reported in 61 of 193 (31.6%). When the measurement units were available, nine articles did not report the coding of variables, leading to problems with interpretation of the regression coefficients. Tests of interaction were not reported or considered in 167 of 193 articles (86.5%). The comparison of compliance rates between journals did not show any significant differences for any of the quality criteria.
Our analysis of changes in methodologic rigor over time showed that the proportion of articles that reported at least one quality criterion concerning independent-variable selection was 61.5% (16 of 26) in 1985, 71.7% (38 of 53) in 1990, and 70.2% (80 of 114) in 1995 (P = .52 for trend). Despite a lack of an overall trend, Obstetrics & Gynecology showed a significant trend toward improvement in reporting of this quality feature (Table 4). Percentages of articles reporting goodness of fit did not change over time: 7.7% (two of 26) in 1985, 9.4% (five of 53) in 1990, and 5.3% (six of 114) in 1995 (P = .59 for trend). For reporting of at least one quality criterion concerning interpretation of the substantive significance of independent variables, there was a significant time trend: 42.3% (11 of 26) in 1985, 73.6% (39 of 53) in 1990, and 75.4% (86 of 114) in 1995 (P = .004 for trend). This overall trend was supported only by the American Journal of Obstetrics and Gynecology and Obstetrics & Gynecology (Table 5).
Discussion
Our study showed that the quality of reporting of logistic regression models in the obstetrics and gynecology literature is not rigorous. To appreciate the implications of our findings, one must understand why multivariable models are important in medical research. They are required for adequate analysis of observational data and, if conducted without robust methodology, they can lead to misleading inferences.
The observational study design is the most common publication type. For example, in 1996, Obstetrics & Gynecology published 241 articles using observational methodology, which represented 76% of the total that year.11 Validity of the results from observational studies depends on the degree of confounding,12–16 which arises because of differences between subjects in the study groups that are separately related to the outcome. Therefore, investigators must attempt to remove the effect of confounding before assessing statistical significance.12 This control for confounding is usually done by multivariable analysis.5 Methodologic inadequacies in multivariable analyses can result in bias and imprecision. Bias refers to the existence of a systematic tendency for the estimated regression coefficients to be too high or too low compared with the true values of the coefficient. Imprecision refers to the tendency for the coefficients to have large standard errors (and confidence intervals), which makes it difficult to reject the null hypothesis even when it is false.
Focusing on validity of results, the randomized controlled trial is believed to be the most methodologically robust study design.17 However, there are many instances in health care when the randomized experimental design is not practical. For example, when evaluating the association of cigarette smoking with lung cancer16,18 or that of breast-feeding with infection,15,19 the observational design is more feasible. The primary aim of randomization is to exclude the effects of confounding factors,12,20 which are expected to be equally distributed in the randomized groups, leaving the intervention or exposure under study as the only disparity. Sometimes, randomization produces unbalanced groups, which requires multivariable analysis for adjustment of the imbalances.
Factors that invalidate multivariable logistic regression analyses were included in our methodologic criteria for quality (Table 1). In some of our criteria when addressing whether the models appeared to be correct or incorrect, there was room for leniency in rigor. This was permissible because the variable-selection process is an art combining biologic and statistical sensibility, leading to some subjectivity in assessing the compliance of the articles with the quality criteria. However, if the multivariable model is incorrectly specified, irrelevant variables may be included or relevant variables may be omitted. Inclusion of irrelevant variables increases the standard error of the regression coefficients, reducing precision.5,8 Omission of relevant variables, on the other hand, results in biased coefficients.5,8 We found that the variable-selection process was unspecified in 51.8% of the articles. In the general medical literature, however, this feature was found to be inadequate in only 14%.1 This finding suggests that investigators in general medicine may be more aware of the impact of the variable-selection mechanism on the results of multivariable analyses than researchers in obstetrics and gynecology. It was discouraging to note that the trend analysis for reporting of independent-variable selection did not show a significant improvement over time for all of the journals pooled together. Only Obstetrics & Gynecology showed increasingly better reporting rates over time (Table 4).
The validity of the logistic regression results also depends on meeting certain assumptions, such as conformity to linear gradient with ranked independent variables1 (ordinal or continuous) and checking for possible overfitting21 and interactions between the independent variables.5 Multivariable analyses in The Lancet and New England Journal of Medicine (1985–1989) were reported to have risks of overfitting in 42% of articles.1 In these journals, there was also a lack of testing for interactions in 73% of the articles,1 a figure not too different from the 86.5% of articles that did not report testing for interactions in our study. Another related methodologic issue is the mathematical fit of the model, ie, how effectively the calculated model fits the actual data for estimating outcome variables.1,6,10 Similar to our finding of a lack of goodness-of-fit tests in 93.2% of articles, The Lancet, New England Journal of Medicine, British Medical Journal, and Journal of the American Medical Association also failed to report such tests in 93.5% of their articles that used logistic regression analyses from 1991 to 1994 (Bender R, Grouven U. Logistic regression models used in medical research are poorly reported [letter]. BMJ 1996;313:628). These comparisons suggest that poor reporting of methods of multivariable models is not limited to the obstetrics and gynecology literature. However, reporting of at least one quality criterion concerning interpretation of the substantive significance of independent variables seems to be improving over time in our specialty, a trend that was individually apparent in American Journal of Obstetrics and Gynecology and Obstetrics & Gynecology (Table 5).
The findings of our study may be viewed with skepticism by critics who would argue that the lack of reporting does not necessarily mean that validation procedures and assumption testing were not used in the data analyses. It is plausible that the investigators conducted their analyses rigorously without reporting all of the elements contained in our quality checklist because there has not been a clear standard for reporting multivariable models in the medical literature. Our study may be seen as having biased the existing articles against compliance with our quality criteria. In addition, because of space constraints in medical journals, some of the material dealing with methods of analysis may have been deleted between manuscript submission and publication. Without information on analytic methods, however, it is impossible to make confident inferences about the validity of logistic regression results. Hence there is a need for improvement in the conduct and reporting of multivariable analyses in the medical literature.
References
1. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201–10.
2. Katz MH, Hauck WW. Proportional hazards (Cox) regression. J Gen Intern Med 1993;8:702–11.
3. Hirsch RP, Riegelman RK. Statistical first aid. An interpretation of health research data. Boston: Blackwell Scientific Publications, 1992.
4. 1995 Journal citation reports (JCR). Philadelphia: Institute for Scientific Information, Inc, 1996.
5. Kleinbaum DG, Kupper LL, Muller KE. Applied regression analysis and other multivariable methods. Boston: PWS-Kent Publishing Co, 1988.
6. Hosmer DW, Lemeshow S. Applied logistic regression. New York: Wiley, 1989.
7. Armitage P, Berry G. Statistical methods in medical research. 3rd ed. London: Blackwell Scientific, 1994.
8. Menard S. Applied logistic regression analysis. Sage University paper series on quantitative applications in social sciences, 07-106. Thousand Oaks, California: Sage Publications, 1995.
9. Feinstein AR. Prognostic stratification. In: Feinstein AR, ed. Clinical biostatistics. St. Louis: CV Mosby, 1977:385–443.
10. Hosmer DW, Taber S, Lemeshow S. The importance of assessing the fit of logistic regression models: A case study. Am J Public Health 1991;81:1630–5.
11. Funai EF. Obstetrics & gynecology in 1996: Marking the progress toward evidence-based medicine by classifying studies based on methodology. Obstet Gynecol 1997;90:1020–2.
12. Brennan P, Croft P. Interpreting the results of observational research: Chance is not such a fine thing. BMJ 1994;309:727–30.
13. Goldberg RJ, Pastides H, Ellison RC, Tuthill RW, Dewitt T. Uses of the case-control and cohort epidemiological approaches in pediatric practice and research. Pediatr Res 1985;19:787–90.
14. Bracken MB. Reporting observational studies. Br J Obstet Gynaecol 1989;96:383–8.
15. Bauchner H, Leventhal JM, Shapiro ED. Studies of breast-feeding and infections. How good is the evidence? JAMA 1986;256:887–92.
16. Smith GD, Shipley MJ. Confounding of occupation and smoking: Its magnitude and consequences. Soc Sci Med 1991;32:1297–300.
17. Sibbald B, Roland M. Why are randomised controlled trials important? BMJ 1998;316:201.
18. Loeb LA, Ernster VL, Warner KE, Abbotts J, Laszlo J. Smoking and lung cancer: An overview. Cancer Res 1984;44:5940–58.
19. Jason JM, Nieburg P, Marks JS. Mortality and infectious disease associated with infant-feeding practices in developing countries. Pediatrics 1984;74:702–27.
20. Treasure T, MacRae KD. Minimisation: The platinum standard for trials? BMJ 1998;317:362–3.
21. Harrell FE Jr, Lee KL, Matchar DB, Reichert TA. Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Cancer Treat Rep 1985;69:1071–7.