- The assumptions underlying the use of statistics are considered, given the data collected.
- The statistics are reported correctly and appropriately.
- The number of analyses is appropriate.
- Measures of functional significance, such as effect size or proportion of variance accounted for, accompany hypothesis-testing analyses.
ISSUES AND EXAMPLES RELATED TO THE CRITERIA
Even if the planned statistical analyses as reported in the Method section are plausible and appropriate, the implementation of those analyses as reported in the Results section sometimes is not. Several issues may arise in performing the analyses that render them inappropriate as reported. Perhaps the most obvious is that the data may not have the properties that were anticipated when the analysis was planned. For example, although a correlation between two variables was planned, the data from one or both of the variables may show a restriction of range that invalidates the use of a correlation. When a strong restriction of range exists, the correlation is bound to be low, not because the two variables are unrelated, but because the range of variation in the particular data set does not allow the relationship to be expressed in the correlation. Similarly, a t-test may have been planned to compare the means of two groups, but review of the data may reveal a bimodal distribution that raises doubts about the use of a mean and standard deviation to describe the data set; if so, a t-test to evaluate the difference between the two groups becomes inappropriate. The reviewer should be alert to these potential problems and ensure, to the extent possible, that the data as collected remain amenable to the statistics that were originally intended. Often this is difficult because the data necessary to make the assessment are not presented; the reviewer may simply have to assume, for example, that the sample distributions were roughly normal when the only descriptive statistics presented are the mean and standard deviation. When the opportunity does present itself, however, the reviewer should evaluate the extent to which the data collected for the particular study satisfy the assumptions of the statistical tests presented in the Results section.
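The restriction-of-range problem described above is easy to demonstrate with simulated data. In this sketch (the numbers are illustrative, not drawn from any study), a genuine linear relationship yields a high correlation over the predictor's full range, but the correlation computed within a narrow slice of that range is far smaller:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(42)

# Simulate a genuine linear relationship: y = x + noise, with x spanning 0-100.
xs = [random.uniform(0, 100) for _ in range(1000)]
ys = [x + random.gauss(0, 15) for x in xs]
r_full = pearson_r(xs, ys)

# Now restrict the range: keep only the cases with x between 45 and 55.
kept = [(x, y) for x, y in zip(xs, ys) if 45 <= x <= 55]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

# The underlying relationship has not changed, but the restricted r is much lower.
print(f"r over the full range:       {r_full:.2f}")
print(f"r over the restricted range: {r_restricted:.2f}")
```

The same relationship, sampled over a truncated range, can look nearly null; a reviewer who sees only the restricted correlation could easily misjudge the association.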
Another concern reviewers should be alert to is the possibility that, although appropriate analyses have been selected, they have been performed poorly or inappropriately. Often enough data are presented to determine that the results of an analysis are implausible given the descriptive statistics: "the numbers just don't add up." Alternatively, the data and analyses may be reported too sparsely for the reviewer to judge the accuracy or legitimacy of the analyses. Either situation is a problem and should be addressed in the review.
A third potential concern in the reporting of statistics is the presence in the Results section of analyses that were not anticipated in the Method section. In practice, the results of an analysis or a review of the data often raise obvious follow-up questions, which in turn lead to analyses that may not have been anticipated. This type of expansion of the analyses is not necessarily inappropriate, but the reviewer must determine whether it has been done with control and reflection. If the reviewer perceives an uncontrolled proliferation of analyses, or if new analyses appear without proper introduction or explanation, a concern should be raised. It may appear that the author has fallen into the trap of chasing an incidental finding too far, or has run an unreflective, unsystematic set of analyses to "look for anything that is significant." Either possibility implies the use of inferential statistics for purposes beyond strict hypothesis testing and therefore stretches the statistics beyond their intended use.
On a similar note, reviewers should be mindful that as the number of statistical tests increases, so does the likelihood that at least one analysis will be "statistically significant" by chance alone. When analyses proliferate, it is important for the reviewer to determine whether the significance levels (p-values) have been adjusted appropriately to reflect the need for a more conservative criterion.
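The arithmetic behind this concern is straightforward. Assuming k independent tests are each run at alpha = .05, the chance of at least one spurious "significant" result is 1 - (1 - .05)^k. The sketch below also shows a Bonferroni correction, one common (and deliberately conservative) adjustment, though it is by no means the only acceptable one:

```python
alpha = 0.05

# Familywise error rate: P(at least one false positive) across k independent tests.
for k in (1, 5, 10, 20):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests at alpha = .05 -> P(at least one chance 'hit') = {familywise:.2f}")

# A Bonferroni correction divides alpha by the number of tests,
# keeping the familywise error rate at or below the nominal level.
k = 20
bonferroni_alpha = alpha / k
print(f"Bonferroni per-test alpha for {k} tests: {bonferroni_alpha:.4f}")
```

With 20 unadjusted tests, the chance of at least one spurious "finding" is roughly two in three, which is why the reviewer should look for some form of adjustment when analyses proliferate.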
Finally, it is important to note that statistical significance does not necessarily imply practical significance. A test of statistical significance tells an investigator how likely results at least as extreme as those observed would be if chance alone were operating. But inferential statistical tests, whether significant or not, do not reveal the strength of association among the research variables, that is, the effect size. Strength of association is gauged by indexes of the proportion of variance in the dependent variable that is "explained" or "accounted for" by the independent variables in an analysis. Common indexes of explained variation are eta-squared (η²) in ANOVA and R² (the coefficient of determination) in correlational analyses. Reviewers must be alert to the fact that statistically significant results tell only part of the story. If a result is statistically significant but the independent variable accounts for only a very small proportion of the variance in the dependent variable, the result may not be sufficiently interesting to warrant extensive attention in the Discussion section. If none of the independent variables accounts for a reasonable proportion of the variance, the study may not warrant publication.
Review Criteria for Research Manuscripts
Joint Task Force of Academic Medicine and the GEA-RIME Committee