For many of us, writing is a large part of our professional roles. If you're like me, you find joy in identifying the “just right” word to make a point, and so the thesaurus (OK, digital thesaurus these days) is a dear and familiar companion. While, according to the thesaurus, “importance” and “significance” can often be used interchangeably, in research a finding may be statistically significant but not necessarily important. Conversely, the outcomes of a study may be important, while the P value does not reach the level of statistical significance. It may be surprising to learn that, in fact, statisticians themselves are concerned about the overinterpretation of statistically significant versus nonsignificant P values. Earlier this year the American Statistical Association published a 19-page editorial and a special issue on this topic, calling for an end to the use of the phrase “nonsignificant results” as the equivalent of “no effect” or “no difference.” In fact, there is a call for an end to the use of the phrase “statistically significant” altogether.1 The authors argue that overreliance on P values has led to the misleading conclusion that there are conflicts between studies when one study achieves the conventional threshold P value of 0.05 and a similar study does not. More generally, and more importantly, treating significant/nonsignificant as a dichotomous outcome has resulted in a distortion of the evidence. The recommendation is to provide the precise P value (eg, P = 0.13 as opposed to P > 0.05) to allow readers to gauge the relationship for themselves.
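To make the recommendation to report precise P values concrete, the sketch below (in Python, using made-up toy numbers rather than data from any study) implements an exact permutation test for a difference in means. Because it enumerates every possible reassignment of the pooled observations, it yields an exact P value that can be reported as-is (eg, P = 0.008) rather than collapsed to P < 0.05.

```python
from itertools import combinations

def exact_perm_pvalue(group_a, group_b):
    """Exact two-sided permutation P value for a difference in group means.

    Enumerates every way of splitting the pooled observations into two
    groups of the original sizes and counts how many splits produce an
    absolute mean difference at least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total

# Toy example: two completely separated groups of 5 observations each.
p = exact_perm_pvalue([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
```

With these toy data, only the observed split and its mirror image reach the observed difference, so the exact P value is 2/252 ≈ 0.008; a reader sees the actual strength of the evidence rather than a pass/fail verdict.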
While the viewpoint that P values should be abandoned is certainly not universally shared among statisticians, and the idea has drawn criticism from a thought leader in evidence-based practice,2 imagine if we did take to heart the recommendation that we not depend on P values. What measures would we use instead for making causal inferences and for managing bias—what are the alternatives? There have long been discussions among the editors and editorial board members of the Journal of Neurologic Physical Therapy (JNPT) (and many other journals) about the difference between the concepts of clinical importance versus statistical significance. This brings attention to the need to consider, and specify, a priori what is and what is not a result that is of value for guiding practice. Back in 2011, considering the need to make a clear distinction between clinical importance and statistical significance, the editors revised the JNPT Instructions for Authors to require reporting of measures of effect size, confidence intervals, and/or odds ratios that would allow readers to have some sense of clinical meaningfulness. However, as with P values, it is always important to bear in mind that all statistical measures are determined by the particular sample of study participants. For this reason, the larger the sample, the more likely it is to represent the larger population of interest, thereby improving the generalizability of the results.
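As an illustration of the kind of reporting the revised Instructions for Authors ask for, the sketch below (Python, with hypothetical group scores) computes Cohen's d, a standard standardized-mean-difference effect size, alongside a confidence interval for the raw mean difference. The normal-approximation interval is a simplification; a t-based interval would be more precise for small samples.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference standardized by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def mean_diff_ci(group_a, group_b, z=1.96):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = pooled_sd * math.sqrt(1 / n_a + 1 / n_b)  # standard error of the difference
    return diff - z * se, diff + z * se

# Hypothetical outcome scores for two small groups.
d = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])          # ≈ 1.26
ci = mean_diff_ci([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])     # ≈ (0.04, 3.96)
```

Reported together, the effect size and interval convey both the magnitude of the difference and the precision with which it was estimated, which is exactly the sense of clinical meaningfulness a bare P value cannot provide.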
Another valuable approach for identifying truly meaningful study results would be to redouble our efforts to identify the minimal clinically important difference (MCID) values that represent meaningful differences in our outcomes of interest. These MCID values can be either distribution-based (derived from the statistical distribution of the sample) or anchor-based (based on the amount of change in the outcome that patients/participants identify as meaningful). Both the distribution-based and anchor-based methods have numerous subtypes. It is important to bear in mind that MCID values, like P values and measures of effect size, are subject to the particulars of the study sample. While there is no easy answer to the question of how to decide whether a statistically significant outcome is truly meaningful (or, conversely, whether a statistically nonsignificant outcome is in fact clinically meaningful), it is valuable to bear in mind that study findings are rarely “either/or.” Much depends on the specifics of the sample, the details of the intervention, the fidelity with which the protocol was applied, and innumerable other factors. With the growth in use of patient-reported outcome measures, there are many opportunities to increase our understanding of what is meaningful to our patients.
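Two widely used distribution-based subtypes can be sketched briefly: the half-standard-deviation benchmark and one standard error of measurement (SEM). The code below (Python, with hypothetical baseline scores and an assumed reliability coefficient) shows both; the function names and data are illustrative, not drawn from any published MCID study.

```python
import math
import statistics

def half_sd_mcid(baseline_scores):
    """Distribution-based MCID estimate: half the baseline SD."""
    return 0.5 * statistics.stdev(baseline_scores)

def sem_mcid(baseline_scores, reliability):
    """Distribution-based MCID estimate: one standard error of measurement,
    SEM = SD * sqrt(1 - reliability), where reliability is, eg, test-retest ICC."""
    return statistics.stdev(baseline_scores) * math.sqrt(1 - reliability)

# Hypothetical baseline outcome scores and an assumed reliability of 0.84.
scores = [40, 45, 50, 55, 60]
half_sd = half_sd_mcid(scores)        # ≈ 3.95 points
one_sem = sem_mcid(scores, 0.84)      # ≈ 3.16 points
```

Note that both estimates depend entirely on the sample's variability (and, for the SEM, on the reliability estimate), which is precisely why distribution-based MCIDs, like P values, must be interpreted in light of the particular sample from which they were derived.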