...BUT IS THE OUTCOME MEANINGFUL? JNPT's Recommendations for Reporting Results of Controlled Trials

Harvey, Lisa A. PT, PhD

Journal of Neurologic Physical Therapy: September 2011 - Volume 35 - Issue 3 - p 103–104
doi: 10.1097/NPT.0b013e31822a2dde
Editor's Note

JNPT is working toward increasing the consistency in the way results of controlled trials are reported. This initiative is intended to help readers and clinicians interpret results and make decisions about the relative merits of different interventions. Controlled trials, as the name implies, are trials designed to compare the relative effectiveness of different interventions. Sometimes 2 or more interventions are compared; alternatively, an intervention is compared with no intervention or a sham intervention. Controlled trials allow conclusions to be drawn about cause-and-effect relationships.

The problem with any comparison is the potential for bias. Bias tends to inflate estimates of treatment effectiveness and is best avoided through high-quality randomized controlled trials. The randomization process, along with other key methodological strategies, ensures that estimates of treatment effectiveness accurately reflect the added benefit of the intervention beyond changes that might have occurred due to other factors (such as expectations of subjects, researchers, and assessors; natural recovery; or repeated exposure to the testing protocol). Of course, case-control studies and other forms of research also provide a way of comparing interventions and are clearly worthwhile, although these studies are more vulnerable to bias, and conclusions about cause-and-effect relationships cannot be drawn from them.

The important results of any comparative study are the estimates of the size of the treatment effects; that is, the between-groups differences. These results alone convey the critical finding of a controlled trial and should not be confused with within-groups differences (ie, the pre-to-post difference of each group). Between-groups differences tell readers about the added benefit of one intervention over and above benefits from the contrast intervention or control condition. The typical way to report a between-groups difference for continuous data is as a mean between-groups difference (the equivalent for dichotomous data is an odds ratio or the like). The mean between-groups difference is an estimate of the size of the treatment effect. It is associated with uncertainty. This uncertainty is reflected in the 95% confidence interval (CI).
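
To make the calculation concrete, the following is a minimal Python sketch of a mean between-groups difference and its 95% CI. It is illustrative only: the function name and the example scores are hypothetical, and the interval uses a normal approximation with unpooled variances, which is an assumption of this sketch. An actual trial analysis, particularly one with small groups, would typically use a t distribution and often adjust for baseline scores.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def between_groups_difference(treatment, control, level=0.95):
    """Return (difference, lower, upper) for mean(treatment) - mean(control).

    Assumption of this sketch: a normal approximation is used for the
    confidence interval, with unpooled (unequal) group variances.
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent group means.
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    # Two-sided critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical post-intervention scores; a higher score is a better outcome.
treatment_scores = [12, 14, 16, 18, 20]
control_scores = [10, 11, 12, 13, 14]

diff, lower, upper = between_groups_difference(treatment_scores, control_scores)
print(f"Mean between-groups difference: {diff:.1f} "
      f"(95% CI {lower:.1f} to {upper:.1f})")
```

Note that the function compares the two groups directly; it never looks at how much either group changed from baseline, which is the within-groups difference the text warns against.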

The 95% CI of a mean between-groups difference is highly relevant to decisions about the usefulness of interventions in clinical practice. It tells readers about the plausible range of mean effects if the intervention were to be applied in the real world. The interpretation of the 95% CI relies on defining a minimally worthwhile treatment effect; that is, how much difference a treatment needs to make to warrant the time, cost, and effort associated with its implementation. If the 95% CI is wholly above the minimally worthwhile treatment effect, then the intervention is clearly effective (assuming a higher score reflects a better outcome). The opposite holds true if the 95% CI is wholly below the minimally worthwhile treatment effect. In this situation, the intervention is clearly ineffective (again, assuming a higher score reflects a better outcome). If, however, the 95% CI crosses the minimally worthwhile treatment effect, then there is uncertainty. It is not clear whether the treatment is or is not clinically useful in comparison with the control or contrast condition.
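
The three scenarios described above can be written as a simple decision rule. The sketch below is an illustration rather than a JNPT tool: the function name, the labels, and the example numbers are hypothetical. It assumes that a higher score reflects a better outcome and that an interval touching the threshold counts as uncertain.

```python
def interpret_ci(lower, upper, minimally_worthwhile_effect):
    """Classify a 95% CI of a mean between-groups difference.

    Assumptions of this sketch: a higher score is a better outcome, and an
    interval that touches the threshold is treated as crossing it.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > minimally_worthwhile_effect:
        return "clearly effective"    # CI wholly above the threshold
    if upper < minimally_worthwhile_effect:
        return "clearly ineffective"  # CI wholly below the threshold
    return "uncertain"                # CI crosses the threshold


# Hypothetical intervals, judged against a hypothetical threshold of 5 points.
for lower, upper in [(6.0, 12.0), (-1.0, 3.0), (2.0, 9.0)]:
    print((lower, upper), interpret_ci(lower, upper, 5.0))
```

The rule takes the minimally worthwhile treatment effect as an input rather than fixing it, because that threshold depends on the clinical context.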

Importantly, P values are only of passing relevance (1,2) and should not be relied upon for making clinical decisions. This is because P values carry no information about the size of treatment effects. A between-groups comparison can be statistically significant even though the treatment effect is trivially small. Alternatively, and more commonly, a between-groups comparison can be statistically nonsignificant and erroneously interpreted as evidence of treatment ineffectiveness (3). In the latter case, the nonsignificant result can be due to an inadequate sample size, a common problem in physical therapy research. The only way to be sure that a statistically significant finding reflects treatment effectiveness, and that a nonsignificant finding reflects treatment ineffectiveness, is by close examination of the 95% CI in relation to the minimally worthwhile treatment effect.
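
A small numerical illustration may help show why a P value alone can mislead. The Python sketch below compares two invented trials using summary statistics only. Every number in it is hypothetical, as are the function name and the minimally worthwhile effect of 5 points. The P values and intervals rely on a normal approximation with equal group sizes and a common standard deviation, which are simplifying assumptions of this sketch rather than a recommended analysis.

```python
from math import sqrt
from statistics import NormalDist


def summarize_trial(diff, sd, n_per_group, level=0.95):
    """Return (p_value, lower, upper) for a mean between-groups difference.

    Assumptions of this sketch: equal group sizes, a common standard
    deviation, and a normal approximation for both the P value and the CI.
    """
    se = sd * sqrt(2 / n_per_group)
    std_normal = NormalDist()
    # Two-sided P value for the null hypothesis of no difference.
    p_value = 2 * (1 - std_normal.cdf(abs(diff) / se))
    z = std_normal.inv_cdf(0.5 + level / 2)
    return p_value, diff - z * se, diff + z * se


MINIMALLY_WORTHWHILE_EFFECT = 5.0  # hypothetical threshold, in score points

# Hypothetical trials: a large trial with a small effect, and a small trial
# with a large but imprecisely estimated effect.
trials = {
    "large trial": {"diff": 1.0, "sd": 10.0, "n_per_group": 2000},
    "small trial": {"diff": 8.0, "sd": 10.0, "n_per_group": 10},
}

for name, stats in trials.items():
    p, lower, upper = summarize_trial(**stats)
    print(f"{name}: P = {p:.3f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Under these assumptions, the large trial is statistically significant, yet its whole interval lies below the hypothetical threshold, so the effect is too small to be worthwhile. The small trial is nonsignificant, yet its interval spans the threshold, so it leaves the question open rather than showing that the treatment is ineffective.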

Of course, decisions about minimally worthwhile treatment effects are difficult because they vary from context to context. In some clinical scenarios, a patient may be satisfied with a very small treatment effect, especially if the treatment does not require a lot of time and effort, is not expensive, and is not associated with a risk of harm. However, in other clinical scenarios, a patient might only be satisfied with a sizable treatment effect. This is particularly likely to be the case if the treatment is costly and time consuming or is associated with a risk of harm. Regardless, it is not possible to interpret the results of controlled trials until a minimally worthwhile treatment effect is defined. This can be left to the reader but is best articulated by researchers in a public forum before the commencement of a study (ie, in the trial protocol or on a public trial registry). This strategy avoids the temptation of overstating potentially trivial effects on the basis of trial findings.

As JNPT moves forward, it will encourage authors to report between-groups differences in controlled trials. It will also strive toward increasing awareness of the importance of interpreting trial results with respect to minimally worthwhile treatment effects. These strategies will help ensure that together we advocate and implement neurological physical therapy interventions on the basis of a robust interpretation of the evidence.


1. Goodman S. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999;130:995–1004.
2. Matthews JN, Altman DG. Statistics notes. Interaction 2: compare effect sizes not P values. BMJ. 1996;313:808.
3. Altman DG, Bland JM. Absence of evidence is not evidence of absence. BMJ. 1995;311:485.
© 2011 Neurology Section, APTA