Test-retest design to examine interrater reliability.
To examine the interrater reliability of individual examination items and of a classification decision-making algorithm applied by physical therapists with varying levels of experience.
Classifying patients with low back pain according to clusters of examination findings has shown promise for improving outcomes. Examining the reliability of the examination items and of the classification decision-making algorithm may help improve the reproducibility of classification methods.
Patients with low back pain of less than 90 days' duration who were participating in a randomized trial were examined on separate days by different examiners. Interrater reliability of the individual examination items important for classification was assessed, in patients whose clinical status remained stable between examinations, using kappa coefficients and intraclass correlation coefficients. Clinicians with varying amounts of experience then used the findings from the first examination to classify each patient with the decision-making algorithm. The reliability of the classification algorithm was assessed with kappa coefficients.
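For readers less familiar with the two agreement statistics named above, the following is a minimal sketch, not the authors' analysis code. It assumes unweighted Cohen's kappa for categorical judgments (e.g., centralization present or absent, or the classification decision) and, as an assumption because the abstract does not state the ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater) for continuous measures such as range of motion. The function names `cohens_kappa` and `icc_2_1` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical judgments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n**2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater (assumed form).

    `ratings` holds one row per patient and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 raters in a complete grid")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into patient, rater, and residual components.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With the made-up data `cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5. `icc_2_1([[1, 2], [3, 4], [5, 6]])` gives 8/9 (about 0.889): the two raters rank the patients identically, but the constant one-unit offset between them lowers absolute agreement.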
A total of 123 patients participated (mean age 37.7 [±10.7] years, 44% female); 60 (49%) remained stable between examinations. Reliability of range of motion, of centralization/peripheralization judgments with flexion and extension, and of the instability test was moderate to excellent. Reliability of centralization/peripheralization judgments with repeated or sustained extension, and of aberrant movement judgments, was fair to poor. Overall agreement on classification decisions was 76% (kappa = 0.60, 95% confidence interval 0.56 to 0.64), with no significant differences based on level of experience.
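As a worked check on how the two summary figures relate (a back-calculation from the rounded values reported, not a figure stated by the authors), rearranging the definition of kappa for the chance-expected agreement $p_e$, with observed agreement $p_o = 0.76$ and $\kappa = 0.60$, gives:

$$
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa} = \frac{0.76 - 0.60}{1 - 0.60} = \frac{0.16}{0.40} = 0.40
$$

That is, roughly 40% agreement on classification would be expected by chance alone, which is why a 76% raw agreement corresponds to a kappa of about 0.60.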
Reliability of the classification algorithm was good. Further research is needed to identify sources of disagreement and to improve reproducibility.
Subgrouping patients with low back pain can improve outcomes of conservative care. Improving the reliability of subgrouping methods may improve their effectiveness. This study examined the interrater reliability of both the individual examination items and the overall decision making of a previously described classification method. Most items, and the overall decision making, had at least moderate reliability, and classification agreement did not differ significantly with clinician experience.
From the *Division of Physical Therapy, University of Utah, Intermountain Health Care, Salt Lake City, UT; †Intermountain Health Care, Salt Lake City, UT; and ‡Centers for Rehab Services, University of Pittsburgh Medical Center, Pittsburgh, PA.
Acknowledgment date: October 20, 2004; First revision date: November 30, 2004; Second revision date: January 11, 2005; Acceptance date: February 7, 2005.
The manuscript submitted does not contain information about medical device(s)/drug(s).
Foundation funds (Research Grant from the Deseret Foundation) were received in support of this work. No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript.
Address correspondence and reprint requests to Julie M. Fritz, PhD, PT, ATC, 520 Wakara Way, Salt Lake City, UT 84108.