Impact of Cross-Calibration Methods on the Interpretation of a Treatment Comparison Study Using 2 Depression Scales

Fischer, Herbert Felix MSc*,†; Wahl, Inka MSc; Fliege, Herbert PhD; Klapp, Burghard F. MD, PhD; Rose, Matthias MD, PhD

doi: 10.1097/MLR.0b013e31822945b4
Original Articles

Background: Many questionnaires assessing depressive symptoms are available. Most of these questionnaires are constructed based on classical test theory, making comparisons of individual scores difficult. Item response theory (IRT) allows the comparison of scores from different instruments. In this study, the impact of IRT-based cross-calibration methods on the results of a treatment outcome study was evaluated using 2 instruments.

Methods: Data collected during admission and discharge procedures from 1066 inpatients in 2 psychosomatic clinics using different depression measures were analyzed. To achieve comparability across the applied depression measures, we used an IRT-based conversion table to transform scores from one instrument’s scale to the other. Latent trait values were also estimated using different instruments in each clinic. We compared these methods to the traditional approach of using the same instrument in both clinics and examined their effects on the statistical analyses.
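The conversion-table step described in the Methods can be illustrated with a minimal sketch. It assumes each instrument has a crosswalk between sum scores and latent trait (theta) values on a common IRT metric. The table values, the names `A_SCORE_TO_THETA` and `B_THETA_TO_SCORE`, and the nearest-theta matching rule are all hypothetical and are not taken from the study.

```python
# Hypothetical crosswalk: sum score on instrument A -> latent trait (theta).
# These values are invented for illustration only; they are NOT study data.
A_SCORE_TO_THETA = {0: -2.0, 1: -1.2, 2: -0.5, 3: 0.1, 4: 0.9, 5: 1.6}

# Hypothetical crosswalk for instrument B as (theta, sum score) pairs
# on the same latent metric.
B_THETA_TO_SCORE = [
    (-1.8, 0), (-0.9, 1), (-0.2, 2), (0.5, 3), (1.1, 4), (1.9, 5), (2.5, 6),
]


def a_to_theta(score_a: int) -> float:
    """Look up the latent trait value for a sum score on instrument A."""
    return A_SCORE_TO_THETA[score_a]


def theta_to_b(theta: float) -> int:
    """Return the instrument B sum score whose theta is closest to `theta`."""
    nearest = min(B_THETA_TO_SCORE, key=lambda pair: abs(pair[0] - theta))
    return nearest[1]


def convert_a_to_b(score_a: int) -> int:
    """Transform an instrument A sum score to the instrument B scale."""
    return theta_to_b(a_to_theta(score_a))


if __name__ == "__main__":
    for score in sorted(A_SCORE_TO_THETA):
        print(score, "->", convert_a_to_b(score))
```

In this sketch the latent trait value is the common currency: scores from either instrument can be analyzed as theta values directly, or mapped back onto one instrument's sum-score scale, which corresponds to the two approaches compared in the study.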

Results: There was no substantial change in the interpretation of the study results when different instruments were used. However, F values, P values, and effect sizes in the analysis of variance changed significantly. This might be attributed to differences in the content or measurement properties of the instruments. Interestingly, no difference was observed between use of transformed sum scores and latent trait values.

Conclusions: IRT cross-calibration methods are a convenient way to enhance the comparability of questionnaire data in applied clinical settings, but they do not seem able to overcome differences in the measurement properties of the instruments. As these differences can lead to biased results, there is a need for further research into more advanced techniques.

*Institute for Social Medicine, Epidemiology and Health Economics

Department of Psychosomatic Medicine and Psychotherapy, Clinic for Internal Medicine, Charité University Medical Center, Luisenstr, Berlin

Department of Psychosomatic Medicine and Psychotherapy, University Medical Center Hamburg-Eppendorf and Schön Klinik Hamburg-Eilbek, Martinistr, Hamburg, Germany

The authors declare no conflict of interest.

Reprints: Herbert Felix Fischer, MSc, Institute for Social Medicine, Epidemiology and Health Economics, Charité University Medical Center, Luisenstr, 57, 10117 Berlin, Germany. E-mail:

© 2012 Lippincott Williams & Wilkins, Inc.