Reliability and Agreement of Urodynamics Interpretations in a Female Pelvic Medicine Center

Whiteside, James L. MD¹; Hijaz, Adonis MD¹; Imrey, Peter B. PhD¹; Barber, Matthew D. MD¹; Paraiso, Marie F. MD¹; Rackley, Raymond R. MD¹; Vasavada, Sandip P. MD¹; Walters, Mark D. MD¹; Daneshgari, Firouz MD¹

doi: 10.1097/01.AOG.0000227778.77189.2d
Original Research

OBJECTIVE: To estimate the reliability and interobserver consistency of urodynamic interpretations of female bladder and urethral function.

METHODS: Three urogynecologists and three female urologists at a tertiary care medical center reviewed masked, abstracted clinical and urodynamic information from 100 charts, selected for adequate completeness from a consecutive series of 135 women referred for urodynamic testing. For each of the 100 cases, the reviewers assigned International Continence Society filling and voiding phase diagnoses, and overall clinical diagnoses. Raw agreement proportions and weighted kappa chance-corrected agreement statistics (κ) were used jointly to describe both reliability and interobserver agreement. Reliability was estimated from duplicate reviews, masked and separated by at least 4 months, of each case by each physician. Interobserver agreement was estimated from comparisons of all pairs of responses from different physicians.
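The weighted κ used above can be illustrated with a minimal sketch. The abstract does not state the weighting scheme or the ordering of the diagnostic categories, so the linear agreement weights and the ordering absent < indeterminate < present below are assumptions, and the function name `weighted_kappa` is hypothetical rather than the authors' software.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights=None):
    """Weighted kappa for two paired lists of ratings.

    `categories` lists the ordered rating levels. If `weights` is None,
    linear agreement weights are used (an assumption; the abstract does
    not specify the scheme): 1 on the diagonal, decreasing to 0 for the
    most distant pair of categories.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    if weights is None:
        weights = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    # Joint proportions of (first rating, second rating) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each set of ratings.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed agreement and weighted chance-expected agreement.
    p_o = sum(weights[i][j] * observed[i][j] for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for illustration only; these are not study data.
CATEGORIES = ["absent", "indeterminate", "present"]
first_review = ["present", "absent", "indeterminate", "present", "absent"]
second_review = ["present", "absent", "present", "present", "indeterminate"]
kappa = weighted_kappa(first_review, second_review, CATEGORIES)
```

Under this sketch, reliability would pair each physician's first and second review of the same case, and interobserver agreement would pair responses from different physicians; how the authors pooled those pairs is not described in the abstract.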

RESULTS: For clinical diagnosis of stress incontinence (present, absent, indeterminate), the within- and across-physician weighted κ values were 0.78 and 0.68, respectively. Corresponding values were 0.40 and 0.13 for detrusor overactivity without incontinence, 0.58 and 0.38 for detrusor overactivity with incontinence, and 0.51 and 0.26 for voiding dysfunction. The standard error of each κ was between 0.023 and 0.043.

CONCLUSION: In our group, lower urinary tract diagnoses of stress urinary incontinence from both clinical and urodynamic data demonstrated substantial reliability and interobserver agreement. However, by conventional interpretation of κ-statistics, reliability of diagnoses of detrusor overactivity or voiding dysfunction was only moderate, and interobserver agreement on these diagnoses was no better than fair. Urodynamic interpretations may not be satisfactorily reproducible for these diagnoses.
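As a rough illustration of the "conventional interpretation" mentioned above, the sketch below maps the reported κ values to descriptive bands. It assumes the widely used Landis and Koch benchmarks, which the abstract does not name explicitly; the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa):
    """Return a descriptive label for kappa.

    Bands follow the Landis and Koch benchmarks (an assumption here):
    below 0 poor, up to 0.20 slight, up to 0.40 fair, up to 0.60
    moderate, up to 0.80 substantial, up to 1.00 almost perfect.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Weighted kappas reported in RESULTS: (within-physician, across-physician).
reported = {
    "stress incontinence": (0.78, 0.68),
    "detrusor overactivity without incontinence": (0.40, 0.13),
    "detrusor overactivity with incontinence": (0.58, 0.38),
    "voiding dysfunction": (0.51, 0.26),
}

for diagnosis, (within, across) in reported.items():
    print(f"{diagnosis}: reliability {interpret_kappa(within)}, "
          f"interobserver agreement {interpret_kappa(across)}")
```

Values that sit on a band edge, such as 0.40, are sensitive to how the cutoffs are drawn, so labels near a boundary should be read loosely.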


Diagnoses of stress incontinence from urodynamic studies seem stable within and across physicians, but urodynamics-based detrusor overactivity or voiding dysfunction diagnoses may be insufficiently reliable.

From the ¹Center for Female Pelvic Medicine and Department of Quantitative Health Sciences, Cleveland Clinic Foundation, Cleveland, Ohio.

Supported by NIH grant K08 DK02631 and a Clinician Investigator Award of the Cleveland Clinic Foundation to Firouz Daneshgari, MD.

The authors thank Gerald A. Roberson for data management and quality control support.

Presented at the meeting of the American Urogynecological Society, Atlanta, Georgia, September 15–17, 2005.

Corresponding author: James L. Whiteside, MD, Dartmouth-Hitchcock Medical Center, One Medical Center Drive, Lebanon NH 03756; e-mail:

© 2006 The American College of Obstetricians and Gynecologists