Interobserver variability in data collection of the APACHE II score in teaching and community hospitals

Chen, Liddy M. MB, MSc; Martin, Claudio M. MSc, MD, FRCPC; Morrison, Teresa L. RN; Sibbald, William J. MD, FCCM, FRCPC

Special Articles

Objectives: To examine interobserver reliability of the Acute Physiology and Chronic Health Evaluation (APACHE) II score and to identify major causes of variability in data collection.

Design: Descriptive, comparative analysis.

Setting: Nine intensive care units in two teaching and six community hospitals.

Subjects: A random sample of 342 patient records selected from a network database.

Intervention: None.

Measurements and Main Results: Data were reabstracted and compared with the original records. Individual physiologic points derived from the APACHE II scoring system (instead of the actual physiologic values) were compared using the kappa statistic. Paired measurements of the continuous variables were compared using the intraclass correlation coefficient and Bland-Altman plots. Excellent agreement was found in most demographic, admission, and discharge data. The system failure requiring intensive care unit admission was consistently identified by both data collectors in 88% of cases, but only 66% agreed on the exact admitting diagnosis. For APACHE II score components, the kappa statistic ranged from 0.315 for the Glasgow Coma Scale point to 0.976 for the age point. Significant disagreement regarding the probability of death derived from the APACHE II model was evident in some patient records. Overall agreement among groups of patients regarding the APACHE II score was good, however, with no significant difference in the mean score (20.2 vs. 20.1; p = .758). The predicted mortality from the reabstracted data was 30%, similar to the 27% predicted mortality from the original data (p = .380).
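
For readers unfamiliar with the agreement measures named above, the following is a minimal sketch of how two of them might be computed for paired original and reabstracted data. It is illustrative only and is not the authors' analysis code: the function names `cohen_kappa` and `bland_altman` are hypothetical, the kappa shown is the unweighted form (the abstract does not say whether a weighted kappa was used), the 1.96 multiplier assumes approximately normal differences, and the sample values are made up rather than taken from the study.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(rater_a)
    # Proportion of records on which the two raters assigned the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) using 95% limits of agreement."""
    diffs = [x - y for x, y in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumes roughly normal differences
    return bias, bias - spread, bias + spread


# Made-up illustrative data, NOT values from the study.
original_gcs_points = [0, 1, 3, 0, 2, 4, 0, 1]
reabstracted_gcs_points = [0, 2, 3, 0, 1, 4, 1, 1]
original_scores = [18, 25, 12, 30, 21, 16]
reabstracted_scores = [19, 24, 12, 28, 22, 17]

kappa = cohen_kappa(original_gcs_points, reabstracted_gcs_points)
bias, lower, upper = bland_altman(original_scores, reabstracted_scores)
print(f"kappa = {kappa:.3f}")
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Kappa suits the categorical APACHE II points because it discounts agreement expected by chance, whereas the Bland-Altman bias and limits describe how far two abstractions of a continuous value, such as the total score, tend to differ for an individual record.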

Conclusion: Reliability of data collection varied widely in different components of the APACHE II probability-of-death model. Significant discrepancies in some components suggested a lack of explicit definitions and timing for consistent data collection between institutions or between data collectors. Nonetheless, variability resulting from data collection appears to be randomly distributed, so that comparisons of group means are valid.

From the Critical Care Research Network, London Health Sciences Centre (Drs. Chen, Martin, Morrison, and Sibbald) and the Department of Medicine, University of Western Ontario (Drs. Martin and Sibbald), London, Ontario, Canada.

Supported, in part, by The Critical Care Research Network; The Richard Ivey Critical Care Trauma Centre, London Health Sciences Centre, Victoria Campus; Institute for Clinical Evaluation Sciences in Ontario; and Victoria Hospital Research Institute, London Health Sciences Centre.

Address requests for reprints to: Claudio Martin, MD, Critical Care Research Network, London Health Sciences Centre, Victoria Campus, 375 South Street, London, Ontario, Canada N6A 4G5.

© 1999 Lippincott Williams & Wilkins, Inc.