Peer review is used to make final judgments about quality of care in many quality assurance activities. To overcome the low reliability of peer review, discussion between several reviewers is often recommended to point out overlooked information or allow for reconsideration of opinions and thus improve reliability. The authors assessed the impact of discussion between 2 reviewers on the reliability of peer review.
A group of 13 board-certified physicians completed a total of 741 structured implicit record reviews of 95 records for patients who experienced severe adverse events related to laboratory abnormalities while in the hospital (hypokalemia, hyperkalemia, renal failure, hyponatremia, and digoxin toxicity). They independently assessed the degree to which each adverse event was caused by medical care and the quality of the care leading up to the adverse event. Working in pairs, they then discussed differences of opinion, clarified factual discrepancies, and rerated the record. The authors compared the reliability of each measure before and after discussion, and between and within pairs of reviewers, using the intraclass correlation coefficient for continuous ratings and the kappa statistic for a dichotomized rating.
The assessment of whether the laboratory abnormality was iatrogenic had a reliability of 0.46 before discussion and 0.71 after discussion between paired reviewers, indicating considerably improved agreement between the members of a pair. However, across reviewer pairs, the reviewer reliability was 0.36 before discussion and 0.40 after discussion. Similarly, for the rating of overall quality of care, reliability of physician review went from 0.35 before discussion to 0.58 after discussion as assessed by pair. However, across pairs the reliability increased only from 0.14 to 0.17. Even for prediscussion ratings, reliability was substantially higher between 2 members of a pair than across pairs, suggesting that reviewers who work in pairs learn to be more consistent with each other even before discussion, but this consistency also did not improve overall reliability across pairs.
When 2 physicians discuss a record that they are reviewing, it substantially improves the agreement between those 2 physicians. However, this improvement is illusory, as discussion does not improve the overall reliability as assessed by examining the reliability between physicians who were part of different discussions. This finding may also have implications with regard to how disagreements are resolved on consensus panels, guideline committees, and reviews of literature quality for meta-analyses.
*From the Veterans Affairs Center for Practice Management and Outcomes Research, Ann Arbor, Michigan.
†From the Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan.
Supported by VA HSR&D Merit Review Grant 94-131. Dr. Hofer is supported by a Career Development Grant from the Health Services Research and Development Office of the Department of Veterans Affairs.
Presented in part at the Society of General Internal Medicine meeting, April 1999.
Address correspondence to: Timothy P. Hofer, MD, Ann Arbor VA HSR&D, P.O. Box 130170, Ann Arbor, MI 48113. E-mail: email@example.com
Received April 19, 1999; initial review completed June 28, 1999; accepted August 18, 1999.