Tutorial on Data Balancing

Application to Benchmarking Clinicians

Alemi, Roshan, MA; Elrafey, Amr, MA; Neuhauser, Duncan, PhD; Alemi, Farrokh, PhD

Quality Management in Healthcare: January/March 2019 - Volume 28 - Issue 1 - p 1–7
doi: 10.1097/QMH.0000000000000203
Original Articles

In this tutorial, we show how data balancing, in general, and stratified covariate balancing, in particular, can be used to benchmark clinicians. This tutorial aims to explain the concepts behind data balancing to readers who do not have a strong statistical background. Data balancing enables the analyst to compare the performance of clinicians with that of their peer groups on the same set of patients. The comparison is done in 3 steps. First, the patients are described in terms of their conditions/comorbidities. Each combination of comorbidities is treated as a separate type of patient. Second, the analyst measures the frequency of observing different types of patients. Third, expected outcomes are calculated for both the clinician and the peer group. The expected outcome for the clinician is calculated as the sum, across patient types, of the product of 2 terms: the probability of observing a patient type and the average outcome for that type. The expected outcome for the peer group is calculated in the same way, with one difference: the distribution of the peer group's patients is replaced with the distribution of the clinician's patients. This allows us to simulate the performance of the peer group on the clinician's patients. This switch in frequencies accomplishes the same goal as using propensity weights, or covariate balancing weights, but it avoids the cumbersome need to estimate the weights. In switching the distributions, a problem arises when the peer group does not see the same types of patients as the clinician. When the peer group's outcome for some patient type is missing, a synthetic case is constructed from the peer group's experience with 2 complementary parts of the missing case. The reliance on synthetic cases allows one to have a match for every type of the clinician's patients. Together, the synthetic cases and the switch of distributions allow one to simulate the performance of the clinician and the peer group on the same set of cases.
The tutorial walks the reader through examples. The procedures described here can be applied to data in electronic health records. We present Structured Query Language (SQL) code for doing so.
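
The switch of distributions described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SQL: the function name `benchmark`, the data layout (a dictionary from patient type to a patient count and an average outcome), and all numbers are hypothetical. The synthetic-case step is deliberately left out, because the abstract does not say how the 2 complementary parts are combined; the sketch raises an error when the peer group lacks a patient type.

```python
from typing import Dict, Tuple

# A patient type is one combination of comorbidities,
# e.g. ("diabetes", "heart failure"). The empty tuple means no comorbidities.
PatientType = Tuple[str, ...]
# For each patient type: (number of patients, average outcome).
Strata = Dict[PatientType, Tuple[int, float]]


def benchmark(clinician: Strata, peer: Strata) -> Tuple[float, float]:
    """Return the clinician's expected outcome and the peer group's
    expected outcome on the clinician's mix of patients."""
    missing = [t for t in clinician if t not in peer]
    if missing:
        # Assumption: no synthetic cases are built in this sketch.
        raise ValueError(f"peer group has no patients of types: {missing}")

    total = sum(n for n, _ in clinician.values())
    clinician_expected = 0.0
    peer_expected = 0.0
    for patient_type, (n, clinician_mean) in clinician.items():
        # Frequency of this type among the clinician's patients.
        share = n / total
        clinician_expected += share * clinician_mean
        # The switch: peer outcomes weighted by the clinician's frequencies.
        peer_expected += share * peer[patient_type][1]
    return clinician_expected, peer_expected


# Hypothetical data: outcome is length of stay in days.
clinician = {
    (): (10, 2.0),
    ("diabetes",): (10, 4.0),
    ("diabetes", "heart failure"): (30, 6.0),
}
peer = {
    (): (700, 2.5),
    ("diabetes",): (200, 3.5),
    ("diabetes", "heart failure"): (100, 7.0),
}

clinician_expected, peer_expected = benchmark(clinician, peer)
print(round(clinician_expected, 2), round(peer_expected, 2))
```

With these made-up numbers, the peer group's unadjusted average stay is 3.15 days, well below the clinician's 4.8 days. Once the peer group's outcomes are weighted by the clinician's patient frequencies, the peer group's expected stay rises to about 5.4 days, so the clinician compares favorably on the same set of cases.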

The Advisory Board, Washington, DC (Ms Alemi); Health Informatics Program, George Mason University, Fairfax, Virginia (Mr Elrafey and Dr Alemi); and Case Western Reserve University, Cleveland, Ohio (Dr Neuhauser).

Correspondence: Farrokh Alemi, PhD, Health Informatics Program, George Mason University, 4400 University Dr, Fairfax, VA 22030 (

The authors declare no conflicts of interest.

Supplemental digital content is available for this article. Direct URL citation appears in the printed text and is provided in the HTML and PDF versions of this article on the journal's Web site (

© 2019 Wolters Kluwer Health | Lippincott Williams & Wilkins