Background: Individual physicians are increasingly subjected to comparative performance assessments. When data from a single insurer are used to profile individual physicians' performance, small sample sizes make the reliability of these measurements uncertain.
Methods: Administrative data (2006–2008) from a Dutch insurer are used to examine variation in general practitioners' (GPs) performance on expenses (5 measures), utilization of hospital care (2 measures), and clinical quality for diabetes and chronic obstructive pulmonary disease (6 measures). Unadjusted and risk-adjusted multilevel models are used to separate the total variance into between-GP and within-GP components. These components are then used to calculate intraclass correlation coefficients (ICCs), reliability, and sample size requirements at common reliability thresholds.
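The step from variance components to ICC, reliability, and sample size can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a random-intercept model whose between-GP and within-GP variances are on the same (linear) scale, and it uses the standard Spearman–Brown formula for the reliability of a GP's mean over n patients. The helper names `icc`, `reliability`, and `required_patients` are hypothetical.

```python
import math


def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: share of total variance lying between GPs.

    Assumes both variance components are on the same linear scale
    (a hypothetical simplification; binary outcomes on a logit scale
    would need a different within-GP variance).
    """
    return var_between / (var_between + var_within)


def reliability(icc_value: float, n_patients: int) -> float:
    """Spearman-Brown reliability of a GP's mean score over n patients."""
    return n_patients * icc_value / (1 + (n_patients - 1) * icc_value)


def required_patients(icc_value: float, target: float) -> int:
    """Smallest patient count whose reliability reaches `target`.

    Inverts the Spearman-Brown formula; the small epsilon guards
    against floating-point noise pushing an exact integer up by one.
    """
    n = target * (1 - icc_value) / (icc_value * (1 - target))
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    rho = icc(var_between=0.5, var_within=9.5)  # 0.05
    for target in (0.70, 0.90):
        n = required_patients(rho, target)
        print(f"ICC={rho:.3f}  target={target:.2f}  n={n}  "
              f"achieved={reliability(rho, n):.3f}")
```

Because the required n scales roughly with 1/ICC, small drops in the ICC (for example after risk adjustment) translate into large increases in the number of patients a GP needs.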
Results: Average ICCs ranged from 0.07% (hospital admissions) to 8.34% (physiotherapy for chronic obstructive pulmonary disease patients). Risk adjustment often substantially changed the relative size of the variance components and frequently led to lower ICCs. In addition, ICCs, and thus reliability, generally decreased over time. Eight measures had reliabilities > 0.70, and 3 of these (all GP-related expenses) > 0.90. Measures related to utilization of hospital care had reliabilities < 0.60, some even < 0.50. For 5 measures, the vast majority of GPs had enough patients to reach a reliability of 0.70. At a reliability of 0.90, however, there was no measure for which all GPs met the sample size requirement.
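To illustrate how the reported ICC range translates into patient numbers, the worked calculation below applies the Spearman–Brown inversion, where $\rho$ is the ICC and $R$ the target reliability, to the two extreme average ICCs. This is a sketch under the assumption that this standard formula applies and that an average ICC can stand in for a single year; the resulting counts are illustrative and are not the sample size requirements reported in the study.

$$
n = \frac{R(1-\rho)}{\rho(1-R)}
$$

$$
\rho = 0.0834,\ R = 0.70:\quad n = \frac{0.70 \times 0.9166}{0.0834 \times 0.30} = \frac{0.64162}{0.02502} \approx 25.6 \;\Rightarrow\; 26 \text{ patients}
$$

$$
\rho = 0.0834,\ R = 0.90:\quad n = \frac{0.90 \times 0.9166}{0.0834 \times 0.10} = \frac{0.82494}{0.00834} \approx 98.9 \;\Rightarrow\; 99 \text{ patients}
$$

$$
\rho = 0.0007,\ R = 0.70:\quad n = \frac{0.70 \times 0.9993}{0.0007 \times 0.30} = \frac{0.69951}{0.00021} = 3331 \text{ patients}
$$

The roughly hundredfold gap between the two ICCs explains why hospital-utilization measures fall far short of common reliability thresholds while some expense measures do not.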
Conclusions: Reliable measurement of individual physicians' performance using single-purchaser data is challenging. For most measures, reliability was too low to support high-stakes applications of profiling, or indeed any application at all. Future research should continue to explore methods for enhancing the reliability of individual physicians' profiles.