To examine the reliability and attributable facets of variance within an entrustment-derived workplace-based assessment system.
Faculty at the University of Cincinnati Medical Center internal medicine residency program (a 3-year program) assessed residents using discrete workplace-based skills called observable practice activities (OPAs) rated on an entrustment scale. Ratings from July 2012 to December 2016 were analyzed using applications of generalizability theory (G-theory) and decision study framework. Given the limitations of G-theory applications with entrustment ratings (the assumption that mean ratings are stable over time), a series of time-specific G-theory analyses and an overall longitudinal G-theory analysis were conducted to detail the reliability of ratings and sources of variance.
During the study period, 166,686 OPA entrustment ratings were given by 395 faculty members to 253 different residents. Raters were the largest identified source of variance in both the time-specific and overall longitudinal G-theory analyses (37% and 23%, respectively). Residents were the second largest identified source of variation in the time-specific G-theory analyses (19%). Reliability was approximately 0.40 for a typical month of assessment (27 different OPAs, 2 raters, and 1–2 rotations) and 0.63 for the full sequence of ratings over 36 months. A decision study showed doubling the number of raters and assessments each month could improve the reliability over 36 months to 0.76.
Ratings from the full 36 months of the examined program of assessment showed fair reliability. Increasing the number of raters and assessments per month could improve reliability, highlighting the need for multiple observations by multiple faculty raters.