The American Board of Anesthesiology’s Staged Examination System and Performance on the Written Certification Examination After Residency

Zhou, Yan PhD*; Sun, Huaping PhD*; Macario, Alex MD, MBA†; Martin, Donald E. MD‡; Rathmell, James P. MD§; Warner, David O. MD‖

doi: 10.1213/ANE.0000000000004250
Medical Education

This study compared anesthesiology residency graduates’ written certification examination performance before and after the American Board of Anesthesiology (ABA) introduced the staged examination system. After equating test scores using common test items, the first 2 cohorts (2013 and 2014) in the staged system scored 7.1 and 8.3 points higher, respectively, than the 2011 baseline cohort in the former examination system. Under a common passing standard, the 2013 and 2014 cohorts’ pass rates (94.2% and 95.9%) were also higher than those of the 2011 and 2012 cohorts (91.9% and 92.6%). The staged examination system may be associated with improved knowledge of anesthesiology graduates.

From the *American Board of Anesthesiology, Raleigh, North Carolina

†Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Stanford, California

‡Department of Anesthesiology and Perioperative Medicine, Penn State College of Medicine, Hershey, Pennsylvania

§Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts

‖Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, Minnesota.

Published ahead of print 24 April 2019.

Accepted for publication April 24, 2019.

Funding: Institutional and/or departmental.

Conflicts of Interest: See Disclosures at the end of the article.

Listen to this Article of the Month podcast and more from OpenAnesthesia.org® by visiting http://journals.lww.com/anesthesia-analgesia/pages/default.aspx.

Reprints will not be available from the authors.

Address correspondence to Huaping Sun, PhD, The American Board of Anesthesiology, 4208 Six Forks Rd, Suite 1500, Raleigh, NC 27609. Address e-mail to huaping.sun@theaba.org.

See Editorial, p 1197

GLOSSARY

ABA = American Board of Anesthesiology; CA-1 = clinical anesthesia year 1; CI = confidence interval; ITE = In-Training Examination

The American Board of Anesthesiology (ABA; Raleigh, NC) has transitioned to a staged examination system for initial board certification. In the former (“traditional”) system, physicians were required to pass a written examination (Part 1) and an oral examination (Part 2) after completing a 4-year anesthesiology residency.

In the new “staged” examination system, there are now 2 written examinations: the BASIC Examination (first offered in 2014), typically taken at the end of clinical anesthesia year 1 (CA-1; the second year of residency), and the ADVANCED Examination (first offered in 2016), taken after finishing residency. Residents must pass the BASIC Examination before completing residency and must then pass the ADVANCED Examination before taking the APPLIED Examination, which is required to achieve initial board certification.1

The goal of introducing the BASIC Examination in the staged examination system was to improve residents’ knowledge acquisition during training. The ABA offers an annual In-Training Examination (ITE) to assess a resident’s progress toward mastery of the knowledge eventually assessed in the ADVANCED Examination. A prior study suggested that the staged examination system accelerated the improvement of ITE performance during residency.2 However, the method used to score and equate the ITE in those years did not allow evaluation of whether adding the BASIC Examination improved residents’ knowledge at the end of training, as measured by the written certification examination.

The ABA determines the passing standard on written certification examinations through a standard-setting study performed by an expert group using the Hofstee Method,3 a combination of criterion-referenced and norm-referenced approaches. Once a standard is set, test forms (the actual set of test questions delivered to candidates) used in subsequent administrations are equated analytically to the test form used to set the standard using test items (questions) common to both forms, referred to as “linking items.” When the Part 1 Examination in the traditional system was replaced by the ADVANCED Examination in the staged system in 2016, linking items were not explicitly included as the ADVANCED Examination was a new examination and therefore required its own standard-setting study. Nevertheless, the 2016 and 2017 ADVANCED Examinations did include some items previously used in the Part 1 Examinations, making it possible to compare examination performance before and after 2016.
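To illustrate the Hofstee Method, the sketch below (in Python) locates a compromise cut score as the point where the observed fail-rate curve crosses the line connecting the judges’ most lenient and most stringent acceptable limits. The judgment values, variable names, and simulated scores are hypothetical and are not the ABA’s standard-setting data.

```python
# Illustrative sketch of the Hofstee compromise method; all inputs are hypothetical.
import numpy as np

def hofstee_cut_score(scores, k_min, k_max, f_min, f_max):
    """Return the cut score where the observed fail-rate curve crosses the
    line joining (k_min, f_max) and (k_max, f_min).

    scores: array of examinee scores
    k_min, k_max: lowest / highest acceptable cut scores (expert judgments)
    f_min, f_max: lowest / highest acceptable fail rates (expert judgments)
    """
    candidate_cuts = np.linspace(k_min, k_max, 1001)
    # Observed fail rate at each candidate cut score.
    fail_rate = np.array([(scores < cut).mean() for cut in candidate_cuts])
    # Hofstee line: fail rate falls from f_max at k_min to f_min at k_max.
    hofstee_line = f_max + (f_min - f_max) * (candidate_cuts - k_min) / (k_max - k_min)
    # Cut score where the two curves are closest (their intersection).
    return candidate_cuts[np.argmin(np.abs(fail_rate - hofstee_line))]

# Example with simulated scaled scores (mean 250, SD 50) and made-up judgments.
rng = np.random.default_rng(0)
simulated_scores = rng.normal(250, 50, 2000)
print(hofstee_cut_score(simulated_scores, k_min=180, k_max=220, f_min=0.02, f_max=0.15))
```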

The purpose of this study was to compare performance on the written certification examinations taken by anesthesiology residency graduates before and after the ABA introduced the staged examination system by equating test scores using common test items.

METHODS

This study was deemed exempt from review by the Mayo Clinic Institutional Review Board (Rochester, MN).

Study Population

Figure.

This study included the entering classes of CA-1 residents in 2011–2014 who completed the 3-year clinical anesthesia training on time, graduating by September 30 of their expected graduation year (n = 5270). The 2 cohorts starting in 2011 and 2012 were in the traditional system and took the Part 1 Examination after training, in 2014 and 2015, respectively. The 2 cohorts starting in 2013 and 2014 were in the staged system and took the ADVANCED Examination in the summer of 2016 and 2017, respectively, after passing the BASIC Examination (Figure). Although the ADVANCED Examination is also offered in the winter, very few candidates took the winter examination, so only those taking the summer examination were included for comparability.

Score Equating

The ABA uses the Rasch model,4 a particular case of the item response theory models, for the operational scoring of its written certification examinations. This model estimates a person ability parameter for each examinee and an item difficulty parameter for each single-best-answer multiple-choice question, with logit as the unit of measurement. A logit is a log-odds unit, mathematically defined as ln[P/(1 − P)], where P is the probability of a correct response, for dichotomously scored items (ie, a response is scored as either correct or incorrect). In the Rasch model, the logit of a correct response by a person to an item is equal to the difference between the person’s ability estimate and the item’s difficulty estimate. Once a passing standard is set, a score scale is established based on the calibration group candidates (American medical school graduates taking the examination for the first time under standard conditions) with a mean of 250 and a standard deviation of 50. All item difficulty parameters estimated from this group are saved as an item bank. Scores for examinations administered in subsequent years are equated to the same scale using a common-item equating procedure,5,6 so that scaled scores from different administrations can be compared. This procedure starts with fixing the item difficulty parameters for linking items using values from the item bank, referred to as “anchor values.” After the first run of the model with the calibration group only, each linking item is inspected for its displacement measure, a misfit index that approximates how far the anchored value is displaced from the value that would be estimated from the current data.7 When the displacement measure exceeds 0.6 logits (<−0.6 or >+0.6),8 the item is unanchored and the model is rerun, allowing the item parameter to be estimated from the current data. This is done iteratively until all anchored items meet the fit criterion, and then all item parameters from this model are applied to the entire group of candidates to obtain person estimates.
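To make the scoring model concrete, the following is a minimal Python sketch of Rasch person-ability estimation with anchored item difficulties and conversion to the reporting scale. It is an illustration under stated assumptions, not the ABA’s operational procedure (which uses dedicated Rasch software such as Winsteps7); function names and inputs are hypothetical.

```python
# Minimal sketch: ability estimation in logits from fixed ("anchored") item
# difficulties, then placement on the 250/50 reporting scale.
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def estimate_ability(responses, b, n_iter=25):
    """Maximum-likelihood ability estimate (in logits) for one examinee.

    responses: 0/1 vector of item scores; b: anchored item difficulties (logits).
    Note: perfect and zero scores have no finite estimate and need special handling.
    """
    theta = 0.0
    for _ in range(n_iter):
        p = rasch_prob(theta, b)
        gradient = np.sum(responses - p)    # d(log-likelihood)/d(theta)
        information = np.sum(p * (1 - p))   # -d2(log-likelihood)/d(theta)^2
        theta += gradient / information     # Newton-Raphson update
    return theta

def to_scaled_score(theta, calibration_mean, calibration_sd):
    """Map a logit ability to the reporting scale: calibration-group mean 250, SD 50."""
    return 250.0 + 50.0 * (theta - calibration_mean) / calibration_sd
```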

The ABA written certification examinations typically include 50 linking items for a 200-item test form. Linking items generally have demonstrated desirable measurement qualities in previous administrations in terms of moderate difficulty and adequate item discrimination, and their distribution in each content area is proportional to that of the overall examination (a “mini-test”). Starting with 50 linking items, the equating process usually ends up with at least 40 anchored items. In educational testing, a suggested rule of thumb is that the set of common items used for equating should be at least 20% of the length of a total test containing ≥40 items; when the test is long, 30 common items may suffice.5

For the traditional examination system, the passing standard was set in 2011. Scores of the 2014 and 2015 Part 1 Examinations were equated to the 2011 Part 1 Examination score scale via linking items built in the test forms. With the introduction of the ADVANCED Examination in 2016, a new passing standard was set, which was also applied to the 2017 ADVANCED Examination. Thus, scores of the 2016 and 2017 ADVANCED Examinations were reported on the same scale, which was different from the scale used operationally for the 2014 and 2015 Part 1 Examinations.

To make the scaled scores comparable across all 4 cohorts for the purpose of this study, the common-item equating procedure was performed to hypothetically put the 2014 and 2015 Part 1 Examinations on the same scale as the 2016 and 2017 ADVANCED Examinations. The procedure started with anchoring all common items available between the 2014 or 2015 Part 1 Examination form and the new item bank established since 2016, including items from both the 2016 and 2017 ADVANCED Examinations. In the end, 76 common items (out of 223 scored items) were used to equate the 2014 Part 1 Examination and 36 (out of 219 scored items) were used to equate the 2015 Part 1 Examination.
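The iterative anchoring-and-displacement procedure applied here could be sketched as follows. The calibration routine fit_rasch is a hypothetical placeholder for the operational software; only the control flow of anchoring common items, freeing misfitting anchors, and rerunning the model is illustrated.

```python
# Hedged sketch of common-item equating with displacement-based unanchoring.
DISPLACEMENT_LIMIT = 0.6  # logits, per the criterion cited in the text

def equate_with_anchors(responses, bank_difficulties, common_items, fit_rasch):
    """responses: examinee-by-item 0/1 matrix for the current form.
    bank_difficulties: {item_id: logit difficulty} from the reference item bank.
    common_items: ids of items shared between the current form and the bank.
    fit_rasch: hypothetical callable returning (item_difficulties, displacement_by_item)
               after calibrating the current data with some items held fixed.
    """
    anchored = set(common_items)
    while True:
        fixed = {item: bank_difficulties[item] for item in anchored}
        difficulties, displacement = fit_rasch(responses, fixed=fixed)
        misfitting = {item for item in anchored
                      if abs(displacement[item]) > DISPLACEMENT_LIMIT}
        if not misfitting:            # all remaining anchors meet the fit criterion
            return difficulties, anchored
        anchored -= misfitting        # unanchor misfitting items and rerun the model
```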

Statistical Analysis

Equated scaled scores on the Part 1/ADVANCED Examinations were compared among the 4 cohorts using multiple linear regression, adjusting for the effects of gender and medical school country (the US versus other countries), as these 2 variables are known to be associated with written examination performance.2,9,10 Two candidates in the 2011 cohort were excluded from this analysis because of a history of disciplinary actions against their medical licenses, which is known to be associated with examination performance.2,9 A P value <0.05 was considered to indicate statistical significance. Statistical analyses were performed in R version 3.3.1 (R Foundation for Statistical Computing, Vienna, Austria).
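The study’s analysis was performed in R; purely as an illustration of the adjusted comparison, an analogous model in Python using statsmodels could be written as follows. The data frame, its column names, and the synthetic values are hypothetical stand-ins for the real candidate records.

```python
# Illustrative only: adjusted cohort comparison via multiple linear regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the real candidate records.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "equated_score": rng.normal(250, 50, n),
    "cohort": rng.choice(["2011", "2012", "2013", "2014"], n),
    "gender": rng.choice(["F", "M"], n),
    "us_medical_school": rng.choice([True, False], n),
})

# Cohort effects adjusted for gender and medical school country, 2011 as reference.
model = smf.ols(
    "equated_score ~ C(cohort, Treatment(reference='2011')) + gender + us_medical_school",
    data=df,
).fit()
print(model.params)      # cohort coefficients = adjusted score differences vs. 2011
print(model.conf_int())  # 95% confidence intervals
```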

RESULTS

The proportion of residents who took the Part 1/ADVANCED Examination in the summer of their graduation year varied from 95.4% to 98.6% across the 4 cohorts (Table). The 2012 cohort performed similarly to the 2011 cohort on the Part 1 Examination (95% confidence interval [CI] of the adjusted score difference, −5.3 to 1.9; Table).

Table.

The 2013 cohort (residents beginning CA-1 training in 2013), the first cohort participating in the staged system, had significantly higher examination scores (by 7.1 points [95% CI, 3.5–10.8]) compared to the 2011 cohort. The 2014 cohort, the second cohort participating in the staged system, also had significantly higher scores (by 8.3 points [95% CI, 4.6–12.0]) compared to the 2011 cohort (Table). An 8-point difference on the score scale translates to approximately 2 more questions being answered correctly.

Operationally, the passing score and the proportion of examinees who passed (ie, the pass rates) were determined by separate standard-setting studies for the Part 1 and ADVANCED Examinations. The actual pass rates for the 2013 and 2014 cohorts (ADVANCED Examination; 94.2% and 95.9%, respectively) were higher than the pass rates for the 2011 and 2012 cohorts (Part 1 Examination; 90.7% and 91.7%, respectively). Because all examinations were now equated to a common score scale, it became possible to calculate the pass rates as if a common standard had been applied. When this was done using the standard set for the ADVANCED Examination in 2016, the pass rates for the 2011 and 2012 cohorts increased to 91.9% and 92.6%, respectively, suggesting that slightly more individuals in these 2 cohorts would have passed their written examination if the standard set in 2016 had been applied.
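As a simple illustration of this recalculation, one common cut score can be applied to every cohort’s equated scores. The data frame and cut score in the sketch below are hypothetical and do not reflect the ABA’s actual standard.

```python
# Hedged sketch: pass rates by cohort under a single common passing standard.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "cohort": rng.choice(["2011", "2012", "2013", "2014"], 1000),
    "equated_score": rng.normal(250, 50, 1000),
})

COMMON_CUT = 210  # hypothetical passing score on the common (ADVANCED) scale
pass_rates = (df["equated_score"] >= COMMON_CUT).groupby(df["cohort"]).mean()
print(pass_rates)  # pass rate per cohort under the same standard
```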

Using this common standard, the pass rates for the 2013 and 2014 cohorts were still higher than those for the 2011 and 2012 cohorts, suggesting that the increase in pass rate accompanying the introduction of the staged examination system could not be solely accounted for by a change in the standard, but may rather be associated with improved knowledge of candidates.

DISCUSSION

The introduction of the ABA staged examination system was associated with modestly higher scores on written certification examinations administered at the end of residency, as well as higher pass rates. Adding the BASIC Examination may have contributed to better performance of graduates on written examinations due to residents studying more and earlier during training, residency programs adjusting their curricula, and earlier remediation of low-performing residents.2 This analysis has several limitations. First, because the common items between the Part 1 and the ADVANCED Examinations used for equating were not deliberately planned, they may not sufficiently demonstrate content representativeness. Second, it is possible that the ability of resident cohorts entering anesthesia training changed during the study years, although a prior analysis of ITE performance at the beginning of training suggests that this is unlikely.2 Finally, bias could be introduced by year-to-year differences in the proportion of residents taking the examination immediately after graduation, changes in the examination content outline listing topics to be covered in the examination, or other unmeasured confounding factors. Further study is needed to determine if the increase in the written certification examination scores and pass rates accompanying the introduction of the staged examination system is associated with the desired outcome of improved physician performance in clinical practice.

DISCLOSURES

Name: Yan Zhou, PhD.

Contribution: This author helped design the study; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: Y. Zhou is a staff member of the American Board of Anesthesiology.

Name: Huaping Sun, PhD.

Contribution: This author helped design the study; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: H. Sun is a staff member of the American Board of Anesthesiology.

Name: Alex Macario, MD, MBA.

Contribution: This author helped design the study; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: A. Macario serves as a Director for the American Board of Anesthesiology.

Name: Donald E. Martin, MD.

Contribution: This author helped design the study, interpret the data, and draft the manuscript.

Conflicts of Interest: D. E. Martin serves as the Chair of the American Board of Anesthesiology ADVANCED Examination Committee.

Name: James P. Rathmell, MD.

Contribution: This author helped design the study, interpret the data, and draft the manuscript.

Conflicts of Interest: J. P. Rathmell serves as a Director for the American Board of Anesthesiology.

Name: David O. Warner, MD.

Contribution: This author helped design the study; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: D. O. Warner serves as a Director for the American Board of Anesthesiology.

This manuscript was handled by: Edward C. Nemergut, MD.

REFERENCES

1. American Board of Anesthesiology. APPLIED (Staged Exams). Available at: http://www.theaba.org/Exams/APPLIED-(Staged-Exam)/About-APPLIED-(Staged-Exam). Accessed March 14, 2019.
2. Zhou Y, Sun H, Lien CA, et al. Effect of the BASIC examination on knowledge acquisition during anesthesiology residency. Anesthesiology. 2018;128:813–820.
3. Zieky MJ, Perie M, Livingston SA. Cutscores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. New Jersey: Educational Testing Service; 2008:86, 153–154.
4. Rasch G. Studies in Mathematical Psychology: I. Probabilistic Models for Some Intelligence and Attainment Tests. Oxford, England: Nielsen & Lydiche; 1960.
5. Kolen MJ, Brennan RL. Test Equating, Scaling, and Linking. 2nd ed. New York, NY: Springer Science+Business Media; 2004:201–208, 271–272.
6. Stocking ML, Lord FM. Developing a common metric in item response theory. Appl Psychol Meas. 1983;7:201–210.
7. Linacre JM. Winsteps® Rasch Measurement Computer Program User’s Guide. Beaverton, OR: Winsteps.com; 2018.
8. O’Neill T, Peabody M, Tan R, Du Y. How much item drift is too much? Rasch Measurement Transactions. 2013;27:1423–1424.
9. Sun H, Culley DJ, Lien CA, Kitchener DL, Harman AE, Warner DO. Predictors of performance on the Maintenance of Certification in Anesthesiology Program® (MOCA®) examination. J Clin Anesth. 2015;27:1–6.
10. McClintock JC, Gravlee GP. Predicting success on the certification examinations of the American Board of Anesthesiology. Anesthesiology. 2010;112:212–219.
Copyright © 2019 International Anesthesia Research Society