The American Board of Anesthesiology’s Standardized Oral Examination for Initial Board Certification

Sun, Huaping PhD*; Warner, David O. MD†; Patterson, Andrew J. MD, PhD‡; Harman, Ann E. PhD*; Rathmell, James P. MD§; Keegan, Mark T. MB, BCh†; Dainer, Rupa J. MD‖; McLoughlin, Thomas M. Jr MD¶; Fahy, Brenda G. MD#; Macario, Alex MD, MBA**

doi: 10.1213/ANE.0000000000004263
Medical Education

The American Board of Anesthesiology (ABA) has been administering an oral examination as part of its initial certification process since 1939. Among the 24 member boards of the American Board of Medical Specialties, 13 other boards also require passing an oral examination for physicians to become certified in their specialties. However, the methods used to develop, administer, and score these examinations have not been published. The purpose of this report is to describe the history and evolution of the anesthesiology Standardized Oral Examination, its current examination development and administration, the psychometric model and scoring, physician examiner training and auditing, and validity evidence. The many-facet Rasch model is the analytic method used to convert examiner ratings into scaled scores for candidates; it takes into account both examiner grading severity and the difficulty of the examination tasks. Validity evidence for the oral examination includes that it measures aspects of clinical performance not accounted for by written certifying examinations, and that passing the oral examination is associated with a decreased risk of subsequent license actions against the anesthesiologist. Explaining the details of the Standardized Oral Examination provides transparency about this component of initial certification in anesthesiology.

From the *American Board of Anesthesiology, Raleigh, North Carolina

†Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, Minnesota

‡Department of Anesthesiology, Emory University, Atlanta, Georgia

§Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts

‖Pediatric Specialists of Virginia, Fairfax, Virginia

¶Lehigh Valley Health Network, Allentown, Pennsylvania

#Department of Anesthesiology, University of Florida, Gainesville, Florida

**Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Stanford, California.

Published ahead of print 5 February 2019.

Accepted for publication May 2, 2019.

Funding: Institutional and/or departmental.

Conflicts of Interest: See Disclosures at the end of the article.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website.


Reprints will not be available from the authors.

Address correspondence to Huaping Sun, PhD, The American Board of Anesthesiology, 4208 Six Forks Rd, Suite 1500, Raleigh, NC 27609. Address e-mail to huaping.sun@theABA.org.

See Editorial, p 1197

All 24 member boards of the American Board of Medical Specialties (ABMS; Chicago, IL) require passing a written examination in their initial certification process. Fourteen of these boards, including the American Board of Anesthesiology (ABA; Raleigh, NC), also require passing an oral examination for physicians to become certified in their specialty. Although oral examinations come with higher costs, logistical challenges, and a potential risk of subjectivity and bias, they can assess higher-order competencies such as the application of knowledge. In the framework of Miller’s pyramid,1 oral examinations bring the assessment of clinical competence to “knows how,” one level higher than the “knows” assessed by written examinations. The oral examination administered by the ABA aims to assess an anesthesiologist’s ability to properly apply knowledge to patient management and analyze clinical situations, clinical judgment to define priorities in the patient’s care and make sound decisions, adaptability to unexpected changes in clinical situations, and communication skills such as the capacity to logically organize and discuss clinical information.2

Despite the use of oral examinations by some ABMS member boards, there are no published descriptions of how these examinations are developed, administered, and scored. Such descriptions are one means to provide evidence supporting the reliability and validity of an examination. The purpose of this report is to describe the history and evolution of the anesthesiology Standardized Oral Examination, its current examination development and administration, the psychometric model and scoring, physician examiner training and auditing, and validity evidence.


EXAMINATION HISTORY

The ABA administered its first oral examination in 1939. During the first few years of the examination, the number of examination sessions each candidate had to complete varied (from 1 to 9), as did the number of examiners per session (1, 2, or 3) and the duration of each session (10 or 20 minutes). For example, in 1941, the examination consisted of 1 session, 3 examiners, and 10 minutes per examiner. In 1942, the examination had 9 sessions for each candidate, 1 examiner per session, and 20 minutes per session. In 1949, the structure was changed to 3 sessions for each candidate, 20–30 minutes per session, and 2 examiners per session. A format resembling the current structure was established in 1961, when the examination was redesigned to include two 30-minute examination sessions rather than 3, with 2 examiners per session. The duration of each session was later extended to its current 35 minutes (Table). Analysis of the ABA oral examination data in 1962 found a reliability coefficient of 0.89 for the 3-session examination.3 In 1969, a study commissioned by the ABA from the National Board of Medical Examiners provided evidence that the 2-session examination was an acceptable alternative, with a reliability coefficient of 0.83 for 3 administrations and 0.84 for 1 administration.4

Table.

Until the 1950s, every candidate was examined by at least 1 of the 12 ABA physician directors, typically paired with 1 nondirector examiner. Eventually, the increasing number of candidates made it no longer possible for an ABA director to be involved in every candidate’s examination. During this time, the ABA provided the examiners with guidelines and assigned topics. After those topics were addressed, the examiners could question candidates on any additional subject. In the late 1960s, the ABA began providing examiners with specific clinical cases to be discussed and questions involving preoperative, intraoperative, and postoperative management. Senior examiners continued to be free to choose topics for questioning during the final 10 minutes of each examination session until 1978. Thereafter, “suggested” topics were provided, and in 1990, more specific “additional” topics were provided to the examiners by the ABA.

With regard to the history of grading the examination, before 1947, a group discussion among the directors and examiners determined the candidate’s result as “pass,” “fail,” or “conditional,” with those receiving “conditional” examined again at the end of the day. A “percent correct” score (0%–100%) was first introduced in 1947. An average of at least 75% was required to pass, with the directors’ grade weighted more heavily than that of nondirector examiners. From 1955 to 1956, the percent grade could only be given as a multiple of 5, and in 1957, it was further restricted to values of 65, 70, 75, or 80. Group discussion for grading was discontinued in 1956. Henceforth, individual examiners were required to grade independently. However, there was still an end-of-day discussion of the borderline candidates, defined as those who scored 74%–76% on average. A 4-point grading system (70, 73, 77, and 80) was introduced in 1961, with each of the 4 examiners weighted equally. In 1962, the passing standard was adjusted to be above 75.0% on average. In 2002, this 4-point overall grading system was replaced by the current rating scale, in which each examiner is asked to rate whether a candidate “consistently,” “often,” “occasionally,” or “rarely” demonstrates the characteristics expected of an ABA diplomate in each of the examination modules (Supplemental Digital Content, Figure 1, http://links.lww.com/AA/C845).

The most recent evolution in the standardization of the oral examination occurred in 2015, when its delivery changed from hotel rooms in locations across the country to a dedicated assessment center in Raleigh, NC. Moving from biannual administrations of the examination in hotels to 9 administrations a year in the assessment center yields several benefits. First, the physical environment of the assessment center is professional and standardized, both within and across examination administration years. Second, with the previous hotel model, half of the approximately 2000 candidates per year were examined in the spring and the other half in the fall. The dedicated assessment center allows more candidates to take their examinations earlier in the year (ie, 7 administrations from March to June and 2 administrations from September to October), which candidates have strongly preferred. Finally, the technology infrastructure built into the assessment center allows medically relevant images to be displayed on computer screens and lets examiners administer and score the examination on electronic tablets over a secure wireless network instead of on paper.


CURRENT EXAMINATION DEVELOPMENT

The ABA Standardized Oral Examination may include questions related to adult, pediatric, neuro-, obstetric, cardiothoracic, pain, and/or regional anesthesia. Each examination consists of two 35-minute sessions with 3 modules in each session. The first 2 modules (25 minutes total) are covered by a stem—a case vignette. The stem in the first session presents intraoperative and postoperative issues, and the stem in the second session presents preoperative and intraoperative issues. The third module is covered by a set of 3 additional topics (10 minutes total) designed to increase the breadth of topics covered within each session. Each of the additional topics presents a separate brief vignette, covering preoperative optimization, appropriate options for managing potential intraoperative events and complications, or strategies for dealing with emergency situations (Table). The 2 stems and 6 additional topics in the two 35-minute sessions are designed to assess different issues without overlap. Sample Standardized Oral Examination questions are available on the ABA website.5

The Standardized Oral Examination Committee, a group of 35 ABA board-certified anesthesiologists including ABA directors, former directors, and examiners with many years of experience, develops the examination. The work of creating new examination sets (ie, 1 stem and 3 additional topics) and reviewing already assembled examination forms occurs concurrently throughout the year.

The general guidelines given to question authors are that clinical vignettes should reflect common and current practice, flow well from topic to topic, and assess the management and attributes expected of a consultant anesthesiologist. Committee members are paired based on their practice focus and experience to write new examinations, and topic areas are assigned from a blueprint. Planning for new examination questions begins at 1 of the 2 committee meetings held at the ABA office each spring, and collaborative development work then continues remotely. Each member of the pair is responsible for writing a clinical scenario with associated guided questions and 3 additional topics for 1 examination session, as well as editing their partner’s work for the other examination session. Examination questions are further vetted by other committee members at the onsite committee meeting the following year.

Once assembled, each examination form is reviewed by 2 senior editors on the committee and a director or former director to look for any areas that need improvement such as unintended duplication of topics or poor scenario flow. The finalized examination forms are copyedited by an ABA staff editor and published to servers for distribution to examiners during the examination week.


CURRENT EXAMINATION ADMINISTRATION

Candidates are eligible to take the Standardized Oral Examination after successfully completing residency training and passing 2 computer-based written examinations (ie, the BASIC and ADVANCED examinations). Approximately 2000 candidates are examined at the assessment center in Raleigh, NC, each year. There are 9 examination weeks annually, with 4 examination days each week, 4 examination periods each day, and 14 candidates each period. An examination consists of two 35-minute sessions with a 10-minute break in between, with 3 modules in each session.

A candidate meets a total of 4 examiners—2 examiners in each session, with 1 examiner asking questions in any given module, and responsibility for modules alternating between examiners (Table). The 2 examiners in the same session evaluate the candidate independently in real time using an electronic tablet displaying the case stem, guided questions, additional topics, and grading. To provide feedback to candidates who do not pass, when examiners assign a candidate a module rating of “occasionally” or “rarely,” they also indicate at least 1 “major” or “minor” deficiency among 4 attributes: application, adaptability, judgment, and organization/presentation. Examiners are also asked to evaluate the content and difficulty of the examination scenarios (Supplemental Digital Content, Figure 1, http://links.lww.com/AA/C845).

After each examination administration week, a candidate experience survey is emailed to all examination takers so that they can provide the ABA with comments and feedback about their experience; these responses are reviewed by the ABA Assessments Committee to further improve the examination.


EXAMINER TRAINING AND AUDITING

The ABA maintains a pool of approximately 400 volunteer oral examiners, who are ABA board-certified anesthesiologists participating in the Maintenance of Certification in Anesthesiology program (Raleigh, NC). Every 2 years, the ABA calls for new examiner nominations. Once appointed, examiners typically commit to 1 week of examination administration (ie, Sunday afternoon to Thursday) each year and move through a graduated examiner classification system from New Examiner (years 1–2) to Associate Examiner (years 3–6) to Full Examiner (years 7+).

ABA oral examiners are expected to pose questions and respond to candidates’ responses in a professional manner, probe for breadth and depth of knowledge without perseverating on topics for which the candidate demonstrates knowledge gaps, manage transitions between topics smoothly, and maintain a logical flow and a reasonable tempo. Examiners are also instructed not to ask leading questions and not to provide candidates with feedback that a response is either correct or incorrect. The goal of the training and auditing is to ensure standardization of the examination such that all candidates have an equal opportunity to demonstrate their abilities and attributes. Regardless of their length of service, all examiners must participate in the ABA examiner briefing session held on the Sunday afternoon before every examination week. This session reemphasizes the purpose of the Standardized Oral Examination as a component of initial certification and the protocols for properly administering the examination and rating the candidates.

New examiners receive extensive training and mentoring during their first 2 years of service to set a strong foundation for skill development as oral examiners. On the first morning of their examinations, new examiners participate in a New Examiner Workshop led by an ABA director or former director. The workshop begins with a 45-minute presentation that provides an in-depth understanding of the purpose and structure of the oral examination, followed by group discussions of the specific protocols for administering the examination and applying the rating scales. New examiners then observe and independently rate live examinations for at least 2 examination periods via a video feed. Between examination sessions, new examiners discuss what ratings they assigned and what elements of the candidates’ performance led them to assign those ratings. During an examination week, new examiners are paired with experienced full examiners who have been trained as mentors. These mentors provide individualized feedback, coaching, and remediation guidance immediately after each examination session. They also review and critique the new examiners’ previous examination administrations using recorded videos.

Examination audits are completed by ABA directors or experienced full examiners, either when paired with the examiner being audited during an examination or by observing examinations on video. Audit criteria include whether examiners cover all examination content, the depth of that coverage, and the examiners’ communication skills, adaptability, and demeanor. All examiners are audited periodically, and every session conducted by new examiners is audited. Examiners whose performance is suboptimal are counseled on how to improve. Since 2015, 7% of new examiners have failed to advance in the examiner system because their performance was judged to be below the expected standard.


PSYCHOMETRIC MODEL

In each of the two 35-minute examination sessions of the Standardized Oral Examination, there are 3 modules: 2 evaluating a main vignette and 1 evaluating additional topics. With 2 examiners per session, there are a total of 12 module ratings per candidate on 4 different tasks: preoperative evaluation (2 ratings), intraoperative management (4 ratings), postoperative care (2 ratings), and additional topics (4 ratings; Table).

The examination scoring is analyzed using the many-facet Rasch model,6 which calculates measures of candidate ability, examiner rating severity, and task difficulty using the module ratings. Candidates’ ability is based on all the ratings they receive, and examiners’ rating severity is based on all the ratings they assign to the candidates. Difficulty of a task is based on all the ratings assigned to it by all the examiners. Each of these 3 “facets” influences the ratings that examiners give to candidates on each task. To ensure the equitable measurement of candidates, differences among examiners and tasks are accounted for in the scoring process. The specific topics on each task are assumed to be of comparable difficulty.

For the purpose of scoring, the module ratings of “consistently,” “often,” “occasionally,” and “rarely” are converted to 4, 3, 2, and 1, respectively. This is the inverse of the ratings examiners record, in which 1 = “consistently” and 4 = “rarely” (Supplemental Digital Content, Figure 1, http://links.lww.com/AA/C845). The unit of measurement of the Rasch model is a logit. According to the many-facet Rasch model, when a candidate is rated, the natural log of the odds of the candidate being rated in category k rather than in category (k − 1) can be conceptually modeled as follows:

\[
\ln\left(\frac{P_{nmi(k)}}{P_{nmi(k-1)}}\right) = B_n - S_m - C_i - F_k,
\]

where $P_{nmi(k)}$ = probability of candidate n being rated in category k by examiner m on task i,

$P_{nmi(k-1)}$ = probability of candidate n being rated in category (k − 1) by examiner m on task i,

$B_n$ = the ability of candidate n,

$S_m$ = the rating severity of examiner m,

$C_i$ = the difficulty of task i, and

$F_k$ = the difficulty of the step up from category (k − 1) to category k.

The probability of a candidate earning a rating of k is a function of the difference between the ability of the candidate ($B_n$), the rating severity of the examiner ($S_m$), the difficulty of the task ($C_i$), and the difficulty of the step up from the next lower category (k − 1). This conceptual model along with maximum likelihood estimation is used to obtain the estimates of these 3 facets that have most likely produced the observed ratings. A higher logit measure represents a more capable candidate, a more severe examiner, or a more difficult task.
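To make the scoring model concrete, the following sketch computes the probability of each module rating for a single candidate–examiner–task encounter under a rating-scale form of the many-facet Rasch model. The ability, severity, difficulty, and step values are hypothetical and chosen only for illustration; this is not the ABA’s scoring software.

```python
import numpy as np

def rating_probabilities(b_n, s_m, c_i, steps):
    """Probability of each rating category under the many-facet Rasch model.

    b_n: candidate ability (logits)
    s_m: examiner rating severity (logits)
    c_i: task difficulty (logits)
    steps: step difficulties F_2..F_4, ie, the difficulty of the step up
           from category k - 1 to category k (logits)
    """
    # Category 1 is the reference (log-numerator 0); the log-numerator for each
    # higher category accumulates (b_n - s_m - c_i - F_k) over its steps.
    log_num = np.concatenate(([0.0], np.cumsum(b_n - s_m - c_i - np.asarray(steps))))
    probs = np.exp(log_num)
    return probs / probs.sum()

# Hypothetical values: a candidate 2.5 logits above average facing an average
# examiner and task (0 logits), with illustrative step difficulties.
p = rating_probabilities(b_n=2.5, s_m=0.0, c_i=0.0, steps=[-1.0, 0.0, 1.0])
print(dict(zip(["rarely", "occasionally", "often", "consistently"], p.round(3))))
```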

With the use of the logistic function, ordinal raw ratings are transformed to an interval scale of logit measures. Calibrating all facets onto the same scale makes it possible to present candidates in order of ability, examiners in order of rating severity, and tasks in order of difficulty. The ordering of these elements on the same linear scale provides a frame of reference for understanding the relationships among the facets (Figure 1). In the example shown in Figure 1, the average logit measures of examiner rating severity and task difficulty are both 0, and the average logit measure of candidate ability is 2.5. This indicates that, overall, the candidates’ ability exceeds both the severity with which examiners score candidates and the difficulty of the tasks.

Figure 1.

During the past several years, the reliability of the oral examination has consistently been greater than 0.80. This Rasch-based reliability is conceptually comparable to Cronbach alpha based on binary raw scores (correct or incorrect) for single-best-answer multiple-choice examinations. It is a ratio of “true variance” to “observed variance” of the candidate performance. In other words, it is a quantifiable measure of how reproducible the rank ordering of the candidate performance is (ie, to what extent candidates estimated with higher measures actually do have higher ability than those estimated with lower measures).7
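As an illustration of this definition, the sketch below computes a Rasch-style reliability as the ratio of “true” variance to observed variance from a handful of hypothetical candidate logit measures and their standard errors; the numbers are invented and the formula is the generic person-separation reliability, not the ABA’s production calculation.

```python
import numpy as np

# Hypothetical candidate ability estimates (logits) and their standard errors.
measures = np.array([1.8, 2.4, 3.1, 0.9, 2.7, 1.5, 3.4, 2.0])
std_errors = np.array([0.45, 0.40, 0.50, 0.42, 0.44, 0.46, 0.52, 0.41])

observed_var = measures.var(ddof=1)      # variance of the estimated measures
error_var = np.mean(std_errors ** 2)     # average measurement-error variance
true_var = observed_var - error_var      # estimated "true" variance of ability

reliability = true_var / observed_var    # ratio of true to observed variance
print(round(reliability, 2))
```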

Figure 2.

The standard for passing the Standardized Oral Examination is set based on the ABA expectation that minimally competent candidates would, on average, “often” demonstrate qualities expected of an ABA diplomate across all tasks and examiners. The many-facet Rasch model calculates the logit measure that corresponds to the fair average of “often” on the rating scale, which adjusts for examiner rating severity and task difficulty. To reduce the possibility of “false negatives,” the standard is also adjusted down by half of the standard error of measurement. This adjustment gives the “benefit of the doubt” to candidates. Because this model monitors examiner rating severity and task difficulty, the equating process allows the same criterion-referenced standard to be applied to different administrations of the examination. The many-facet Rasch model was first used for the ABA oral examination in 2002, and the most recent standard was set in 2012. First-time pass rates since 1984 are shown in Figure 2.
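A minimal worked example of this adjustment, using hypothetical numbers (the actual fair-average cut score and standard error of measurement are not reported here):

```python
# Hypothetical values for illustration only.
fair_average_cut = 0.80    # logit measure corresponding to a fair average of "often"
sem = 0.30                 # standard error of measurement, in logits

# The standard is lowered by half of the SEM to reduce false negatives.
passing_standard = fair_average_cut - sem / 2
print(passing_standard)    # 0.65; candidates at or above this measure pass
```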


EVIDENCE OF VALIDITY

The rationale for administering both written and oral examinations is that they measure different constructs important to clinical practice. With the caveat that candidates must pass the written certification examination first to be eligible for the oral examination, the Pearson correlation between written and oral examination scaled scores (both obtained at the first administration if >1 attempt) was 0.31 (95% confidence interval [CI], 0.29–0.33; P < .001; n = 7650) from 2012 to 2017 (Figure 3). This modest correlation suggests that the written and oral examinations measure somewhat different constructs, providing a rationale for administering both.
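The snippet below shows one way such a correlation and its 95% CI (via the Fisher z-transformation) could be computed for paired written and oral scaled scores; the data are simulated for illustration and are not ABA data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated paired scaled scores with a built-in correlation of roughly 0.3.
written = rng.normal(500, 100, size=1000)
oral = 0.3 * written + rng.normal(0, 100, size=1000)

r, p_value = stats.pearsonr(written, oral)

# 95% CI for r via the Fisher z-transformation.
z = np.arctanh(r)
se = 1.0 / np.sqrt(len(written) - 3)
ci_low, ci_high = np.tanh([z - 1.96 * se, z + 1.96 * se])
print(f"r = {r:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f}); P = {p_value:.3g}")
```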

Figure 3.

Several studies support the external validity of the ABA oral examination in terms of assessment of clinical abilities. For example, program directors’ judgments of new residency graduates’ clinical skills, assessed as the directors’ comfort level with having the graduates perform 3 increasingly complex anesthetics, were predictive of the graduates’ probability of passing both the written and oral examinations.8 A second study of 111 graduates from a single anesthesiology residency program found that their clinical performance, as evaluated on a weekly basis by their faculty during the final year of residency, correlated with both written (r = 0.27 [95% CI, 0.09–0.43]; P = .0047) and oral examination scores (r = 0.33 [95% CI, 0.15–0.48]; P = .0005).9 Both written examination scores and clinical performance scores were independently associated with oral examination scores, suggesting that the oral examination measures important aspects of clinical performance not accounted for by the written examination. A third study evaluated the effectiveness of the ABA written and oral examinations in predicting the risk of a disciplinary action against an anesthesiologist’s medical license by the state medical licensure board.10 Compared with those passing both examinations on the first attempt, those passing neither examination (hazard ratio, 3.60 [95% CI, 3.14–4.13]) and those passing only the written examination (hazard ratio, 3.51 [95% CI, 2.87–4.29]) had an increased risk of a disciplinary action. The risk did not differ between the latter 2 groups, suggesting that passing the oral examination, but not the written examination, was associated with a decreased risk of subsequent license actions. These findings support the concept that the oral examination assesses domains important to physician performance that are not fully captured by the written examination.


SUMMARY

This report described the development, administration, and scoring processes of the ABA Standardized Oral Examination, as well as examiner training and auditing. Validity evidence suggests that the oral examination is complementary to the written certifying examinations, thereby adding value to the initial certification process. According to the framework of Miller’s pyramid, the next higher level of clinical competency after the “knows how” assessed by the Standardized Oral Examination is “shows how.”1 To assess “shows how” in structured situations, the ABA launched the Objective Structured Clinical Examination in 2018 and will study the impact of adding it to initial board certification.


DISCLOSURES

Name: Huaping Sun, PhD.

Contribution: This author helped conceptualize the manuscript; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: H. Sun is a staff member of the American Board of Anesthesiology.

Name: David O. Warner, MD.

Contribution: This author helped conceptualize the manuscript; manage, analyze, and interpret the data; and draft the manuscript.

Conflicts of Interest: D. O. Warner serves as a director for the American Board of Anesthesiology.

Name: Andrew J. Patterson, MD, PhD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: A. J. Patterson serves as a director for the American Board of Anesthesiology.

Name: Ann E. Harman, PhD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: A. E. Harman is a staff member of the American Board of Anesthesiology.

Name: James P. Rathmell, MD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: J. P. Rathmell serves as a director for the American Board of Anesthesiology.

Name: Mark T. Keegan, MB, BCh.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: M. T. Keegan serves as a director for the American Board of Anesthesiology.

Name: Rupa J. Dainer, MD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: R. J. Dainer serves as a director for the American Board of Anesthesiology.

Name: Thomas M. McLoughlin Jr, MD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: T. M. McLoughlin Jr serves as a director for the American Board of Anesthesiology.

Name: Brenda G. Fahy, MD.

Contribution: This author helped conceptualize the manuscript, interpret the data, and draft the manuscript.

Conflicts of Interest: B. G. Fahy serves as a director for the American Board of Anesthesiology.

Name: Alex Macario, MD, MBA.

Contribution: This author helped conceptualize the manuscript; manage, analyze and interpret the data; and draft the manuscript.

Conflicts of Interest: A. Macario serves as a director for the American Board of Anesthesiology.

This manuscript was handled by: Edward C. Nemergut, MD.


REFERENCES

1. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65:S63–S67.
2. The American Board of Anesthesiology. The American Board of Anesthesiology APPLIED Examination. Available at: http://www.theaba.org/Exams/APPLIED-(Staged-Exam)/About-APPLIED-(Staged-Exam). Accessed August 6, 2018.
3. Carter HD. How reliable are good oral examinations. Calif J Educ Res. 1962;13:147–153.
4. Kelley PR Jr, Matthews JH, Schumacher CF. Analysis of the oral examination of the American Board of Anesthesiology. J Med Educ. 1971;46:982–988.
5. The American Board of Anesthesiology. The American Board of Anesthesiology Standardized Oral Examination Sample Questions. Available at: http://www.theaba.org/PDFs/APPLIED-Exam/SOE-Sample-Questions. Accessed August 6, 2018.
6. Linacre JM. Many-Facet Rasch Measurement. 2nd ed. Chicago, IL: MESA Press; 1994.
7. Linacre JM. Reliability and separation nomograms. Rasch Measurement Transactions. 1995;9:421. Available at: https://www.rasch.org/rmt/rmt92a.htm. Accessed May 30, 2019.
8. Slogoff S, Hughes FP, Hug CC Jr, Longnecker DE, Saidman LJ. A demonstration of validity for certification by the American Board of Anesthesiology. Acad Med. 1994;69:740–746.
9. Baker K, Sun H, Harman A, Poon KT, Rathmell JP. Clinical performance scores are independently associated with the American Board of Anesthesiology Certification Examination Scores. Anesth Analg. 2016;122:1992–1999.
10. Zhou Y, Sun H, Culley DJ, Young A, Harman AE, Warner DO. Effectiveness of written and oral specialty certification examinations to predict actions against the medical licenses of anesthesiologists. Anesthesiology. 2017;126:1171–1179.

Supplemental Digital Content

Copyright © 2019 International Anesthesia Research Society