Credentialing boards often require practitioners to periodically demonstrate minimum competency after obtaining their initial credentials. For physicians, standards for maintenance of certification (MOC) programs are set by the American Board of Medical Specialties (ABMS) and include four parts: Part I, professionalism and professional standing; Part II, lifelong learning and self-assessment; Part III, assessment of knowledge, judgment, and skills; and Part IV, improvement in medical practice.1 The specific requirements vary by medical subspecialty board and have gradually become more rigorous since their inception, beginning with continuing certification and now including timelines and more ongoing evidence.
In response to the increasing MOC requirements, physicians from a variety of subspecialties have expressed dissatisfaction with the MOC process.2–4 One main criticism is that Part III includes a broad-based standardized pass/fail examination with questions that may be seen as irrelevant to highly specialized doctors. Thus, failing the test may not necessarily indicate incompetence if poor performance occurs in an area unrelated to that individual’s practice. However, this potential misalignment could be avoided if the test were developed with strong assessment design principles, such as evidence-centered design,5 which would begin with collecting evidence for test content on the skills needed for safe and effective clinical practice. Additionally, the time commitment and fees associated with ongoing MOC activities are perceived as unrealistic. A recent article by Sandhu et al6 estimated that, depending on the subspecialty, the cost to a physician ranges from approximately $17,000 to $40,000 over a 10-year MOC cycle.
In an effort to address this dissatisfaction while remaining in compliance with ABMS MOC guidelines, some governing boards have pursued alternative assessments. In 2014, the American Board of Anesthesiology implemented the MOCA Minute,7 which has now replaced its Part III examination. Instead of a fixed examination, the MOCA Minute allows physicians to first identify their practice areas and then complete 30 questions every three months for ongoing evaluation and feedback. Additionally, the ABMS is piloting a new Web-based assessment platform using a similar longitudinal testing approach that may replace the Part III standard.8
A common feature of these new assessment formats is a greater focus on continuing education. However, validating learning from these new assessment formats will take time, and there is little to no existing research to support their utility compared with a more traditional standardized examination. Still, these recent initiatives reflect a serious effort by sponsoring medical boards to improve MOC assessment.
Another criticism of MOC, and often testing in general, is that closed-book examinations are not clinically relevant as doctors have the ability to look up information in practice. If allowing references better reflects clinical practice, then an open-book testing environment may add content validity to MOC assessments. In addition to validity benefits, general research on open-book testing suggests that test takers experience less stress and anxiety,9 that long-term effects for knowledge retention are similar to closed-book tests,10 and that the immediate feedback helps promote learning while mitigating acquisition of false information in the absence of the correct answer.11
The idea of providing an open-book testing experience for an MOC examination is not new; the American Board of Dermatology12 and the American Board of Ophthalmology13 both had open-book, unproctored, take-home MOC examinations up until 2009. However, they transitioned back to a closed-book proctored examination to be in compliance with the ABMS MOC standards as well as to satisfy stakeholders concerned about the continued competence of board-certified physicians.
Given the challenges with the current state of MOC and that research does not yet support the effectiveness of more novel assessment methods, there exists an opportunity for a compromise approach. The purpose of our study was to examine the effect of allowing test takers to access certain information while completing their MOC Part III standardized examinations. In this situation, test takers were allowed to access approved reference material during the test in a proctored, secure environment, which meets the ABMS requirements. Our research question was how test takers’ performance and behavior on the test day change when they can access reference material, compared with when this feature is not available.
We obtained item response data from physicians who completed a medical subspecialty MOC Part III examination through a computer-based testing platform that captured not only item responses but also the number of seconds until a response was provided. We also collected data through an optional posttest survey that physicians completed immediately after finishing the test within the same interface. In addition to questions focused on the test center experience and structure of the test, the survey queried physicians on how they used the reference material. For example, one survey question was “What was the primary purpose for accessing references?” with the response options “did not access references,” “to inform a response,” “to check an answer already provided,” or “other, please specify.”
The test is administered to approximately 150 individuals annually and is composed of 200 multiple-choice items divided into 4 sections of 50 items, with 60 minutes to complete each section. The participant pool for this study comprised 546 individuals who tested between 2013 and 2016. In 2013 and 2014, 310 physicians completed the examination in a secure testing environment without access to any reference material. In 2015 and 2016, 236 physicians also completed the examination in a secure testing environment, but with access to certain reference material. Physicians were informed, months in advance of testing, that they could use textbooks, personal notes, and other written materials during the test session. Physicians also signed an honor pledge on the test day indicating that they would not access the Internet, provide assistance, or accept assistance from others during the examination. Online references, though more reflective of clinical practice, were prohibited to prevent the potential leaking of secure test material (e.g., posting a question to an online message board). Additionally, references were not directly monitored during the test, but we included questions related to how examinees used references on the posttest survey for the 2016 administration.
We informed study participants, upon registration for the MOC examination, that their deidentified data may be used for research. An initial review conducted by the American Institutes for Research Institutional Review Board found this research to be exempt from oversight as it did not involve human subjects and the analyses were based on deidentified data.
To mitigate item exposure and cheating behavior, each year a different version of the test is administered. With different versions of the test each year, physicians may respond correctly to a different percentage of the items simply because the test form for that particular year is more or less difficult. Thus, comparing performance between years using percent correct scores could be biased. To adjust for this possible confound of differing levels of difficulty between forms, we employed Rasch item response theory modeling14 to equate between years using common items across forms. Scored item response data for all items and test takers were calibrated to produce difficulty estimates for each item and to compute physician ability estimates. We then converted these ability estimates to scale scores with a mean of 500 and a standard deviation of 100 to facilitate score comparisons between forms.
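As a rough illustration of the last two steps, the Python sketch below shows the Rasch response function and a linear conversion of ability estimates to a scale with a mean of 500 and a standard deviation of 100. It is not the authors' implementation (the study used R): the function names, the example ability values, and the choice to standardize against the pooled sample with a population standard deviation are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def rasch_probability(theta, difficulty):
    """Rasch model: P(correct) for ability theta and item difficulty, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def to_scale_scores(thetas, target_mean=500.0, target_sd=100.0):
    """Linearly rescale ability estimates so the reference group has the target mean and SD."""
    m = mean(thetas)
    s = pstdev(thetas)  # population SD of the reference group (an assumption)
    return [target_mean + target_sd * (t - m) / s for t in thetas]


# Hypothetical ability estimates in logits, for illustration only.
example_thetas = [-1.2, -0.4, 0.0, 0.3, 1.3]
example_scores = to_scale_scores(example_thetas)
```

Because item difficulties from all forms sit on one logit scale after calibration, a given scale score is intended to mean the same thing regardless of which yearly form a physician took.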
To investigate whether accessing references was related to better performance on the examination, we conducted an analysis of covariance (ANCOVA) on the scale scores with references (no access or access) as the between-groups factor and scores from the physicians’ initial certification examination as a covariate. Given that physicians were not randomly assigned to the testing condition, using their initial certification scores as a covariate helped control for potential group ability differences prior to completing the MOC examination. Descriptive analyses were also conducted to investigate how the new feature of accessing references influenced performance and time management within the test day. Moreover, we analyzed posttest survey responses to examine changes in perceptions of how much time is needed and to describe how physicians accessed references. All statistical analyses were conducted with R statistical software, version 3.3.1 (R Foundation for Statistical Computing, Vienna, Austria).
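For readers unfamiliar with the mechanics, a one-way ANCOVA with a single covariate can be sketched with the classical sums-of-squares formulas. The Python below is a minimal illustration, not the authors' R code: the function name, the dictionary layout, and the toy data are hypothetical, and no P value is computed because the standard library has no F distribution.

```python
from statistics import mean


def ancova_one_covariate(groups):
    """Classical one-way ANCOVA with a single covariate.

    groups maps a condition label to a list of (covariate, outcome) pairs.
    Returns the pooled within-group slope, covariate-adjusted group means,
    the F statistic for the condition effect, and its degrees of freedom.
    """
    xs = [x for pairs in groups.values() for x, _ in pairs]
    ys = [y for pairs in groups.values() for _, y in pairs]
    n, k = len(xs), len(groups)
    grand_x, grand_y = mean(xs), mean(ys)

    # Total sums of squares and cross-products around the grand means.
    ssx_t = sum((x - grand_x) ** 2 for x in xs)
    ssy_t = sum((y - grand_y) ** 2 for y in ys)
    sp_t = sum((x - grand_x) * (y - grand_y) for x, y in zip(xs, ys))

    # Within-group sums of squares and cross-products around group means.
    ssx_w = ssy_w = sp_w = 0.0
    group_means = {}
    for label, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        group_means[label] = (mx, my)
        ssx_w += sum((x - mx) ** 2 for x, _ in pairs)
        ssy_w += sum((y - my) ** 2 for _, y in pairs)
        sp_w += sum((x - mx) * (y - my) for x, y in pairs)

    slope = sp_w / ssx_w                   # pooled within-group regression slope
    ss_error = ssy_w - sp_w ** 2 / ssx_w   # adjusted within-group (error) SS
    ss_total = ssy_t - sp_t ** 2 / ssx_t   # adjusted total SS
    ss_effect = ss_total - ss_error        # adjusted between-group SS
    df_effect, df_error = k - 1, n - k - 1
    f_stat = (ss_effect / df_effect) / (ss_error / df_error)
    adjusted = {
        label: my - slope * (mx - grand_x)
        for label, (mx, my) in group_means.items()
    }
    return {"slope": slope, "adjusted_means": adjusted,
            "F": f_stat, "df": (df_effect, df_error)}


# Hypothetical toy data: (initial certification score, MOC scale score).
toy = {
    "no_access": [(400, 401), (450, 424), (500, 450), (550, 476), (600, 499)],
    "access": [(420, 469), (470, 496), (520, 520), (570, 544), (620, 571)],
}
result = ancova_one_covariate(toy)
```

The adjusted means answer the question "what would each group have scored if both had the same average initial certification score?", which is why the covariate matters when assignment to condition is not random.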
Assumptions for conducting the ANCOVA were satisfied, and the interaction between testing condition and physicians’ initial certification scores was not significant (F(1, 543) = 3.65, P > .05). Thus, though physicians were not randomly assigned to the testing condition, both groups were equivalent in their medical subspecialty content knowledge at the time of initial certification. After controlling for initial certification scores, physicians scored significantly higher on the MOC examination when they were allowed to access references (mean = 534.44, standard error = 6.83) compared with when they were not (mean = 472.75, standard error = 4.87), F(1, 543) = 60.18, P < .001, ω² = 0.09. This finding reflects a medium effect15 and provides evidence that with the ability to access references, physicians were able to correctly respond to a higher percentage of items.
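To put the two reported means on a more intuitive footing, the short Python calculation below expresses their difference in standard deviation units. Dividing by 100 assumes that the scale's constructed standard deviation is the appropriate yardstick; the variable names are illustrative only.

```python
SCALE_SD = 100.0  # scale scores were constructed with a standard deviation of 100

mean_access = 534.44     # reported mean, references allowed
mean_no_access = 472.75  # reported mean, references not allowed

difference = mean_access - mean_no_access
standardized_difference = difference / SCALE_SD

print(f"difference = {difference:.2f} scale points")
print(f"standardized difference = {standardized_difference:.2f} SD")
```

Under that assumption the gap is 61.69 scale points, or roughly 0.62 standard deviations.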
For the 104 physicians who completed the 2016 posttest survey, results indicated that 95% (99/104) of respondents accessed references while testing, and 36% (37/104) accessed references for more than 20 questions. Additionally, 78% (81/104) indicated that they accessed references to check an answer already provided, as opposed to 16% (17/104) of respondents who were looking to inform a response. Interestingly, from 495 physician surveys across all years, a lower proportion of respondents felt they had sufficient time when references were allowed: 67% (153/222) compared with 96% (261/273) when they were not. Figure 1 plots total testing time by reference condition and indicates that physicians used more allotted time when they were able to access references. Additionally, physicians were significantly more likely to finish with less than one minute of allotted time remaining in each section when they were allowed access to references compared with when they were not, χ²(1) = 106.58, P < .001, odds ratio = 13.47.
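The chi-square statistic and odds ratio reported above both come from a 2 × 2 table of condition by outcome. The Python sketch below shows how they are computed; the cell counts are hypothetical because the study's counts for this comparison are not reported, and the sketch uses the uncorrected Pearson statistic, which may differ from what the authors ran (some software applies a continuity correction to 2 × 2 tables by default).

```python
import math


def two_by_two_summary(a, b, c, d):
    """Odds ratio and Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows are conditions, columns are outcome yes/no:
        condition 1: a (yes), b (no)
        condition 2: c (yes), d (no)
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Hypothetical counts, NOT the study's data (the cell counts are not reported).
odds_ratio, chi_square, p_value = two_by_two_summary(30, 10, 10, 30)
```

An odds ratio above 1 here means the outcome (for example, finishing with less than one minute remaining) is more likely under condition 1 than under condition 2.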
Figure 2 presents median response time in seconds by item position for the physicians who tested with access to references. Items were presented randomly within each section, so item position reflects the particular item that was encountered at that point during the test. At any given item position, each physician is presented with a different item, so analyzing the data at this level eliminates item effects (e.g., difficulty, word count) and allows us to focus on patterns over time. Figure 2 shows that test takers began to respond more rapidly as they proceeded in the first section, likely because they had spent too much time at the beginning prior to adjusting to the testing environment. Time management was relatively flat for the second and fourth sections, perhaps because of better pacing and more efficient use of references. Surprisingly, pacing on the third section was more similar to the first section, likely because of physicians’ readjusting to the test after taking their authorized 30-minute break.
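The aggregation behind a plot of this kind is simple to sketch. The Python below groups response times by item position and takes the median across examinees; the record layout and the toy log are hypothetical and do not reflect the testing platform's actual export format.

```python
from collections import defaultdict
from statistics import median


def median_time_by_position(records):
    """records: iterable of (examinee_id, item_position, seconds_to_respond)."""
    times = defaultdict(list)
    for _examinee, position, seconds in records:
        times[position].append(seconds)
    return {position: median(values) for position, values in sorted(times.items())}


# Hypothetical response-time log for three examinees and three item positions.
toy_log = [
    ("A", 1, 95), ("B", 1, 120), ("C", 1, 80),
    ("A", 2, 70), ("B", 2, 60), ("C", 2, 90),
    ("A", 3, 50), ("B", 3, 55), ("C", 3, 45),
]
medians = median_time_by_position(toy_log)
```

Because items are randomized within a section, each position pools many different items, so a downward trend in these medians reflects pacing rather than the properties of any one item.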
Although physicians took near the maximum allotted time, the ANCOVA demonstrated that physicians performed better with the ability to access references. If insufficient time had a substantive impact, then we would expect a relative decline in performance near the end of each section. Figure 3 illustrates how performance changed as physicians proceeded throughout the test with references available. The proportion correct reflects the proportion of test takers who responded correctly to the particular item that they encountered at that item position. The plot demonstrates that performance was fairly stable throughout each section, suggesting that physicians did have sufficient time even when accessing references.
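One way to make the "no decline near the end of each section" check concrete is to compare the proportion correct in the final positions of a section with the earlier positions. The Python sketch below does this under stated assumptions: the record layout is hypothetical, the 10-item tail is an arbitrary choice rather than the authors' method, and every section is assumed to have data in both the tail and the earlier positions.

```python
from collections import defaultdict
from statistics import mean

SECTION_LENGTH = 50  # each section contains 50 items


def proportion_correct_by_position(records):
    """records: iterable of (item_position, is_correct) with positions starting at 1."""
    outcomes = defaultdict(list)
    for position, is_correct in records:
        outcomes[position].append(1 if is_correct else 0)
    return {position: mean(values) for position, values in sorted(outcomes.items())}


def end_of_section_gap(proportions, tail=10):
    """Per section: mean proportion correct in the last `tail` positions minus the rest."""
    buckets = {}
    for position, value in proportions.items():
        section = (position - 1) // SECTION_LENGTH + 1
        within = (position - 1) % SECTION_LENGTH + 1
        name = "tail" if within > SECTION_LENGTH - tail else "rest"
        buckets.setdefault(section, {"tail": [], "rest": []})[name].append(value)
    return {s: mean(b["tail"]) - mean(b["rest"]) for s, b in buckets.items()}


# Hypothetical data: four examinees per position, three of whom answer correctly.
toy_records = [
    (position, correct)
    for position in range(1, SECTION_LENGTH + 1)
    for correct in (True, True, True, False)
]
proportions = proportion_correct_by_position(toy_records)
gaps = end_of_section_gap(proportions)
```

A gap near zero is consistent with stable performance through the end of a section, whereas a clearly negative gap would suggest that examinees were running out of time.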
Our findings indicate that physicians scored substantially higher on an MOC Part III examination when reference material was accessible. With score gains of approximately two-thirds of a standard deviation, it is possible that some physicians who failed under the more traditional testing format would otherwise have passed, though that depends on their proximity to the passing standard. Although permitting references was associated with higher MOC performance, it was also associated with a lower proportion of physicians who felt that the time limits were sufficient. Response time data indicated that, when references were available, physicians changed their testing behaviors and used most of the allotted time, which suggests that they judged the cost of failing to far outweigh the inconvenience of spending more time testing. For medical subspecialty boards considering access to references on an MOC examination, these findings suggest that test takers will likely need clear guidance on how to pace themselves. For instance, it may be fruitful to advise physicians that the examination timing will not permit them to access references for all questions. Despite consistent performance throughout sections and overall improvement compared with previous cohorts, guidance appears necessary as test takers expressed frustrations regarding the total amount of time allotted. An alternative to providing guidance would be to increase the time allotted or perhaps to offer unlimited time. However, as with changing the examination type, modifications to the time allotted could significantly alter the assessed construct and may not be representative of practice. As a result, these changes have measurement implications, such as whether the passing standard should be adjusted. Of course, it is also important to determine whether increasing the allotted time, or removing time limits altogether, aligns with the intended inferences of the examination.
Future research on this topic could focus on what reference sources were more popular to access and the extent to which they led to a correct answer. Given that Internet access was prohibited in order to maintain a secure test environment, it would be interesting to explore the efficacy of accessing written materials compared with online, searchable sources. Additionally, it may be useful to focus future research on how accessing references affected preparedness leading up to the test day as well as response patterns during the test. For instance, do physicians tend to access certain references for particular items? If so, this may suggest that certain concepts are less relevant in actual practice, are reflective of common areas in which physicians struggle, or perhaps represent newer competencies that have only recently been taught at the school level.
A limitation of our study is that the data come from a medical subspecialty MOC where the resources may be more focused. Thus, for a generalist MOC with a much broader scope of practice, the reference sources would be much more extensive and perhaps less useful without greatly increasing the time limit. Another limitation was that the extent to which references were accessed was self-reported and unmonitored. Thus, it would be valuable to know what specific references physicians are accessing and for what types of questions to isolate knowledge deficits and preferences for gaining new knowledge.
The findings from this study have broad applications for stakeholders interested in improving MOC assessment. Permitting references positively affects scores but also alters the construct assessed by reducing the amount of recall required by physicians and introducing skills such as the ability to identify accurate sources and manage time. Given this impact to the construct, serious consideration must go into the decision regarding whether allowing references best aligns with the intended inferences made from the scores produced by the examination. Strong test construction principles, such as those offered in Standards for Educational and Psychological Testing,16 can help guide this decision. Specifically, practice analyses that provide the information needed to align test content with clinical practice would be useful in qualifying whether an open- or a closed-book test is most appropriate for a particular MOC program. For instance, accessing references may seem unequivocally more clinically relevant, but certain skills may be better measured through closed-book questions, such as emergency situations in which a physician would lack the time to look up the answer. It should be emphasized, however, that the stakes associated with MOC, for both physicians and the public at large, demand that these test design decisions be based on sound validity evidence grounded in science and not simply guided by public pressure.