The Predictive Validity of the National Board of Osteopathic Medical Examiners’ COMLEX-USA Examinations With Regard to Outcomes on American Board of Family Medicine Examinations

O’Neill, Thomas R. PhD; Peabody, Michael R. PhD; Song, Hao PhD

doi: 10.1097/ACM.0000000000001254
Research Reports

Purpose To examine the predictive validity of the National Board of Osteopathic Medical Examiners’ Comprehensive Osteopathic Medical Licensing Examination of the United States of America (COMLEX-USA) series with regard to the American Board of Family Medicine’s (ABFM’s) In-Training Examination (ITE) and Maintenance of Certification for Family Physicians (MC-FP) Examination.

Method A repeated-measures design was employed, using test scores across seven levels of training for 1,023 DOs who took the MC-FP for the first time between April 2012 and November 2014 and for whom the ABFM had ITE scores for each of their residency years. Pearson and disattenuated correlations were calculated; Fisher r to z transformation was performed; and sensitivity, specificity, and positive and negative predictive values for the COMLEX-USA Level 2–Cognitive Evaluation (CE) with regard to the MC-FP were computed.

Results The Pearson and disattenuated correlations ranged from 0.55 to 0.69 and from 0.61 to 0.80, respectively. For MC-FP scores, only the correlation increase from the COMLEX-USA Level 2-CE to Level 3 was statistically significant (for Pearson correlations: z = 2.41, P = .008; for disattenuated correlations: z = 3.16, P < .001). The sensitivity, specificity, and positive and negative predictive values of the COMLEX-USA Level 2-CE with the MC-FP were 0.90, 0.39, 0.96, and 0.19, respectively.

Conclusions Evidence was found that the COMLEX-USA can assist family medicine residency program directors in predicting later resident performance on the ABFM’s ITE and MC-FP, which is becoming increasingly important as graduate medical education accreditation moves toward a single aligned model.

T.R. O’Neill is vice president of psychometric services, American Board of Family Medicine, Lexington, Kentucky.

M.R. Peabody is a psychometrician, American Board of Family Medicine, Lexington, Kentucky.

H. Song is senior director for psychometrics and research, National Board of Osteopathic Medical Examiners, Chicago, Illinois.

Funding/Support: None reported.

Other disclosures: All of the authors are employees of the American Board of Family Medicine or National Board of Osteopathic Medical Examiners.

Ethical approval: This study was deemed exempt by the American Academy of Family Physicians institutional review board.

Correspondence should be addressed to Thomas R. O’Neill, 1648 McGrathiana Pkwy., Suite 550, Lexington, KY 40511-1247; telephone: (859) 269-5626, ext. 1225; e-mail: toneill@theabfm.org.

The National Board of Osteopathic Medical Examiners (NBOME) has a three-level licensing examination series—the Comprehensive Osteopathic Medical Licensing Examination of the United States of America (COMLEX-USA)—which is used by state licensing boards as a prerequisite for an osteopathic medical license. The primary purpose of this examination series is to determine whether the examinee can demonstrate at least the minimal competence in medical knowledge and clinical skills required for unsupervised, general osteopathic medical practice. Although the primary consumers of licensing examination results are the state medical licensing boards, many residency programs use these test scores as part of their admissions and selection criteria.

Given this common practice, it would be useful to know the degree to which COMLEX-USA scores accurately predict physician test performance during residency training and on the certification examination. For this purpose, the NBOME and American Board of Family Medicine (ABFM) have collaborated to create a combined NBOME-ABFM data set that could help to answer these questions in a manner that does not violate either organization’s privacy policies. The purpose of this report is to address issues of predictive validity with regard to how well COMLEX-USA scores predict ABFM In-Training Examination (ITE) scores in each year of residency and scores on the ABFM Maintenance of Certification for Family Physicians (MC-FP) Examination.


Background

Because the specialty of family medicine is regarded as a broad-spectrum specialty and the COMLEX-USA is designed to measure unsupervised entry-level practice for a generalist physician, it seems reasonable to suppose that there are similarities between the construct embodied in the COMLEX-USA examinations and the construct of family medicine as embodied in the ABFM’s ITE and MC-FP Examination.

Predictive validity is the extent to which a test score predicts the outcome on some criterion measure at some future point in time. This criterion measure could be performance on a different test, later job performance, or perhaps something else. Predictive validity is important in medical education because medical education is usually viewed as a progression. To illustrate, a rigorous focus on biology courses is usually required to get into medical school. The Medical College Admission Test (MCAT) is a way to compare medical school applicants in those areas using a standardized frame of reference. Similarly, to practice medicine, medical school graduates have to take a series of licensing examinations, such as the COMLEX-USA or the United States Medical Licensing Examination (USMLE). Applicants’ results on these examinations are frequently used as part of the admissions and selection criteria for residency programs. These examinations help to provide a common frame of reference for comparisons of graduates across the spectrum of medical schools.

Research examining the predictive validity of the MCAT tends to focus on the relationship between MCAT scores and subsequent scores on the USMLE examinations.1–5 More recent research has examined the ability of MCAT scores to predict medical school graduation.6 There have also been predictive validity studies that demonstrate the predictive power of the USMLE with several relevant outcomes such as residency program admission,7 resident ITE scores,8–10 residency training completion,11 and eventual board certification.12–15 Similarly, there have been studies that demonstrate the predictive power of the COMLEX-USA with several relevant outcomes such as resident ITE scores,16–18 USMLE scores,19,20 and eventual board certification.21

The first ITE was developed by the American Board of Neurological Surgery (ABNS) to address high failure rates on their certification examination.22,23 They hoped to identify at-risk residents early enough in the process that the issues could be addressed prior to taking the certification examination. The value of the ABNS ITE was largely in its predictive power with regard to the certification examination. Since that time, medical-specialty-specific ITEs have become much more prevalent, as have the predictive validity studies that demonstrate their utility to residents and program directors. In addition to studies conducted on the ABNS ITE, there have been studies that predicted certification examination results for the ABFM,24–26 the American Board of Surgery,27–30 the American Board of Internal Medicine,31–34 the American Board of Psychiatry and Neurology,35–37 the American Board of Radiology,38,39 the American Board of Pediatrics,15,40 the American Board of Obstetrics and Gynecology,41,42 the American Board of Anesthesiology,43 the American Board of Orthopaedic Surgery,44 and the American Board of Pathology.45 Internationally, the relationship between the Korean Academy of Family Medicine’s ITE and their certifying examination has been examined.46

To our knowledge, there are only two studies examining the predictive validity of licensing examinations with regard to ITEs and certifying examinations. McCaskill et al15 explored the relationship between the USMLE and the American Board of Pediatrics’ ITE and certification examination scores within a single pediatric residency program. Cavalieri et al17 examined how well the COMLEX-USA written examination scores predicted osteopathic internal medicine ITE scores and board certification scores. To add to this literature, we examined the predictive validity of the NBOME’s COMLEX-USA series with regard to the ABFM’s ITE and MC-FP Examination.


Method

Instrumentation

NBOME’s COMLEX-USA examinations.

The COMLEX-USA is an examination series with three levels: the COMLEX-USA Level 1; Level 2–Cognitive Evaluation (CE) (there is also the Level 2–Performance Evaluation, but scores from this examination were not used in this study); and Level 3. Each level of the COMLEX-USA is administered year-round in a standardized, time-measured environment and consists of 400 multiple-choice questions. These questions are scored as right or wrong using the dichotomous Rasch model47 and converted to scaled scores that range from 9 to 999. In conjunction with a common-item-equating design, the Rasch model is also used to equate examinations across test forms and years of administration. The passing standard is periodically reevaluated, and the scale is adjusted at that time to make the minimum passing score a standard score of 400 for the COMLEX-USA Level 1 and Level 2-CE and 350 for the COMLEX-USA Level 3.
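For reference, the dichotomous Rasch model47 used in scoring specifies the probability that examinee n answers item i correctly as a logistic function of the difference between the examinee’s ability θ_n and the item’s difficulty δ_i, both expressed in logits:

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

Ability estimates on this logit scale are then linearly transformed into the reported standard scores.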


ABFM’s ITE.

The ABFM’s ITE is designed to provide family medicine residents with an experience similar to the ABFM’s certification examination, the MC-FP Examination, and to provide both the resident and his or her program with feedback regarding his or her progress toward becoming board certified. The ITE contains 240 multiple-choice items, which are scored as right or wrong, and is built to the same specifications as the core portion (i.e., the nonmodule portion) of the MC-FP Examination. ITE scores are reported on the same 200–800 range as the MC-FP Examination. Every year, there is a new form of the ITE with no items in common with the previous forms of the ITE. To place ITE scores from different administrations on the same scale as the MC-FP Examination, the ABFM includes a small number of ITE questions as unscored pretest questions on the MC-FP Examination. These questions are then calibrated onto the MC-FP Examination scale using the dichotomous Rasch model.47 These ITE questions and their associated calibrations are used to connect each administration of the ITE to the continuously maintained MC-FP Examination scale. Because the ITE has been equated onto the MC-FP Examination scale and built to similar specifications, examinees’ ITE scores should be highly correlated with the scores they would have earned on the MC-FP Examination had they taken it instead of the ITE.
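To make the linking step concrete, the sketch below illustrates one standard Rasch approach to common-item equating, in which the difference in mean difficulty of the shared items gives the logit shift that places a new ITE administration onto the maintained MC-FP Examination scale. This is an illustrative sketch with assumed names and made-up numbers, not the ABFM’s actual procedure.

```python
import numpy as np

def common_item_shift(anchor_ref, anchor_new):
    """Logit shift that maps a new form's scale onto the reference scale,
    estimated from items calibrated on both scales (mean-difficulty method)."""
    return np.mean(anchor_ref) - np.mean(anchor_new)

# Hypothetical logit difficulties of the shared pretest items on each scale.
mcfp_calibrations = np.array([-0.42, 0.15, 0.88, -1.10])  # MC-FP scale
ite_calibrations = np.array([-0.60, 0.01, 0.70, -1.31])   # new ITE form

shift = common_item_shift(mcfp_calibrations, ite_calibrations)
ite_abilities = np.array([0.25, 1.10, -0.40])    # examinee abilities, ITE form
abilities_on_mcfp_scale = ite_abilities + shift  # now on the MC-FP scale
```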


ABFM’s MC-FP Examination.

The ABFM’s MC-FP Examination measures physicians’ clinical decision-making ability as it relates to family medicine. Passing this examination is one of the requirements for ABFM certification. The examination is administered during the months of April and November of each year. It consists of a common core of 260 multiple-choice questions plus two examinee-selected modules of 45 questions each from a menu of eight content-specific modules (e.g., geriatrics, maternity care). These 350 items are scored as right or wrong using the dichotomous Rasch model,47 and the resulting ability estimates are converted to scaled scores that range from 200 to 800. In conjunction with a common-item-equating design, the Rasch model is also used to equate examinations across test forms and years of administration. The use of a common scale with a passing standard that is held constant for useful periods of time has the advantage of providing a more stable target for making predictions related to whether a particular candidate will pass or fail. During the time frame from which the data in this study were gathered (see below), the minimum passing score for the MC-FP Examination was 390. (The process used to develop the content specifications for this examination is described in greater detail by Norris et al.48)


Examination timing and participants

The COMLEX-USA Level 1 is usually completed after the second year of osteopathic medical school, and the COMLEX-USA Level 2-CE is usually completed during the third or fourth year of osteopathic medical school. The COMLEX-USA Level 3 can only be taken after the DO degree has been conferred,49 typically during the first year of residency. The ABFM administers the ITE to nearly all family medicine residents in Accreditation Council for Graduate Medical Education (ACGME)–accredited programs once during each year of the residency program, but not to residents enrolled in programs that are accredited solely by the American Osteopathic Association. Approximately 20% of the physicians who sit for the ABFM ITE hold a DO degree, as do 15% of physicians sitting for the MC-FP Examination for the first time.

The ABFM initially identified 1,065 physicians who sat for the MC-FP Examination for the first time between April 2012 and November 2014, who also held a DO degree, and for whom the ABFM had ITE scores for each of their years of residency. For physicians who had to retake the MC-FP Examination, only the score from their first attempt was included in this study. There were 42 osteopathic physicians from the ABFM data set who did not have a matching record in the NBOME data set, bringing the final number of participants to 1,023. The NBOME was able to provide the first-attempt COMLEX-USA Level 1, Level 2-CE, and Level 3 scores for these 1,023 participants (although 1 participant was missing a COMLEX-USA Level 3 score; see Table 1). In June 2015, a final merged data set was created and stripped of all demographic information to ensure that neither organization could reidentify individual records.

Table 1


Procedures

In this study, we employed a repeated-measures design using test scores across seven levels of medical training (i.e., the COMLEX-USA Levels 1–3, the ABFM’s ITE for postgraduate year [PGY] 1 through PGY3, and the first attempt at the ABFM’s MC-FP Examination). The correlational analyses included Pearson correlations, to assess how strongly the scores were related, and disattenuated correlations, to assess how similar the constructs implied by the tests were after adjusting for the unreliability of the tests.50–52 Significance tests on correlations typically test whether a correlation differs from zero, but in predictive validity studies that test is almost always significant and therefore not very informative. We performed that test but also applied the Fisher r to z transformation53 to test whether one correlation was significantly higher than another.
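A minimal sketch of these analyses, assuming NumPy/SciPy (the article does not name its computational tools) and using illustrative values rather than the study data:

```python
import numpy as np
from scipy import stats

def disattenuated_r(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the correlation two tests
    would show if both were perfectly reliable."""
    return r_xy / np.sqrt(rel_x * rel_y)

def fisher_z_compare(r1, n1, r2, n2):
    """One-tailed Fisher r-to-z test of whether r1 exceeds r2
    (independent-samples form of the test)."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    z = (z1 - z2) / np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return z, stats.norm.sf(z)  # z statistic, one-tailed P value

# Illustrative inputs; see Tables 1 and 2 for the study's actual estimates.
print(disattenuated_r(0.69, 0.94, 0.94))          # -> ~0.73
print(fisher_z_compare(0.69, 1023, 0.62, 1023))   # hypothetical comparison
```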

We conducted an additional analysis to determine the sensitivity and specificity of different COMLEX-USA Level 2-CE scores with regard to predicting a passing score on the MC-FP Examination. The COMLEX-USA Level 2-CE was selected because its scores are typically available when medical students apply to residencies, whereas COMLEX-USA Level 3 scores are not always available at that point in time. We computed the sensitivity and specificity, as well as the positive and negative predictive values, of different COMLEX-USA Level 2-CE scores with regard to predicting a passing score on the MC-FP Examination and identified a reasonable trade-off point between sensitivity and specificity.34,54
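These quantities follow directly from the 2 × 2 table of predicted versus actual pass–fail status. A minimal sketch, with cell and function names that are ours rather than the authors’:

```python
def classification_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, and predictive values from a 2 x 2 table,
    where 'positive' means pass (predicted or actual) and 'negative' fail."""
    sensitivity = tp / (tp + fn)  # of those who passed, share predicted to pass
    specificity = tn / (tn + fp)  # of those who failed, share predicted to fail
    ppv = tp / (tp + fp)          # of predicted passers, share who passed
    npv = tn / (tn + fn)          # of predicted failures, share who failed
    return sensitivity, specificity, ppv, npv
```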

The procedures used in this study were reviewed by senior ABFM and NBOME executive staff to ensure that the organizations’ privacy policies were not being violated. In addition, the ABFM portion of the data was deemed exempt by the American Academy of Family Physicians institutional review board. The NBOME obtains prior approval from examinees to use COMLEX-USA scores for research on an aggregated basis.


Results

Pearson correlations

The Pearson correlation matrix (Table 1) shows that each of these tests is correlated with the other tests to a statistically significant degree. Correlations ranged from r = 0.55 for the COMLEX-USA Level 1 with both the ITE PGY2 and ITE PGY3 to r = 0.69 for the COMLEX-USA Level 3 with the MC-FP Examination. With regard to the strength of the relationship with MC-FP Examination scores, the COMLEX-USA score correlations increased the closer an examination was to the administration of the MC-FP Examination; however, only the increase in correlation from the COMLEX-USA Level 2-CE to COMLEX-USA Level 3 was statistically significant (z = 2.41, P = .008, one tailed).


Disattenuated correlations

The disattenuated correlation matrix (Table 2) shows that the constructs underlying all of the examinations have a high degree of similarity55,56 and that all of the correlations were statistically significant. Because these examinations were all highly reliable, disattenuating for the unreliability of the scores only modestly increased the correlation coefficients. The disattenuated correlations ranged from r = 0.61 for the COMLEX-USA Level 1 with both the ITE PGY2 and ITE PGY3 to r = 0.80 for the ITE PGY1 with the ITE PGY2. However, with regard to correlations with MC-FP Examination scores, only the increase from the COMLEX-USA Level 2-CE to the COMLEX-USA Level 3 was statistically significant (z = 3.16, P < .001, one tailed).

Table 2


Sensitivity, specificity, and positive and negative predictive values

Using minimum passing scores of 390 for the MC-FP Examination and 400 for the COMLEX-USA Level 2-CE, we computed the sensitivity, specificity, positive predictive value, and negative predictive value (Chart 1). The sensitivity—the proportion of examinees who actually passed the MC-FP Examination who were also predicted to pass on the basis of their COMLEX-USA Level 2-CE score—was 0.90. The specificity—the proportion of examinees who actually failed the MC-FP Examination who were also predicted to fail on the basis of their COMLEX-USA Level 2-CE score—was 0.39. The positive predictive value—the proportion of examinees who were predicted to pass the MC-FP Examination on the basis of their COMLEX-USA Level 2-CE score and who actually passed—was 0.96. The negative predictive value—the proportion of examinees who were predicted to fail the MC-FP Examination on the basis of their COMLEX-USA Level 2-CE score and who actually failed—was 0.19. Additionally, Figure 1 shows the trade-off between sensitivity and specificity using different prediction thresholds.

Figure 1

Chart 1 Ability of COMLEX-USA Level 2-CE Scores to Predict MC-FP Examination Resultsa


Discussion

This study provides evidence that the COMLEX-USA can assist family medicine residency program directors in predicting later resident performance on the ABFM’s ITE and MC-FP Examination.


Correlation of examination scores

In a previous study, we found that the Pearson correlation of ITE scores with ITE scores from one year later or with MC-FP Examination scores six months later was typically about 0.70.26 The data in this study closely replicate these findings (see Table 1). This indicates that ITE scores can be used as reasonably good predictors of future performance on later ITEs and the MC-FP Examination. The Pearson correlation, rather than the disattenuated correlation, is the appropriate correlation for making predictions about individuals because it includes both differences in the dimensionality across test forms and the degree of unreliability associated with each test form. It is interesting to note that the correlation of COMLEX-USA Level 3 scores with MC-FP Examination scores was higher than that of the ITE PGY3 scores with MC-FP Examination scores (see Table 1), although not to a statistically significant degree; much of this difference is eliminated after disattenuating for the unreliability of both examinations (see Table 2). It is important to note that the COMLEX-USA Level 3 has 160 more questions than the ITE, which usually increases reliability estimates.

The Rasch reliability of the COMLEX-USA examinations is generally about 0.94. The Rasch reliability of the ITE and the MC-FP Examination is typically 0.84 and 0.94, respectively.26 To assess the extent to which two test forms are measuring the same dimension, the correlation must be disattenuated for the degree of unreliability associated with both test forms.50–52,56 The disattenuated correlations across examinations ranged from 0.61 to 0.80 (see Table 2), which demonstrates that the construct across these test forms is quite similar, but not identical. It should be noted that the MC-FP Examination has two examinee-selected 45-item modules that the COMLEX-USA and the ITE do not have.
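The size of the correction follows directly from Spearman’s formula with the reliabilities just cited. For example, an observed correlation of 0.69 between two examinations that each have a reliability of 0.94 disattenuates to

$$r' = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}} = \frac{0.69}{\sqrt{0.94 \times 0.94}} \approx 0.73,$$

whereas the same observed correlation involving the ITE (reliability 0.84) would disattenuate to $0.69/\sqrt{0.94 \times 0.84} \approx 0.78$, which illustrates why disattenuation removes much of the reliability-driven difference noted above.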

It is interesting to note that the Pearson correlations in this study between the ITE PGY1 and ITE PGY2, the ITE PGY2 and ITE PGY3, and the ITE PGY3 and MC-FP Examination are lower than those previously reported by O’Neill et al26 (see Table 1). In that study, the correlations were 0.69, 0.70, and 0.71, respectively; however, the sample sizes were much larger, the spread of scores was smaller (the standard deviations were 58, 57, and 58, respectively), and the samples included both osteopathic and allopathic physicians. For this reason, it seems likely that the correlations may be lower in the current sample because of a restriction of range issue.


Sensitivity, specificity, and predictive power of the COMLEX-USA

Because we had scores from the COMLEX-USA, ITE, and MC-FP Examination, we could evaluate the accuracy of the COMLEX-USA predictions against MC-FP Examination pass–fail status, or in other words, the sensitivity and specificity of the COMLEX-USA. The sensitivity of the COMLEX-USA Level 2-CE was 0.90, meaning that of those who passed the MC-FP Examination, 90% had previously passed the COMLEX-USA Level 2-CE on their first attempt. The specificity of the COMLEX-USA Level 2-CE was 0.39, meaning that of those who failed the MC-FP Examination, 39% had previously failed the COMLEX-USA Level 2-CE on their first attempt. It should be noted that the low specificity is actually good news for physicians; it means that a low score on the COMLEX-USA Level 2-CE does not doom physicians to fail the MC-FP Examination; they can take corrective action and improve their future test performance. A review of Figure 1 shows that there is a trade-off between false positives and false negatives depending on where the prediction threshold is set. Lowering the prediction threshold below the COMLEX-USA Level 2-CE passing standard of 400 would not increase the proportion of true positives by very much, but it would substantially reduce the number of true negatives. If the prediction threshold were set at 500, it would substantially increase the number of true negatives, but it would also substantially decrease the number of true positives. Weighting sensitivity and specificity equally, the optimal compromise point between the two appears to be a COMLEX-USA Level 2-CE score of 447 (see Figure 1). A chart similar to Figure 1 examining positive and negative predictive values was not included because the row totals (i.e., the denominators in the predictive value equations; see Chart 1) change depending on where the prediction threshold is set. The resulting chart would also not represent a monotonically increasing (or decreasing) function, making interpretation complicated because the number of observations would change at each threshold.
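The threshold analysis behind Figure 1 can be sketched as a simple sweep over candidate cut scores. The sketch below weights sensitivity and specificity equally by maximizing their sum; the data are simulated stand-ins, not the study data.

```python
import numpy as np

def best_threshold(level2_scores, passed_mcfp, candidate_cuts):
    """Return the cut score that maximizes sensitivity + specificity,
    weighting the two equally."""
    best_cut, best_sum = None, -1.0
    for cut in candidate_cuts:
        predicted_pass = level2_scores >= cut
        sens = np.mean(predicted_pass[passed_mcfp])    # true-positive rate
        spec = np.mean(~predicted_pass[~passed_mcfp])  # true-negative rate
        if sens + spec > best_sum:
            best_cut, best_sum = cut, sens + spec
    return best_cut

# Simulated stand-in data: Level 2-CE scores and first-attempt MC-FP results.
rng = np.random.default_rng(0)
scores = rng.normal(520, 80, 1023)
passed = (scores + rng.normal(0, 60, 1023)) > 390
print(best_threshold(scores, passed, np.arange(350, 651)))
```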


Limitations

Our results are specific to the examinations we used in this study and cannot be expected to generalize to other specialties or even to other family medicine certification examinations offered by other organizations. Our results also generalize only to osteopathic physicians who attended ACGME-accredited family medicine residency programs and took the MC-FP Examination. They are not necessarily predictive of how osteopathic physicians who completed residencies in other specialties, or who attended American Osteopathic Association–accredited family medicine residency programs, would have performed had they taken the MC-FP Examination. Another limitation is that the examinee demographics were stripped from the final data set for privacy purposes; regrettably, this also prevents us from providing a richer description of the sample. Lastly, our study was limited to the COMLEX-USA Level 1, Level 2-CE, and Level 3 scores; it would be interesting to include scores from the COMLEX-USA Level 2–Performance Evaluation, the clinical skills component of the COMLEX-USA, in future analyses.

For the COMLEX-USA to have a high degree of predictive validity, it must be highly correlated with ITE and MC-FP Examination scores. In addition to being correlated, the scores must also be on a common and stable scale across administrations so that COMLEX-USA scores can be used to make a prediction about future test performance. At first, it might seem troubling that COMLEX-USA scores are rescaled after each revision of the passing standard; however, the rescaling only affects the standard scores that are reported to examinees. The underlying logit57 scale from which the standard scores are derived is not rescaled; only the formula used to transform the logit scale into standard scores is changed. This permits psychometricians to make score comparisons regardless of changes to the rescaling formula.
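This point can be made concrete with a toy example (the slope and intercept constants are made up for illustration): reported standard scores are a linear transformation of the maintained logit scale, so resetting the transformation when a passing standard changes alters the reported numbers but not any comparison made in logits.

```python
def to_standard_score(theta_logits, slope, intercept):
    """Linear conversion from the maintained logit scale to reported scores."""
    return slope * theta_logits + intercept

theta_a, theta_b = 1.20, 0.75  # two examinees on the stable logit scale

# Hypothetical transformations before and after a standard-setting revision.
before = (to_standard_score(theta_a, 90, 400), to_standard_score(theta_b, 90, 400))
after = (to_standard_score(theta_a, 85, 415), to_standard_score(theta_b, 85, 415))

# Reported scores change, but theta_a - theta_b (0.45 logits) does not,
# so psychometric comparisons across rescalings remain valid.
print(before, after, theta_a - theta_b)
```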


Conclusions

Neither program directors58,59 nor residents60 are good at predicting residents’ examination scores. To this end, ITEs are designed to connect training to certification by allowing program directors to assess residents’ progress toward certification throughout residency and address deficiencies along the way. This low-stakes, educational assessment aspect of ITEs has been the primary consideration in their design and implementation. However, the ability of program directors to have a useful predictor of future examination performance at the time of admission is becoming increasingly important as graduate medical education accreditation is moving to a single aligned model61 and accreditation status is partially contingent on both a program’s certification examination take rate and first-time takers’ pass rate. This study demonstrates that COMLEX-USA scores can be a useful predictor of future ABFM ITE and MC-FP Examination scores and can provide an early glimpse into a prospective resident’s probability of successfully completing residency and passing their certification examination.

Acknowledgments: The authors would like to thank John Gimpel, DO, MEd, and Bruce Bates, DO, CMD, for their comments on previous versions of this manuscript.


References

1. Callahan CA, Hojat M, Veloski J, Erdmann JB, Gonnella JS. The predictive validity of three versions of the MCAT in relation to performance in medical school, residency, and licensing examinations: A longitudinal study of 36 classes of Jefferson Medical College. Acad Med. 2010;85:980–987.
2. Donnon T, Paolucci EO, Violato C. The predictive validity of the MCAT for medical school performance and medical board licensing examinations: A meta-analysis of the published research. Acad Med. 2007;82:100–106.
3. Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917.
4. Violato C, Donnon T. Does the medical college admission test predict clinical reasoning skills? A longitudinal study employing the Medical Council of Canada clinical reasoning examination. Acad Med. 2005;80(10 Suppl):S14–S16.
5. Kleshinski J, Khuder SA, Shapiro JI, Gold JP. Impact of preadmission variables on USMLE step 1 and step 2 performance. Adv Health Sci Educ Theory Pract. 2009;14:69–78.
6. Dunleavy DM, Kroopnick MH, Dowd KW, Searcy CA, Zhao X. The predictive validity of the MCAT exam in relation to academic performance through medical school: A national cohort study of 2001–2004 matriculants. Acad Med. 2013;88:666–671.
7. de Oliveira GS Jr, Akikwala T, Kendall MC, et al. Factors affecting admission to anesthesiology residency in the United States: Choosing the future of our specialty. Anesthesiology. 2012;117:243–251.
8. McDonald FS, Zeger SL, Kolars JC. Associations between United States Medical Licensing Examination (USMLE) and Internal Medicine In-Training Examination (IM-ITE) scores. J Gen Intern Med. 2008;23:1016–1019.
9. Fening K, Vander Horst A, Zirwas M. Correlation of USMLE Step 1 scores with performance on dermatology in-training examinations. J Am Acad Dermatol. 2011;64:102–106.
10. Thundiyil JG, Modica RF, Silvestri S, Papa L. Do United States Medical Licensing Examination (USMLE) scores predict in-training test performance for emergency medicine residents? J Emerg Med. 2010;38:65–69.
11. Alterman DM, Jones TM, Heidel RE, Daley BJ, Goldman MH. The predictive value of general surgery application data for future resident performance. J Surg Educ. 2011;68:513–518.
12. Jeffe DB, Andriole DA. Factors associated with American Board of Medical Specialties member board certification among US medical school graduates. JAMA. 2011;306:961–970.
13. Sutton E, Richardson JD, Ziegler C, Bond J, Burke-Poole M, McMasters KM. Is USMLE Step 1 score a valid predictor of success in surgical residency? Am J Surg. 2014;208:1029–1034.
14. Swanson DB, Sawhill A, Holtzman KZ, et al. Relationship between performance on part I of the American Board of Orthopaedic Surgery certifying examination and scores on USMLE Steps 1 and 2. Acad Med. 2009;84(10 Suppl):S21–S24.
15. McCaskill QE, Kirk JJ, Barata DM, Wludyka PS, Zenni EA, Chiu TT. USMLE step 1 scores as a significant predictor of future board passage in pediatrics. Ambul Pediatr. 2007;7:192–195.
16. Sevensma SC, Navarre G, Richards RK. COMLEX-USA and in-service examination scores: Tools for evaluating medical knowledge among residents. J Am Osteopath Assoc. 2008;108:713–716.
17. Cavalieri TA, Shen L, Slick GL. Predictive validity of osteopathic medical licensing examinations for osteopathic medical knowledge measured by graduate written examinations. J Am Osteopath Assoc. 2003;103:337–342.
18. Pierce DL. Performance on COMLEX-USA exams predicts performance on EM residency in-training exams. Acad Emerg Med. 2013;20:S219–S220.
19. Sarko J, Svoren E, Katz E. COMLEX-1 and USMLE-1 are not interchangeable examinations. Acad Emerg Med. 2010;17:218–220.
20. Chick DA, Friedman HP, Young VB, Solomon D. Relationship between COMLEX and USMLE scores among osteopathic medical students who take both examinations. Teach Learn Med. 2010;22:3–7.
21. Li F, Gimpel JR, Arenson E, Song H, Bates BP, Ludwin F. Relationship between COMLEX-USA scores and performance on the American Osteopathic Board of Emergency Medicine part I certifying examination. J Am Osteopath Assoc. 2014;114:260–266.
22. Hubbard JP, Furlow LT, Matson DD. An in-training examination for residents as a guide to learning. N Engl J Med. 1967;276:448–451.
23. Hubbard JP, Levit EJ. The National Board of Medical Examiners: The First Seventy Years: A Continuing Commitment to Excellence. 1985. Philadelphia, Pa: National Board of Medical Examiners.
24. Leigh TM, Johnson TP, Pisacano NJ. Predictive validity of the American Board of Family Practice in-training examination. Acad Med. 1990;65:454–457.
25. Replogle WH, Johnson WD. Assessing the predictive value of the American Board of Family Practice in-training examination. Fam Med. 2004;36:185–188.
26. O’Neill TR, Li Z, Peabody MR, Lybarger M, Royal K, Puffer JC. The predictive validity of the ABFM’s in-training examination. Fam Med. 2015;47:349–356.
27. Garvin PJ, Kaminski DL. Significance of the in-training examination in a surgical residency program. Surgery. 1984;96:109–113.
28. Biester TW. A study of the relationship between a medical certification examination and an in-training examination. Paper presented at: Annual Meeting of the American Educational Research Association; March 31–April 4, 1985; Chicago, Ill.
29. Biester TW. The American Board of Surgery in-training examination as a predictor of success on the qualifying examination. Curr Surg. 1987;44:194–198.
30. Jones AT, Biester TW, Buyske J, Lewis FR, Malangoni MA. Using the American Board of Surgery in-training examination to predict board certification: A cautionary study. J Surg Educ. 2014;71:144–148.
31. Grossman RS, Fincher RM, Layne RD, Seelig CB, Berkowitz LR, Levine MA. Validity of the in-training examination for predicting American Board of Internal Medicine certifying examination scores. J Gen Intern Med. 1992;7:63–67.
32. Waxman H, Braunstein G, Dantzker D, et al. Performance on the internal medicine second-year residency in-training examination predicts the outcome of the ABIM certifying examination. J Gen Intern Med. 1994;9:692–694.
33. Rollins LK, Martindale JR, Edmond M, Manser T, Scheld WM. Predicting pass rates on the American Board of Internal Medicine certifying examination. J Gen Intern Med. 1998;13:414–416.
34. Babbott SF, Beasley BW, Hinchey KT, Blotzer JW, Holmboe ES. The predictive validity of the internal medicine in-training examination. Am J Med. 2007;120:735–740.
35. Webb LC, Juul D, Reynolds CF 3rd, et al. How well does the psychiatry residency in-training examination predict performance on the American Board of Psychiatry and Neurology part I examination? Am J Psychiatry. 1996;153:831–832.
36. Goodman JC, Juul D, Westmoreland B, Burns R. RITE performance predicts outcome on the ABPN part I examination. Neurology. 2002;58:1144–1146.
37. Juul D, Schneidman BS, Sexson SB, et al. Relationship between resident-in-training examination in psychiatry and subsequent certification examination performances. Acad Psychiatry. 2009;33:404–406.
38. Baumgartner BR, Peterman SB. 1998 Joseph E. Whitley, MD, Award. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores. Part 2. Multi-institutional study. Acad Radiol. 1998;5:374–379.
39. Baumgartner BR, Peterman SB. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores. Acad Radiol. 1996;3:873–878.
40. Althouse LA, McGuinness GA. The in-training examination: An analysis of its predictive value on performance on the general pediatrics certification examination. J Pediatr. 2008;153:425–428.
41. Spellacy WN, Carlan SJ, McCarthy JM, Tsibris JC. Prediction of ABOG written examination performance from the third-year CREOG in-training examination results. J Reprod Med. 2006;51:621–622.
42. Withiam-Leitch M, Olawaiye A. Resident performance on the in-training and board examinations in obstetrics and gynecology: Implications for the ACGME Outcome Project. Teach Learn Med. 2008;20:136–142.
43. Kearney RA, Sullivan P, Skakun E. Performance on ABA-ASA in-training examination predicts success for RCPSC certification. American Board of Anesthesiology–American Society of Anesthesiologists. Royal College of Physicians and Surgeons of Canada. Can J Anaesth. 2000;47:914–918.
44. Klein GR, Austin MS, Randolph S, Sharkey PF, Hilibrand AS. Passing the boards: Can USMLE and orthopaedic in-training examination scores predict passage of the ABOS part-I examination? J Bone Joint Surg Am. 2004;86-A:1092–1095.
45. Rinder HM, Grimes MM, Wagner J, Bennett BD; RISE Committee, American Society for Clinical Pathology and the American Board of Pathology. Senior pathology resident in-service examination scores correlate with outcomes of the American Board of Pathology certifying examinations. Am J Clin Pathol. 2011;136:499–506.
46. Cho JJ, Kim JY. Predictive value of the Korean Academy of Family Medicine in-training examination for certifying examination. Korean J Fam Med. 2011;32:352–357.
47. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. 1960. Copenhagen, Denmark: Danish Institute for Educational Research.
48. Norris TE, Rovinelli RJ, Puffer JC, Rinaldo J, Price DW. From specialty-based to practice-based: A new blueprint for the American Board of Family Medicine cognitive examination. J Am Board Fam Pract. 2005;18:546–554.
49. National Board of Osteopathic Medical Examiners. COMLEX-USA Bulletin of Information, 2015–2016. 2015. Chicago, Ill: National Board of Osteopathic Medical Examiners; http://www.nbome.org/comlexBOI.pdf. Accessed March 31, 2016.
50. Spearman C. The proof and measurement of association between two things. Am J Psychol. 1904;15:72–101.
51. Zimmerman DW, Williams RH. Properties of the Spearman correction for attenuation for normal and realistic non-normal distributions. Appl Psychol Meas. 1997;21:253–270.
52. Schumaker RE, Muchinsky PM. Disattenuating correlation coefficients. Rasch Meas Trans. 1996;10:479.
53. Fisher RA. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika. 1915;10:507–521.
54. Weinstein MC, Fineberg HV. Clinical Decision Analysis. 1980. Philadelphia, Pa: Saunders.
55. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. 1988. Hillsdale, NJ: Erlbaum.
56. Guilford JP. Fundamental Statistics in Psychology and Education. 6th ed. 1978. New York, NY: McGraw Hill.
57. Linacre JM, Wright BD. The “length” of a logit. Rasch Meas Trans. 1989;3:54–55.
58. Hawkins RE, Sumption KF, Gaglione MM, Holmboe ES. The in-training examination in internal medicine: Resident perceptions and lack of correlation between resident scores and faculty predictions of resident performance. Am J Med. 1999;106:206–210.
59. Taylor C, Lipsky MS. A study of the ability of physician faculty members to predict resident performance. Fam Med. 1990;22:296–298.
60. Parker RW, Alford C, Passmore C. Can family medicine residents predict their performance on the in-training examination? Fam Med. 2004;36:705–709.
61. Accreditation Council for Graduate Medical Education; American Osteopathic Association; American Association of Colleges of Osteopathic Medicine. Allopathic and osteopathic medical communities commit to a single graduate medical education accreditation system [press release]. February 26, 2014. http://www.osteopathic.org/inside-aoa/news-and-publications/media-center/2014-news-releases/pages/2-26-allopathic-and-osteopathic-medical-communities-commit-to-single-graduate-medical-education-accreditation-system.aspx. Accessed March 31, 2016.
© 2016 by the Association of American Medical Colleges