This study provides evidence that the COMLEX-USA can assist family medicine residency program directors in predicting later resident performance on the ABFM’s ITE and MC-FP Examination.
In a previous study, we found that the Pearson correlation of ITE scores with ITE scores from one year later, or with MC-FP Examination scores six months later, was typically about 0.70.26 The data in this study closely replicate those findings (see Table 1), indicating that ITE scores are reasonably good predictors of future performance on later ITEs and the MC-FP Examination. The Pearson correlation, rather than the disattenuated correlation, is the appropriate correlation for making predictions about individuals because it reflects both differences in dimensionality across test forms and the degree of unreliability associated with each test form. It is interesting to note that the correlation of COMLEX-USA Level 3 scores with MC-FP Examination scores was higher than that of ITE PGY3 scores with MC-FP Examination scores (see Table 1), though not to a statistically significant degree; however, much of this difference is eliminated after disattenuating for the unreliability of both examinations (see Table 2). It is important to note that the COMLEX-USA Level 3 has 110 more questions than the ITE, which usually increases reliability estimates.
The Rasch reliability of the COMLEX-USA examinations is generally about 0.94; the Rasch reliabilities of the ITE and the MC-FP Examination are typically 0.84 and 0.94, respectively.26 To assess the extent to which two test forms measure the same dimension, the correlation must be disattenuated for the degree of unreliability associated with both test forms.50–52,56 The disattenuated correlations across examinations ranged from 0.61 to 0.80 (see Table 2), which demonstrates that the construct measured across these test forms is quite similar, but not identical. It should be noted that the MC-FP Examination has two examinee-selected 45-item modules that the COMLEX-USA and ITE do not have.
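The disattenuation used here is Spearman's classic correction for attenuation: the observed correlation is divided by the square root of the product of the two forms' reliabilities. A minimal sketch (the observed correlation of 0.70 below is an illustrative value, not a figure from Table 1):

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimates what the correlation
    between two measures would be if both were perfectly reliable."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative: an observed correlation of 0.70 between an ITE form
# (Rasch reliability ~0.84) and the MC-FP Examination (~0.94).
r_true = disattenuate(0.70, 0.84, 0.94)
print(round(r_true, 2))  # 0.79
```

Because the correction divides by a quantity no greater than 1, the disattenuated correlation is always at least as large as the observed one, which is why the Level 3 versus ITE PGY3 gap narrows in Table 2.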
It is interesting to note that the Pearson correlations in this study between the ITE PGY1 and ITE PGY2, the ITE PGY2 and ITE PGY3, and the ITE PGY3 and MC-FP Examination are lower than those previously reported by O’Neill et al26 (see Table 1). In that study, the correlations were 0.69, 0.70, and 0.71, respectively; however, the sample sizes were much larger, the spread of scores was smaller (the standard deviations were 58, 57, and 58, respectively), and the samples included both osteopathic and allopathic physicians. The correlations in the current sample are therefore likely lower because of restriction of range.
Because we had scores from the COMLEX-USA, ITE, and MC-FP Examination, we could evaluate the accuracy with which the COMLEX-USA predicted MC-FP Examination pass–fail status; in other words, the sensitivity and specificity of the COMLEX-USA. The sensitivity of the COMLEX-USA Level 2-CE was 0.90, meaning that of those who passed the MC-FP Examination, 90% had previously passed the COMLEX-USA Level 2-CE on their first attempt. The specificity of the COMLEX-USA Level 2-CE was 0.39, meaning that of those who failed the MC-FP Examination, 39% had previously failed the COMLEX-USA Level 2-CE on their first attempt. It should be noted that the low specificity is actually good news for physicians; it means that a low score on the COMLEX-USA Level 2-CE does not doom a physician to fail the MC-FP Examination; physicians can take corrective action and improve their future test performance. A review of Figure 1 shows that there is a trade-off between false positives and false negatives depending on where the prediction threshold is set. Lowering the prediction threshold below the COMLEX-USA Level 2-CE passing standard of 400 would not increase the proportion of true positives by very much, but it would substantially reduce the number of true negatives. If the prediction threshold were set at 500, the number of true negatives would increase substantially, but the number of true positives would also decrease substantially. Weighting sensitivity and specificity equally, the optimal compromise between the two appears to be a COMLEX-USA Level 2-CE score of 447 (see Figure 1). A chart similar to Figure 1 examining positive and negative predictive values was not included because the row totals (i.e., the denominators in the predictive value equations; see Chart 1) change depending on where the prediction threshold is set.
The resulting chart would also not represent a monotonically increasing (or decreasing) function, making the interpretation complicated because the number of observations would change at each condition level.
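These quantities can be sketched from a 2×2 table of predicted status (COMLEX-USA pass/fail) against actual status (MC-FP pass/fail). The cell counts below are hypothetical, chosen only to reproduce the reported sensitivity (0.90) and specificity (0.39); they are not the study's actual counts:

```python
def classification_rates(tp, fp, fn, tn):
    """Rates from a 2x2 table: tp/fn are actual passers predicted to
    pass/fail; fp/tn are actual failers predicted to pass/fail."""
    return {
        "sensitivity": tp / (tp + fn),  # of actual passers, share predicted to pass
        "specificity": tn / (tn + fp),  # of actual failers, share predicted to fail
        "ppv": tp / (tp + fp),          # of predicted passers, share who passed
        "npv": tn / (tn + fn),          # of predicted failers, share who failed
    }

# Hypothetical counts (not from the study) matching sensitivity 0.90
# and specificity 0.39.
rates = classification_rates(tp=900, fp=61, fn=100, tn=39)
print(rates["sensitivity"], round(rates["specificity"], 2))  # 0.9 0.39
```

Note that the predictive-value denominators (tp + fp and tn + fn) are the row totals that shift whenever the prediction threshold moves, which is the complication described above.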
Our results are specific to the examinations used in this study and cannot be expected to generalize to other specialties or even to other family medicine certification examinations offered by other organizations. They also generalize only to osteopathic physicians who attended ACGME-accredited family medicine residency programs and took the MC-FP Examination; they are not necessarily predictive of how osteopathic physicians who completed residencies in other specialties, or who attended American Osteopathic Association–accredited family medicine residency programs, would have performed had they taken the MC-FP Examination. Another limitation is that examinee demographics were stripped from the final data set for privacy purposes; regrettably, this also prevents us from providing a richer description of the sample. Lastly, our study was limited to COMLEX-USA Level 1, Level 2-CE, and Level 3 scores; it would be interesting to include scores from the COMLEX-USA Level 2-Performance Evaluation, the clinical skills component of the COMLEX-USA, in future analyses.
For the COMLEX-USA to have a high degree of predictive validity, it must be highly correlated with ITE and MC-FP Examination scores. In addition to being correlated, the scores must also be on a common and stable scale across administrations so that COMLEX-USA scores can be used to predict future test performance. At first, it might seem troubling that COMLEX-USA scores are rescaled after each revision of the passing standard; however, the rescaling affects only the standard scores that are reported to examinees. The underlying logit57 scale from which the standard scores are derived is not rescaled; only the formula used to transform the logit scale into standard scores is changed. This permits psychometricians to make score comparisons regardless of changes to the rescaling formula.
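As a sketch of why rescaling is harmless, suppose standard scores are derived from the logit measures by a linear formula (the slopes and intercepts below are assumed purely for illustration, not the actual NBOME transformation). Revising the formula changes the reported numbers but leaves the logit metric, and hence any comparison made on it, untouched:

```python
# Hypothetical examinee measures on the underlying logit scale.
logits = [-1.2, -0.3, 0.4, 1.1]

# Old and revised reporting formulas (assumed slope/intercept values);
# only this linear transformation changes when scores are rescaled.
old_scale = [round(100 * m + 500) for m in logits]
new_scale = [round(90 * m + 520) for m in logits]

print(old_scale)  # [380, 470, 540, 610]
print(new_scale)  # [412, 493, 556, 619]

# The rank order of examinees -- and any analysis carried out on the
# logit metric itself -- is identical under either reporting formula.
assert sorted(range(4), key=lambda i: old_scale[i]) == \
       sorted(range(4), key=lambda i: new_scale[i])
```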
1. Callahan CA, Hojat M, Veloski J, Erdmann JB, Gonnella JS. The predictive validity of three versions of the MCAT in relation to performance in medical school, residency, and licensing examinations: A longitudinal study of 36 classes of Jefferson Medical College. Acad Med. 2010;85:980–987.
2. Donnon T, Paolucci EO, Violato C. The predictive validity of the MCAT for medical school performance and medical board licensing examinations: A meta-analysis of the published research. Acad Med. 2007;82:100–106.
3. Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917.
4. Violato C, Donnon T. Does the medical college admission test predict clinical reasoning skills? A longitudinal study employing the Medical Council of Canada clinical reasoning examination. Acad Med. 2005;80(10 Suppl):S14–S16.
5. Kleshinski J, Khuder SA, Shapiro JI, Gold JP. Impact of preadmission variables on USMLE step 1 and step 2 performance. Adv Health Sci Educ Theory Pract. 2009;14:69–78.
6. Dunleavy DM, Kroopnick MH, Dowd KW, Searcy CA, Zhao X. The predictive validity of the MCAT exam in relation to academic performance through medical school: A national cohort study of 2001–2004 matriculants. Acad Med. 2013;88:666–671.
7. de Oliveira GS Jr, Akikwala T, Kendall MC, et al. Factors affecting admission to anesthesiology residency in the United States: Choosing the future of our specialty. Anesthesiology. 2012;117:243–251.
8. McDonald FS, Zeger SL, Kolars JC. Associations between United States Medical Licensing Examination (USMLE) and Internal Medicine In-Training Examination (IM-ITE) scores. J Gen Intern Med. 2008;23:1016–1019.
9. Fening K, Vander Horst A, Zirwas M. Correlation of USMLE Step 1 scores with performance on dermatology in-training examinations. J Am Acad Dermatol. 2011;64:102–106.
10. Thundiyil JG, Modica RF, Silvestri S, Papa L. Do United States Medical Licensing Examination (USMLE) scores predict in-training test performance for emergency medicine residents? J Emerg Med. 2010;38:65–69.
11. Alterman DM, Jones TM, Heidel RE, Daley BJ, Goldman MH. The predictive value of general surgery application data for future resident performance. J Surg Educ. 2011;68:513–518.
12. Jeffe DB, Andriole DA. Factors associated with American Board of Medical Specialties member board certification among US medical school graduates. JAMA. 2011;306:961–970.
13. Sutton E, Richardson JD, Ziegler C, Bond J, Burke-Poole M, McMasters KM. Is USMLE Step 1 score a valid predictor of success in surgical residency? Am J Surg. 2014;208:1029–1034.
14. Swanson DB, Sawhill A, Holtzman KZ, et al. Relationship between performance on part I of the American Board of Orthopaedic Surgery certifying examination and scores on USMLE Steps 1 and 2. Acad Med. 2009;84(10 Suppl):S21–S24.
15. McCaskill QE, Kirk JJ, Barata DM, Wludyka PS, Zenni EA, Chiu TT. USMLE step 1 scores as a significant predictor of future board passage in pediatrics. Ambul Pediatr. 2007;7:192–195.
16. Sevensma SC, Navarre G, Richards RK. COMLEX-USA and in-service examination scores: Tools for evaluating medical knowledge among residents. J Am Osteopath Assoc. 2008;108:713–716.
17. Cavalieri TA, Shen L, Slick GL. Predictive validity of osteopathic medical licensing examinations for osteopathic medical knowledge measured by graduate written examinations. J Am Osteopath Assoc. 2003;103:337–342.
18. Pierce DL. Performance on COMLEX-USA exams predicts performance on EM residency in-training exams. Acad Emerg Med. 2013;20:S219–S220.
19. Sarko J, Svoren E, Katz E. COMLEX-1 and USMLE-1 are not interchangeable examinations. Acad Emerg Med. 2010;17:218–220.
20. Chick DA, Friedman HP, Young VB, Solomon D. Relationship between COMLEX and USMLE scores among osteopathic medical students who take both examinations. Teach Learn Med. 2010;22:3–7.
21. Li F, Gimpel JR, Arenson E, Song H, Bates BP, Ludwin F. Relationship between COMLEX-USA scores and performance on the American Osteopathic Board of Emergency Medicine part I certifying examination. J Am Osteopath Assoc. 2014;114:260–266.
22. Hubbard JP, Furlow LT, Matson DD. An in-training examination for residents as a guide to learning. N Engl J Med. 1967;276:448–451.
23. Hubbard JP, Levit EJ. The National Board of Medical Examiners: The First Seventy Years: A Continuing Commitment to Excellence. Philadelphia, Pa: National Board of Medical Examiners; 1985.
24. Leigh TM, Johnson TP, Pisacano NJ. Predictive validity of the American Board of Family Practice in-training examination. Acad Med. 1990;65:454–457.
25. Replogle WH, Johnson WD. Assessing the predictive value of the American Board of Family Practice in-training examination. Fam Med. 2004;36:185–188.
26. O’Neill TR, Li Z, Peabody MR, Lybarger M, Royal K, Puffer JC. The predictive validity of the ABFM’s in-training examination. Fam Med. 2015;47:349–356.
27. Garvin PJ, Kaminski DL. Significance of the in-training examination in a surgical residency program. Surgery. 1984;96:109–113.
28. Biester TW. A study of the relationship between a medical certification examination and an in-training examination. Paper presented at: Annual Meeting of the American Educational Research Association; March 31–April 4, 1985; Chicago, Ill.
29. Biester TW. The American Board of Surgery in-training examination as a predictor of success on the qualifying examination. Curr Surg. 1987;44:194–198.
30. Jones AT, Biester TW, Buyske J, Lewis FR, Malangoni MA. Using the American Board of Surgery in-training examination to predict board certification: A cautionary study. J Surg Educ. 2014;71:144–148.
31. Grossman RS, Fincher RM, Layne RD, Seelig CB, Berkowitz LR, Levine MA. Validity of the in-training examination for predicting American Board of Internal Medicine certifying examination scores. J Gen Intern Med. 1992;7:63–67.
32. Waxman H, Braunstein G, Dantzker D, et al. Performance on the internal medicine second-year residency in-training examination predicts the outcome of the ABIM certifying examination. J Gen Intern Med. 1994;9:692–694.
33. Rollins LK, Martindale JR, Edmond M, Manser T, Scheld WM. Predicting pass rates on the American Board of Internal Medicine certifying examination. J Gen Intern Med. 1998;13:414–416.
34. Babbott SF, Beasley BW, Hinchey KT, Blotzer JW, Holmboe ES. The predictive validity of the internal medicine in-training examination. Am J Med. 2007;120:735–740.
35. Webb LC, Juul D, Reynolds CF 3rd, et al. How well does the psychiatry residency in-training examination predict performance on the American Board of Psychiatry and Neurology Part I examination? Am J Psychiatry. 1996;153:831–832.
36. Goodman JC, Juul D, Westmoreland B, Burns R. RITE performance predicts outcome on the ABPN part I examination. Neurology. 2002;58:1144–1146.
37. Juul D, Schneidman BS, Sexson SB, et al. Relationship between resident-in-training examination in psychiatry and subsequent certification examination performances. Acad Psychiatry. 2009;33:404–406.
38. Baumgartner BR, Peterman SB. 1998 Joseph E. Whitley, MD, Award. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores. Part 2. Multi-institutional study. Acad Radiol. 1998;5:374–379.
39. Baumgartner BR, Peterman SB. Relationship between American College of Radiology in-training examination scores and American Board of Radiology written examination scores. Acad Radiol. 1996;3:873–878.
40. Althouse LA, McGuinness GA. The in-training examination: An analysis of its predictive value on performance on the general pediatrics certification examination. J Pediatr. 2008;153:425–428.
41. Spellacy WN, Carlan SJ, McCarthy JM, Tsibris JC. Prediction of ABOG written examination performance from the third-year CREOG in-training examination results. J Reprod Med. 2006;51:621–622.
42. Withiam-Leitch M, Olawaiye A. Resident performance on the in-training and board examinations in obstetrics and gynecology: Implications for the ACGME Outcome Project. Teach Learn Med. 2008;20:136–142.
43. Kearney RA, Sullivan P, Skakun E. Performance on ABA-ASA in-training examination predicts success for RCPSC certification. American Board of Anesthesiology–American Society of Anesthesiologists. Royal College of Physicians and Surgeons of Canada. Can J Anaesth. 2000;47:914–918.
44. Klein GR, Austin MS, Randolph S, Sharkey PF, Hilibrand AS. Passing the boards: Can USMLE and orthopaedic in-training examination scores predict passage of the ABOS part-I examination? J Bone Joint Surg Am. 2004;86-A:1092–1095.
45. Rinder HM, Grimes MM, Wagner J, Bennett BD; RISE Committee, American Society for Clinical Pathology and the American Board of Pathology. Senior pathology resident in-service examination scores correlate with outcomes of the American Board of Pathology certifying examinations. Am J Clin Pathol. 2011;136:499–506.
46. Cho JJ, Kim JY. Predictive value of the Korean Academy of Family Medicine in-training examination for certifying examination. Korean J Fam Med. 2011;32:352–357.
47. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen, Denmark: Danish Institute for Educational Research; 1960.
48. Norris TE, Rovinelli RJ, Puffer JC, Rinaldo J, Price DW. From specialty-based to practice-based: A new blueprint for the American Board of Family Medicine cognitive examination. J Am Board Fam Pract. 2005;18:546–554.
49. National Board of Osteopathic Medical Examiners. COMLEX-USA Bulletin of Information, 2015–2016. Chicago, Ill: National Board of Osteopathic Medical Examiners; 2015. http://www.nbome.org/comlexBOI.pdf. Accessed March 31, 2016.
50. Spearman C. The proof and measurement of association between two things. Am J Psychol. 1904;15:72–101.
51. Zimmerman DW, Williams RH. Properties of the Spearman correction for attenuation for normal and realistic non-normal distributions. Appl Psychol Meas. 1997;21:253–270.
52. Schumaker RE, Muchinsky PM. Disattenuating correlation coefficients. Rasch Meas Trans. 1996;10:479.
53. Fisher RA. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika. 1915;10:507–521.
54. Weinstein MC, Fineberg HV. Clinical Decision Analysis. Philadelphia, Pa: Saunders; 1980.
55. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
56. Guilford JP. Fundamental Statistics in Psychology and Education. 6th ed. New York, NY: McGraw Hill; 1978.
57. Linacre JM, Wright BD. The “length” of a logit. Rasch Meas Trans. 1989;3:54–55.
58. Hawkins RE, Sumption KF, Gaglione MM, Holmboe ES. The in-training examination in internal medicine: Resident perceptions and lack of correlation between resident scores and faculty predictions of resident performance. Am J Med. 1999;106:206–210.
59. Taylor C, Lipsky MS. A study of the ability of physician faculty members to predict resident performance. Fam Med. 1990;22:296–298.
60. Parker RW, Alford C, Passmore C. Can family medicine residents predict their performance on the in-training examination? Fam Med. 2004;36:705–709.