Introduction
The feedback that in-training examinations can provide has made these assessments prominent components of medical specialty and subspecialty training programs. In-training examinations are typically used to evaluate and track trainees’ acquisition of clinical knowledge over the course of their training (1,2). In-training examinations can provide objective evidence regarding a trainee’s readiness to successfully complete the corresponding board certification examination and thus allow implementation of remediation plans if needed. A growing number of studies have indicated that in-training examination performance has a positive relationship with performance on the respective board certification examination within a variety of medical specialties (3–10). Generally, average to high scores on these examinations yield a high probability of passing the respective board certification examination. In-training examination performance also correlates well with other important medical examinations, such as the US Medical Licensing Examination (USMLE) Step Examinations (11). Although the positive relationship between in-training examinations and other indicators of medical knowledge seems consistent, it has not been empirically established for the American Society of Nephrology (ASN) In-Training Examination.
Because the ASN In-Training Examination has been administered annually since 2009 (12), there are now sufficient data to investigate the relationship between the In-Training Examination and the American Board of Internal Medicine (ABIM) Nephrology Certification Examination. The ASN In-Training Examination is intended to provide specific information to allow nephrology fellowship program directors to identify topical areas for improvement in medical knowledge for individual fellows as well as aggregate feedback about potential strengths and weaknesses in their training programs’ curricula (13). In addition, the In-Training Examination is designed to mirror the content blueprint of the ABIM Nephrology Certification Examination, and thus, it can serve as a formative assessment to show a fellow’s readiness to take the board examination. This study addresses the gap in the literature regarding the validity of the ASN In-Training Examination by establishing the In-Training Examination’s association with scores and passing status on the Nephrology Certification Examination.
Materials and Methods
Participants
We identified 1684 second-year nephrology fellows from 322 fellowship programs who completed the ASN In-Training Examination between 2009 and 2014 and had complete data for each variable investigated. Nephrology fellowship training typically lasts 2 years, with potential additional years dedicated to research training. The study participants included only second-year nephrology fellows, because the content covered by the ASN In-Training Examination was developed to reflect the appropriate knowledge base at this point in training. Figure 1 displays a more detailed breakdown of the study participants, including missing data counts. All USMLE and ABIM Certification Examination scores were from each individual’s first attempt. Within each examination, scores are placed on a standardized scale across administrations by the testing organizations via established Item Response Theory equating procedures (14). Equating is a statistical process that adjusts for potential differences in form difficulty across time, allowing scores from different years to be compared and analyzed collectively.
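To make the idea of equating concrete, the toy sketch below shows a simple mean-sigma linear adjustment of scores from a hypothetical new form onto a base reporting scale. All numbers are made up, and this is only an illustration of the general concept; the testing organizations use formal Item Response Theory equating procedures (14), not this simplified method.

```r
# Toy illustration of linear (mean-sigma) equating; NOT the IRT procedures used
# by the testing organizations (14). Hypothetical numbers only.
base_mean <- 500; base_sd <- 100   # established reporting scale
new_mean  <- 480; new_sd  <- 95    # hypothetical statistics from a harder new form
equate <- function(score) base_mean + base_sd * (score - new_mean) / new_sd
equate(480)  # an average score on the harder new form maps to 500 on the base scale
```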
Figure 1: Diagram of examinee selection procedures for this study detailing the various sources of missing data. Of the original pool of 4335 proper administrations between 2009 and 2014, 1684 examinees were selected for study. Standard testing conditions refers to fellows who tested under the standard allotted time with no aberrant occurrences in the testing center, such as power outages. ABIM, American Board of Internal Medicine; ASN ITE, American Society of Nephrology In-Training Examination; PGY, postgraduate year; USMLE, US Medical Licensing Examination.
Materials
We merged data collected by the ASN (In-Training Examination scores, sex, and medical school type), the ABIM (final program director ratings of clinical competence, scores and passing status on the Nephrology Certification Examination, and Internal Medicine Certification Examination scores), and the National Board of Medical Examiners (USMLE step 1, step 2 clinical knowledge, and step 3 examination scores) for the study participants.
ASN In-Training Examination.
The ASN In-Training Examination content outline is developed by the ASN to align closely with the ABIM Nephrology Certification Examination (Supplemental Appendix A shows the blueprint). A more thorough overview of the In-Training Examination development, along with initial validity evidence, is presented in the work by Rosner et al. (12). Beginning in 2009, the examination has been administered yearly through a secure web-based testing format. The examination contains approximately 170 multiple choice items, with reliability coefficients typically exceeding 0.80 across administrations. All analyses in this study were conducted on the equated score scale (mean=500; SD=100; range=200–800).
ABIM Nephrology Certification Examination.
This examination was designed to assess the knowledge, diagnostic reasoning, and clinical judgment skills expected of a certified nephrologist (Supplemental Appendix B shows the blueprint). The examination is computer based and consists of 240 multiple choice items, 200 of which are scored and equated. The reported scores use a standard equated scale with a mean of 500, an SD of 100, and a range from 200 to 800. Reliability coefficients exceeded 0.90 across administrations.
ABIM Internal Medicine Certification Examination.
This examination was developed to assess whether an individual has the knowledge, diagnostic reasoning, and clinical judgment skills expected of a certified internist. The examination is administered via a secure computer-based format and contains 240 total multiple-choice items, 200 of which are scored and equated. Overall scores are reported on a standard scale with a mean of 500, an SD of 100, and a range from 200 to 800. Reliability coefficients of this examination consistently exceeded 0.90 across administrations.
USMLE Step Examinations.
The USMLE assesses a physician’s ability to apply knowledge, concepts, and principles as well as demonstrate patient-centered skills that are important in health and disease and serve as the basis for safe and effective patient care. This study included three components of the USMLE: step 1, step 2 clinical knowledge, and step 3. Step 1 primarily focuses on science concepts basic to the practice of medicine, step 2 clinical knowledge focuses on clinical science principles for supervised practice, and step 3 assesses medical knowledge and clinical skills for unsupervised practice. These three step examinations are administered through a secure computer-based system, and each contains approximately 300 multiple choice questions. The reliability for each step examination consistently exceeds 0.87. Total test scores on the step examinations are standardized to have a mean of 200 and an SD of 20 on the basis of the respective reference group for each examination, and they range from 0 to 300.
Nephrology Program Director Ratings.
Nephrology program directors completed a holistic, standardized performance evaluation of their individual fellows at the end of each training year on six core competencies as well as an overall rating of clinical competence. Scores are assigned on a nine-point scale representing three performance categories: unsatisfactory (1–3), satisfactory (4–6), and superior (7–9). Statistical analyses conducted in this study included the program director’s overall ratings of clinical competence in nephrology, which were provided before the realignment with the Accreditation Council for Graduate Medical Education Next Accreditation System Milestones competency-based approach in 2015 (15).
Statistical Analyses
We first conducted a multiple linear regression analysis to evaluate the utility of the ASN In-Training Examination scores in explaining variance in subsequent ABIM Nephrology Certification Examination performance. The independent variables in the model included the ASN In-Training Examination; USMLE step 1, step 2 clinical knowledge, and step 3; ABIM Internal Medicine Certification Examination; and program director’s overall clinical competence ratings along with examinee sex and whether the examinee was a graduate of a medical school in the United States/Canada or elsewhere. Nephrology Certification Examination scores served as the dependent variable. The Pratt index (16,17), which indicates the proportion of explained variance in the dependent variable accounted for by each independent variable, was used to evaluate the relative contribution of each variable. This index was used alongside regression coefficients, information criteria measures (18), and nested model tests to identify the variables that significantly contributed to explaining Nephrology Certification Examination score variation. The overall utility of the final set of predictors was assessed via the coefficient of determination (R2).
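For illustration, the Pratt index for a predictor is the product of its standardized regression coefficient and its zero-order correlation with the outcome, divided by the model R2, so the indices partition the explained variance among the predictors. The R sketch below shows one way to obtain these quantities; it is not the study code, and the data frame and variable names (fellows, neph_cert_score, ite_score, and so on) are hypothetical placeholders for the merged data set described above.

```r
# Minimal sketch of the multiple regression and Pratt indices (hypothetical
# variable names; med_school_us assumed coded 1 = United States/Canada, 0 = international).
preds <- c("ite_score", "abim_im_score", "usmle1", "usmle3", "pd_rating", "med_school_us")
fml   <- reformulate(preds, response = "neph_cert_score")
fit   <- lm(fml, data = fellows)
summary(fit)$adj.r.squared                      # overall proportion of variance explained

# Pratt index: standardized coefficient x zero-order correlation / R2
r2    <- summary(fit)$r.squared
beta  <- coef(fit)[preds] * sapply(fellows[preds], sd) / sd(fellows$neph_cert_score)
r_xy  <- sapply(fellows[preds], cor, y = fellows$neph_cert_score)
round(beta * r_xy / r2, 2)                      # relative contributions; they sum to 1
```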
Next, logistic regression analysis was conducted to assess the association of the independent variables with a passing outcome on the Nephrology Certification Examination (dependent variable). We again used model selection indices (regression coefficients, information criteria measures, and nested model tests) to identify the variables that significantly contributed to explaining Nephrology Certification Examination passing status. The overall utility of the final set of predictors was assessed via classification accuracy and Nagelkerke R2. To aid in interpreting results, the continuous medical assessment variables (ASN In-Training Examination, ABIM Internal Medicine Certification Examination, and USMLE step scores) were evaluated in units of a 1-SD increase in the examination scores for both the multiple and logistic regression models.
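A corresponding logistic regression sketch is given below, again with hypothetical variable names. Rescaling each continuous score to 1-SD units means the exponentiated coefficients can be read as odds ratios per SD increase; Nagelkerke R2 is then computed from the model and null deviances.

```r
# Minimal sketch of the logistic regression for first-attempt passing status
# (hypothetical names; passed_cert assumed coded 1 = pass, 0 = fail).
sdz <- function(x) as.numeric(scale(x))        # rescale a score to 1-SD units
logit_fit <- glm(passed_cert ~ sdz(ite_score) + sdz(abim_im_score) + sdz(usmle3) +
                   pd_rating + med_school_us,
                 data = fellows, family = binomial)
exp(cbind(OR = coef(logit_fit), confint.default(logit_fit)))  # odds ratios, Wald-type 95% CIs

# Nagelkerke R2 from the model and null deviances, plus accuracy at a 0.50 cutoff
n          <- nrow(fellows)
null_dev   <- logit_fit$null.deviance
cox_snell  <- 1 - exp((deviance(logit_fit) - null_dev) / n)
nagelkerke <- cox_snell / (1 - exp(-null_dev / n))
accuracy   <- mean((fitted(logit_fit) >= 0.5) == fellows$passed_cert)
```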
Last, a separate logistic regression model was conducted to examine the relationship of ASN In-Training Examination scores and Nephrology Certification Examination passing status, independent of the other medical knowledge and demographic variables. Standard repeated K-fold crossvalidation (19) with ten folds and 200 repetitions was used to split the sample into ten nonoverlapping groups (approximately 168 fellows per group). The Hosmer–Lemeshow test on the basis of 20 equally sized groups was used to evaluate model calibration (20). Classification accuracy was evaluated using the area under the receiver operating characteristic curve along with sensitivity (proportion of passing examinees correctly predicted to pass by the In-Training Examination threshold), specificity (proportion of failing examinees correctly predicted to fail), positive predictive value (proportion of examinees predicted to pass who did pass), negative predictive value (proportion of examinees predicted to fail who did fail), and overall accuracy (21). These statistics were computed at various ASN In-Training Examination score points to facilitate interpretation of results and selection of a threshold to identify potentially at-risk fellows. Analyses were conducted via R version 3.4.1 (22).
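The sketch below outlines one round of this crossvalidation for the In-Training Examination-only model, a rank-based estimate of the area under the receiver operating characteristic curve, and the threshold-based classification statistics. It is a simplified illustration under the same hypothetical variable names, not the authors' code.

```r
# One repetition of tenfold crossvalidation for the ITE-only model (hypothetical names).
set.seed(2014)
folds <- sample(rep(1:10, length.out = nrow(fellows)))     # ~168 fellows per fold
oos_p <- numeric(nrow(fellows))                            # out-of-sample predicted probabilities
for (k in 1:10) {
  cv_fit <- glm(passed_cert ~ ite_score, data = fellows[folds != k, ], family = binomial)
  oos_p[folds == k] <- predict(cv_fit, newdata = fellows[folds == k, ], type = "response")
}

# Rank-based (Mann-Whitney) estimate of the area under the ROC curve
r   <- rank(oos_p)
n1  <- sum(fellows$passed_cert == 1); n0 <- sum(fellows$passed_cert == 0)
auc <- (sum(r[fellows$passed_cert == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Sensitivity, specificity, PPV, NPV, and overall accuracy at an ITE score threshold
class_stats <- function(threshold, score, passed) {
  pred_pass <- score >= threshold
  c(sensitivity = mean(pred_pass[passed == 1]),      # passers predicted to pass
    specificity = mean(!pred_pass[passed == 0]),     # failers predicted to fail
    ppv = mean(passed[pred_pass] == 1),              # predicted passers who passed
    npv = mean(passed[!pred_pass] == 0),             # predicted failers who failed
    accuracy = mean(pred_pass == (passed == 1)))
}
sapply(c(300, 337, 375, 425, 450), class_stats,
       score = fellows$ite_score, passed = fellows$passed_cert)
```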
Results
Table 1 presents demographic and examination performance statistics of the variables analyzed for the participants overall and by various In-Training Examination performance thresholds. The cohort comprised primarily men (62%) and international medical school graduates (62%), with an average age of 32 years at the first attempt on the Nephrology Certification Examination. An overwhelming majority (89%) passed the Nephrology Certification Examination on their first attempt.
Table 1. Characteristics of 1684 second-year nephrology fellows who took the American Society of Nephrology In-Training Examination and the American Board of Internal Medicine Nephrology Certification Examination

| Variables | Overall | ITE<370 | ITE 370–469 | ITE 470–570 | ITE 571–670 | ITE>670 |
|---|---|---|---|---|---|---|
| Sex, N (%) | | | | | | |
|  Women | 632 (38) | 41 (42) | 201 (45) | 242 (37) | 114 (32) | 34 (28) |
|  Men | 1052 (62) | 57 (58) | 243 (55) | 419 (63) | 245 (71) | 88 (72) |
| Medical school, N (%) | | | | | | |
|  LCME-accredited United States/Canada | 638 (38) | 42 (43) | 219 (49) | 232 (35) | 110 (31) | 35 (29) |
|  International | 1046 (62) | 56 (57) | 225 (51) | 429 (65) | 249 (69) | 87 (71) |
| ABIM Nephrology Certification Examination, mean (SD) | 519 (72) | 429 (65) | 476 (54) | 520 (55) | 563 (57) | 616 (55) |
| ASN ITE, mean (SD) | 519 (99) | 333 (33) | 429 (26) | 520 (28) | 612 (26) | 718 (40) |
| ABIM Internal Medicine Certification Examination, mean (SD) | 543 (82) | 467 (67) | 498 (61) | 541 (53) | 570 (49) | 607 (49) |
| Director ratings of overall competence, mean (SD) | 7.6 (1.0) | 7.1 (1.1) | 7.3 (1.1) | 7.6 (0.9) | 7.9 (0.8) | 8.1 (0.7) |
| USMLE step 1, mean (SD) | 223 (19) | 208 (16) | 215 (17) | 224 (17) | 231 (17) | 240 (17) |
| USMLE step 2 clinical knowledge, mean (SD) | 226 (20) | 208 (18) | 217 (18) | 226 (18) | 236 (18) | 244 (19) |
| USMLE step 3, mean (SD) | 212 (15) | 202 (11) | 208 (13) | 212 (15) | 217 (15) | 224 (18) |

ITE thresholds represent the following SDs from mean ITE performance: ITE<370, <−1.5 SD; ITE 370–469, ≥−1.5 SD and <−0.5 SD; ITE 470–570, ≥−0.5 SD and ≤0.5 SD; ITE 571–670, >0.5 SD and ≤1.5 SD; and ITE>670, >1.5 SD. ITE, In-Training Examination; LCME, Liaison Committee on Medical Education; ABIM, American Board of Internal Medicine; ASN, American Society of Nephrology; USMLE, US Medical Licensing Examination.
Strength of Association with Nephrology Certification Examination Performance
Model selection indices for the multiple regression, presented in Supplemental Appendix C, indicated that step 2 clinical knowledge scores and sex could be excluded without hindering overall model fit or significantly decreasing the variance explained. Thus, these variables were excluded from the final regression model. Table 2 presents results from the multiple linear regression. This model accounted for 55% of the variance in ABIM Nephrology Certification Examination scores [adjusted R2=0.55; F(6, 1677)=347.97; P<0.001]. ASN In-Training Examination scores displayed the strongest association, accounting for 50% of the explained variance in ABIM Nephrology Certification Examination scores. The regression coefficient indicates that the model estimates about a 30-point increase (95% confidence interval, 27 to 33) in Certification Examination scores for a 1-SD increase in In-Training Examination scores (98.5 points). The second strongest variable in the model was ABIM Internal Medicine Certification Examination scores, which accounted for only about one half as much of the explained variance (29%) as the In-Training Examination. Step 3 scores, program directors’ clinical competence ratings, and medical school type had the weakest associations with Nephrology Certification Examination scores.
Table 2. Associations of American Society of Nephrology In-Training Examination scores and other evaluations with American Board of Internal Medicine Nephrology Certification Examination scores among 1684 second-year nephrology fellows

| Independent Variables | Difference in Nephrology Certification Examination Score (95% CI) | P Value | Relative Contribution, % |
|---|---|---|---|
| ASN ITE per 98.5 U | 30 (27 to 33) | <0.001 | 50 |
| ABIM Internal Medicine Certification Examination per 82.2 U | 19 (16 to 22) | <0.001 | 29 |
| Director ratings per 1.0 U | 5 (2 to 7) | <0.001 | 4 |
| USMLE step 1 per 19.0 U | 10 (8 to 13) | <0.001 | 13 |
| USMLE step 3 per 15.3 U | 3 (0 to 6) | 0.03 | 3 |
| Medical school (1=LCME-accredited United States/Canada; 0=international) | 18 (13 to 24) | <0.001 | 1 |
N=1684. Adjusted R2=0.55. The relative contribution, indicating the proportion of explained variance attributable to each predictor, was estimated via the Pratt index. Regression coefficients for the continuous variables reflect units of 1 SD, as noted in column 1. The regression coefficient for the categorical variable, medical school, represents the difference between the two groups. 95% CI, 95% confidence interval; ASN, American Society of Nephrology; ITE, In-Training Examination; ABIM, American Board of Internal Medicine; USMLE, US Medical Licensing Examination; LCME, Liaison Committee on Medical Education.
Strength of Association with Nephrology Pass-Fail Outcomes
Model selection indices for the logistic regression (Supplemental Appendix C) suggested that a model excluding step 1 scores, step 2 scores, and sex was the most parsimonious model that did not significantly decrease overall utility. Table 3 contains the results of the logistic regression. The ASN In-Training Examination score again showed a significant and strong association with passing status on the ABIM Nephrology Certification Examination, with an odds ratio of 3.46 (95% confidence interval, 2.68 to 4.54; P<0.001) for each 1-SD increase in In-Training Examination scores. The final set of predictors resulted in a Nagelkerke R2 of 0.35 and accurately classified 90.2% of fellows. Overall classification accuracy should be interpreted cautiously in this situation, because the base rate of failing the Nephrology Certification Examination was low for this sample (11%).
Table 3. Associations of American Society of Nephrology In-Training Examination scores and other evaluations with a passing score on the American Board of Internal Medicine Nephrology Certification Examination among 1684 second-year nephrology fellows

| Independent Variables | Odds Ratio (95% CI) | P Value |
|---|---|---|
| ASN ITE per 98.5 U | 3.46 (2.68 to 4.54) | <0.001 |
| ABIM Internal Medicine Certification Examination per 82.2 U | 1.63 (1.32 to 1.74) | <0.001 |
| Director ratings per 1.0 U | 1.25 (1.05 to 1.48) | 0.01 |
| USMLE step 3 per 15.3 U | 1.37 (1.08 to 1.74) | 0.01 |
| Medical school (1=LCME-accredited United States/Canada; 0=international) | 1.99 (1.32 to 3.03) | 0.001 |

N=1684. Nagelkerke R2=0.35. Regression coefficients and odds ratios for continuous variables reflect units of 1 SD, as noted in column 1. The regression coefficient and odds ratio for the categorical variable, medical school, represent the difference between the two groups. 95% CI, 95% confidence interval; ASN, American Society of Nephrology; ITE, In-Training Examination; ABIM, American Board of Internal Medicine; USMLE, US Medical Licensing Examination; LCME, Liaison Committee on Medical Education.
Predictive Utility of the ASN In-Training Examination
Supplemental Appendix D presents the calibration curve and Hosmer–Lemeshow test results of a model containing only the ASN In-Training Examination. The curve showed relatively close fit between the observed and predicted data, and the test indicated that there was no significant difference between observed and predicted values for our sample [chi-squared (18)=24.31; P=0.15]. Across all K-fold validation samples, 87% of the samples did not show significant misfit at P<0.05 according to the Hosmer–Lemeshow test.
Figure 2 depicts the probability of passing the ABIM Nephrology Certification Examination given a particular ASN In-Training Examination score on the basis of the average coefficients across the in-sample examinees from the repeated tenfold crossvalidation procedure. Figure 2 shows that fellows who scored at the mean (519) have approximately a 95% chance of passing the Nephrology Certification Examination. In contrast, fellows scoring 2 SDs below the mean (321), the lowest 2% of the population, would be predicted to have only a 44% chance of passing the examination. Using only the In-Training Examination scores yielded an average classification accuracy of 90% (95% confidence interval, 88% to 92%) when using a predicted probability of 0.50 (In-Training Examination score of 337) to classify pass-fail status for the out-of-sample examinees. This value provides limited information given that predicting all fellows to pass would produce an 89% classification rate.
Figure 2: The predicted probability of passing the American Board of Internal Medicine Nephrology Certification Examination increases sharply at lower American Society of Nephrology In-Training Examination scores before asymptoting near 1 around a score of 550. The shaded gray area represents the 95% confidence interval for predicted values. PGY2, postgraduate year 2.
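A curve such as Figure 2 can be produced from a fitted single-predictor logistic model by predicting on a grid of In-Training Examination scores. The fitted coefficients are not reported here, so the sketch below only illustrates the mechanics, using the same hypothetical variable names as above.

```r
# Sketch of how a predicted-probability curve like Figure 2 is generated
# from a single-predictor logistic model (hypothetical variable names).
ite_fit <- glm(passed_cert ~ ite_score, data = fellows, family = binomial)
grid    <- data.frame(ite_score = 200:800)
pred    <- predict(ite_fit, newdata = grid, type = "link", se.fit = TRUE)
prob    <- plogis(pred$fit)                           # predicted probability of passing
lower   <- plogis(pred$fit - 1.96 * pred$se.fit)      # approximate 95% confidence band
upper   <- plogis(pred$fit + 1.96 * pred$se.fit)
plot(grid$ite_score, prob, type = "l", ylim = c(0, 1),
     xlab = "ASN ITE score", ylab = "Predicted probability of passing")
lines(grid$ite_score, lower, lty = 2); lines(grid$ite_score, upper, lty = 2)
prob[grid$ite_score %in% c(321, 519)]                 # scores 2 SDs below and at the mean
```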
Figure 3 presents the receiver operating characteristic curve for this model. The initial increase in sensitivity as specificity decreases is rather steep, which is the desired pattern. However, after sensitivity reaches approximately 0.70, it takes a considerable decrease in specificity to achieve any further meaningful increase in sensitivity. The area under the curve was 0.83. Table 4 shows the classification indices at several ASN In-Training Examination score thresholds. In general, the In-Training Examination yielded high sensitivity and positive predictive value but low specificity.
Figure 3: The receiver operating characteristic curve indicates that sensitivity initially increases quickly but then increases only slowly as specificity is further reduced. AUC, area under the curve.
Table 4. Classification statistics for passing the American Board of Internal Medicine Nephrology Certification Examination at various American Society of Nephrology In-Training Examination thresholds

| ITE Score | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | Overall Classification Accuracy |
|---|---|---|---|---|---|
| 300 | 1.00 | 0.05 | 0.90 | 0.62 | 0.90 |
| 337 | 0.99 | 0.14 | 0.91 | 0.61 | 0.90 |
| 350 | 0.98 | 0.17 | 0.91 | 0.54 | 0.90 |
| 375 | 0.97 | 0.30 | 0.92 | 0.50 | 0.89 |
| 400 | 0.93 | 0.40 | 0.93 | 0.39 | 0.87 |
| 425 | 0.87 | 0.54 | 0.94 | 0.33 | 0.84 |
| 450 | 0.80 | 0.69 | 0.96 | 0.29 | 0.79 |

ITE, In-Training Examination.
Discussion
This study presents the first investigation into the association between ASN In-Training Examination scores and subsequent performance on the ABIM Nephrology Certification Examination using national cohorts of fellows across years. The In-Training Examination score was found to have the strongest association with Nephrology Certification Examination scores, explaining the most variation in performance relative to other assessments of medical knowledge, overall clinical competence ratings by program directors, and potentially important demographic characteristics.
The logistic regression results showed that In-Training Examination scores are also associated with passing the Nephrology Certification Examination and could potentially aid program directors in identifying at-risk fellows. We recommend interpreting such predictions with caution, however, because it can be difficult to accurately identify fellows who will fail the examination given the low base rate of failure during the study period (11%). Table 4 shows that each In-Training Examination threshold yields overall classification accuracy similar to predicting all examinees as passing (0.89). Although using the In-Training Examination as a predictor can help identify truly at-risk examinees, there is a tradeoff in that program directors may incorrectly identify some fellows as at risk for failure.
Those scoring extremely low on the In-Training Examination are more likely to fail the Certification Examination, as shown by the moderate to high negative predictive value at lower thresholds. However, the low specificity also shows that many at-risk fellows would be missed when using a low threshold. For example, only 5% of examinees who failed were predicted to fail under a threshold of 300. Thus, program directors might consider using a higher threshold for identifying at-risk fellows. Although this approach would detect more fellows likely to fail (specificity increases to 0.30 and 0.54 at thresholds of 375 and 425, respectively), additional fellows would be incorrectly identified as at risk (negative predictive value decreases from 0.62 to 0.50 and 0.33 at thresholds of 300, 375, and 425, respectively). When selecting an In-Training Examination score threshold to identify at-risk fellows, program directors must recognize the implications of the attendant false positives and false negatives and work to mitigate possible stigmatizing consequences.
Program directors and fellows should also avoid overinterpreting the strength of association between ASN In-Training Examination and ABIM Certification Examination scores. Fellows who score low on the In-Training Examination are not guaranteed to perform poorly on the Certification Examination. Scores should be used in conjunction with other valid indicators of nephrology knowledge to identify fellows potentially at risk. Nevertheless, because the In-Training Examination is a formative assessment, broadly identifying fellows at risk of Certification Examination failure and designing appropriate remediation plans to address gaps in medical knowledge are likely to not only help the fellows themselves but also benefit their future patients. More generally, aggregate data will allow program directors to plan and implement curricular changes within their individual fellowship programs to improve readiness of fellows before taking the Nephrology Certification Examination.
Other important limitations should be considered when interpreting results from this study. Numerous factors potentially influencing performance on the ABIM Certification Examination could not be controlled for in this study, as evidenced by the final multiple linear regression model explaining only 55% of the variation in ABIM Nephrology Certification Examination scores. The remaining variance could be explained by unmeasured factors, including how the fellows prepared for the Certification Examination as well as levels of fatigue and personal stressors on the day of the examination. In addition, the low base rate of failing the Nephrology Certification Examination makes it challenging to estimate accurate coefficients and produce highly specific models. The low failing percentage may be exacerbated if fellows who score extremely low on the In-Training Examination do not attempt the Nephrology Certification Examination. Of the 103 fellows who had data for the In-Training Examination but not the Nephrology Certification Examination, the mean In-Training Examination score was only 447.23 (SD=98.90) compared with the mean of 519 for those who had complete data. This difference of 71.78 points was statistically significant [t(1785)=7.18; 95% confidence interval, 52.12 to 91.42; P<0.001], potentially suggesting that some of the fellows who could be considered at risk for failing the ABIM Nephrology Certification Examination may not be completing their training programs or taking the Certification Examination.
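A comparison of this kind is a standard two-sample t test. A minimal sketch is shown below; the data frame and variable names are hypothetical, with took_cert assumed to indicate whether a Certification Examination score was available for the fellow.

```r
# Minimal sketch of the comparison between fellows with and without Certification
# Examination scores (hypothetical names; took_cert = 1 if a score was available).
# var.equal = TRUE gives the pooled-variance t test with n1 + n2 - 2 degrees of freedom.
t.test(ite_score ~ took_cert, data = all_fellows, var.equal = TRUE)
```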
In conclusion, this study showed the association between ASN In-Training Examination scores and ABIM Nephrology Certification Examination performance and described how In-Training Examination scores could be used diagnostically to aid program directors in assessing a fellow’s readiness for the Certification Examination. The results of this study align with the growing literature indicating the utility of various medical in-training examinations in predicting board certification performance. Moreover, evidence supporting the validity of in-training examination scores helps address physicians’ and policymakers’ ongoing calls for evidence that medical knowledge assessments are valuable to medical education.
Disclosures
None.
Acknowledgments
We would like to acknowledge the volunteer members of the American Society of Nephrology In-Training Examination (ITE) Test Materials Development Committee during the period of this study, whose significant investments of time, effort, and expertise have been invaluable in developing and maintaining the ITE: Shubha Ahya, James Bailey, Jeffrey Berns, Susan DiGiovanni, Jamie Dwyer, Pamela Fall, Simin Goral, James Johnston, Kenneth Kokko, Eleanor Lederer, Edgar Lerma, Madhukar Misra, Jack Moore Jr., N. Stanley Nahman Jr., Mark Parker, Qi Qian, Panduranga Rao, Mitchell Rosner, James Simon, Leslie Thomas, Ashita Tolwani, Juan Carlos Velez, and Jane Yeun.
References
1. Leigh TM, Johnson TP, Pisacano NJ: Predictive validity of the American Board of Family Practice in-training examination. Acad Med 65: 454–457, 1990
2. Grossman RS, Fincher RM, Layne RD, Seelig CB, Berkowitz LR, Levine MA: Validity of the in-training examination for predicting American Board of Internal Medicine certifying examination scores. J Gen Intern Med 7: 63–67, 1992
3. Kempainen RR, Hess BJ, Addrizzo-Harris DJ, Schaad DC, Scott CS, Carlin BW, Shaw RC Jr., Duhigg L, Lipner RS: Pulmonary and critical care in-service training examination score as a predictor of board certification examination performance. Ann Am Thorac Soc 13: 481–488, 2016
4. Lohr KM, Clauser A, Hess BJ, Gelber AC, Valeriano-Marcet J, Lipner RS, Haist SA, Hawley JL, Zirkle S, Bolster MB; American College of Rheumatology Committee on Rheumatology Training and Workforce Issues: Performance on the adult rheumatology in-training examination and relationship to outcomes on the rheumatology certification examination. Arthritis Rheumatol 67: 3082–3090, 2015
5. Grabovsky I, Hess BJ, Haist SA, Lipner RS, Hawley JL, Woodward S, Engleberg NC: The relationship between performance on the infectious diseases in-training and certification examinations. Clin Infect Dis 60: 677–683, 2015
6. Juul D, Flynn FG, Gutmann L, Pascuzzi RM, Webb L, Massey JM, Dekosky ST, Foertsch M, Faulkner LR: Association between performance on neurology in-training and certification examinations. Neurology 80: 206–209, 2013
7. de Virgilio C, Yaghoubian A, Kaji A, Collins JC, Deveney K, Dolich M, Easter D, Hines OJ, Katz S, Liu T, Mahmoud A, Melcher ML, Parks S, Reeves M, Salim A, Scherer L, Takanishi D, Waxman K: Predicting performance on the American Board of Surgery qualifying and certifying examinations: A multi-institutional study. Arch Surg 145: 852–856, 2010
8. Juul D, Schneidman BS, Sexson SB, Fernandez F, Beresin EV, Ebert MH, Winstead DK, Faulkner LR: Relationship between resident-in-training examination in psychiatry and subsequent certification examination performances. Acad Psychiatry 33: 404–406, 2009
9. Withiam-Leitch M, Olawaiye A: Resident performance on the in-training and board examinations in obstetrics and gynecology: Implications for the ACGME Outcome Project. Teach Learn Med 20: 136–142, 2008
10. Babbott SF, Beasley BW, Hinchey KT, Blotzer JW, Holmboe ES: The predictive validity of the internal medicine in-training examination. Am J Med 120: 735–740, 2007
11. McDonald FS, Zeger SL, Kolars JC: Associations between United States Medical Licensing Examination (USMLE) and Internal Medicine In-Training Examination (IM-ITE) scores. J Gen Intern Med 23: 1016–1019, 2008
12. Rosner MH, Berns JS, Parker M, Tolwani A, Bailey J, DiGiovanni S, Lederer E, Norby S, Plumb TJ, Qian Q, Yeun J, Hawley JL, Owens S; ASN In-Training Examination Committee: Development, implementation, and results of the ASN in-training examination for fellows. Clin J Am Soc Nephrol 5: 328–334, 2010
13. American Society of Nephrology: In-Training Examination for Nephrology Fellows: FAQ. 2015. Available at: https://www.asn-online.org/education/training/ite/faq.aspx. Accessed March 31, 2015
14. Cook LL, Eignor DR: An NCME instructional module on IRT equating methods. Educ Meas: Issues Pract 10: 37–45, 1991
15. Nasca TJ, Philibert I, Brigham T, Flynn TC: The next GME accreditation system--rationale and benefits. N Engl J Med 366: 1051–1056, 2012
16. Pratt JW: Dividing the Indivisible: Using Simple Symmetry to Partition Variance Explained, edited by Pukkila T, Puntanen S, Tampere, Finland, University of Tampere, 1987, pp 245–260
17. Thomas DR, Hughes E, Zumbo BD: On variable importance in linear regression. Soc Indic Res 45: 253–275, 1998
18. Burnham KP, Anderson DR: Multimodel inference: Understanding AIC and BIC in model selection. Sociol Methods Res 33: 261–304, 2004
19. Kim JH: Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal 53: 3735–3745, 2009
20. Hosmer DW, Lemeshow S: A goodness-of-fit test for the multiple logistic regression model. Commun Stat A10: 1043–1069, 1980
21. Zhou XH, Obuchowski NA, McClish DK: Statistical Methods in Diagnostic Medicine, 1st Ed., New York, John Wiley & Sons, 2002
22. R Core Team: R: A Language and Environment for Statistical Computing, Vienna, Austria, R Foundation for Statistical Computing, 2017