We summarize our linear regressions in Table 4. Slopes among courses with significant associations with NPTE performance ranged from 5 to 10 NPTE points per course grade point; that is, an increase of 1 percentage point in course grade predicts an increase of roughly 5 to 10 points in NPTE score. We note that negative intercepts are not unexpected, particularly for courses with steep slopes.
National Physical Therapy Exam score prediction was much stronger for curricular GPAs: P < .001 for all 6 graduate GPA measures, with 0.45 ≤ R² ≤ 0.68. Grade point average upon entry was not a significant predictor of NPTE score (all P ≥ .05). We note that a significant P value alone is not particularly informative about a course's predictive utility; R² is the more robust measure.12
National Physical Therapy Exam Outcome Prediction
For each of 12 courses, 26 different grade thresholds were evaluated for predictive value in identifying students at risk for first-time failure on the NPTE. One example is shown in Table 5.
We observe that for many grade thresholds, including the one associated with the lowest P value, the confidence interval contains infinity. This is an expected outcome for contingency tables containing empty cells, because division by zero is mathematically undefined and is reported as infinity in many computing environments. In particular, it is ideal to have zero students in the off-diagonal elements: students who exceed the course grade threshold yet fail the exam (SupraFail) or who fall below the threshold yet pass the exam (SubPass); refer to the sample contingency tables (above) for representative examples.
Across the 12 courses, we find that the optimum grade threshold varies from 79 to 93 points and that the predictive sensitivity for detecting NPTE failures ranges from low (2 of 8) to high (all 8). For 2 courses (DPT500 and DPT504), the P value was significant at P < .05, but the RR is reported as "infinity." This would result from either of two quantities being zero: SubFail + SubPass (ie, no students fell below the threshold course grade) or SupraFail (ie, no student above the threshold subsequently failed the exam). In both cases here, the latter explains the undefined RR (Table 6).
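The threshold analysis reduces to a 2 × 2 contingency table, an RR, and a Fisher exact P value. A minimal sketch follows; the counts are illustrative (chosen so that no supra-threshold student fails), not the study's exact table, and `fisher_exact_p` is a plain hypergeometric implementation of the two-sided test.

```python
import math
from math import comb

def relative_risk(sub_fail, sub_pass, supra_fail, supra_pass):
    """RR of first-time NPTE failure for sub- vs supra-threshold students.

    Returns math.inf when no supra-threshold student failed, mirroring
    the "infinity" entries reported in Table 6.
    """
    risk_sub = sub_fail / (sub_fail + sub_pass)
    risk_supra = supra_fail / (supra_fail + supra_pass)
    if risk_supra == 0:
        return math.inf if risk_sub > 0 else math.nan
    return risk_sub / risk_supra

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)

    def p_table(x):  # hypergeometric probability of a table with cell (1,1) = x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(p for x in range(lo, hi + 1) if (p := p_table(x)) <= p_obs * (1 + 1e-9))

# Illustrative counts: every supra-threshold student passed, so the RR
# denominator is zero and the RR surfaces as infinity, yet the Fisher
# exact P value remains well defined and small.
print(relative_risk(sub_fail=8, sub_pass=4, supra_fail=0, supra_pass=14))  # inf
print(fisher_exact_p(0, 14, 8, 4))
```

This makes concrete why an infinite RR and a highly significant P value can coexist: the P value is computed from the table margins and never divides by a cell count.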
P values ranged from P = 3.4 × 10⁻⁵ to P = .22, with calculable RRs ranging from 0.4 to 16. Among the 7 GPAs tested for utility in predicting NPTE outcome, all 5 curriculum GPAs yielded significant P values, with RRs ranging from 4.7 (GPA-II) to 9.3 (GPA-I). Neither program-entry GPA yielded P < .05. We note that the optimal GPA cutoffs varied widely, from 2.6 (GPA-III) to 3.4 (GPA-II); this agrees with the variability of the threshold course grades presented in Table 6.
When replicating the linear regression across 5 variables for which there were multiple cohorts' data, we found moderate variability, but could not identify systematic trends. In Table 7, we report raw R 2 values from these regressions.
These data are shown in Figure 2 for ease of visualization.
Although some variables (science GPA and final GPA) varied within narrow ranges, others (course grades in DPT608 and DPT609 and overall GPA upon admission) were widely scattered. Uniformly, the science GPA was a poor predictor of NPTE performance across all 5 cohorts.
Separately, we found very high agreement between goodness of fit extracted from linear models fitted either to raw numeric grades or to course performances converted to letter grades; these data are summarized in Table 8.
These data correlated with ρ > .9, indicating that the approach described here works equally well for either grading system (Figure 3).
DISCUSSION AND CONCLUSION
The primary aim of this study was to test whether performance in any single course or any single milestone GPA showed utility in predicting performance on the NPTE, either in first-time score or in first-time pass/fail outcome. Our primary findings were that many predictor variables showed moderately good prediction of NPTE score for many courses (Table 4). Our findings in the prediction of pass/fail were more varied. Some predictors showed negligible association to exam outcome while others showed very promising association (Table 6). In the case of outcome prediction, the optimal grade threshold varied substantially by course. Our findings are fairly robust across multiple years: predictors that are strong, weak, or moderate tend to be consistently strong, weak, or moderate, with some fluctuation (Table 7). Finally, we find that these approaches appear to work well for letter grades; therefore, access to raw numerical grade is not strictly necessary.
There is a challenge in comparing the performance of our approach versus those of others, given that there are only a handful of studies assessing the relationship between NPTE performance and academic performance. Methods vary, and there are challenges in assessing comparability of samples across studies. In a previous study of linear correlations between academic predictors and NPTE score, a substantial relationship was reported: 47% of variance in NPTE score explained by GPA and Clinical Performance Instruments.2 Elsewhere, only modest associations between NPTE score and core course GPA (R² = 0.42) and first-year GPA (R² = 0.12) were found, and the usefulness of GPA as a predictor appeared to decrease after year 1.6 Here, we found explained variance of 37.4 ± 17.6% (range: R² = 0.00–0.60) in single courses and 60.3 ± 8.8% (range: 0.45–0.68) in various GPA benchmarks. However, we found that preadmission variables were very weak predictors of NPTE success. This mirrors the findings of others,7 although some studies have found useful predictors in Graduate Record Examination scores and behavioral interviews.13
We observe that another study has shown that weakly significant findings generated in a linear regression disappear when replicated via logistic regression.10 Similarly, we found that some courses that were weakly significant in linear regression lost statistical significance in the Fisher's exact test: DPT501, DPT504, DPT505, and DPT511 (compare Table 4 vs Table 6). This may reflect a loss of statistical power when converting a continuous variable (NPTE score) to a dichotomous variable (NPTE outcome).14 The decision of whether to analyze these variables as continuous or dichotomous variables is nuanced and may impact the outcome of an analysis, its interpretation, and/or its ease of implementation.15 Extended discussion of the relative merits of these approaches is beyond the scope of this report. However, we appreciate both the ease of interpretation afforded by logistic regression (pass vs fail is the “bottom-line”) and the broader utility of linear regression, which provides guidance even in years when historical data contain only a few NPTE failures.
Although we had access to multiple graduation cohorts of data, there were several reasons for intentionally selecting a single year's dataset. Our dataset was most complete for the 2015 academic year; all other datasets suffered from incomplete grading, a very high proportion of exam passes (2016 cohort: 97% first-time pass rate, ie, one student not passing on first attempt), or inconsistently accessible record keeping (years before 2015). Moreover, analysis of a single cohort avoids "batch effects" due to evolutions in course content or instructor, variability in admissions, etcetera. Although we believe multicohort datasets could add substantial statistical power to this analysis, the need to control for these confounding variables would make the analysis more difficult to implement and interpret. For this demonstration of approach, a single-cohort analysis is ideal. Our data integrity is especially high: grade information was collected directly from the course instructors; NPTE data were collected directly from the Federation of State Boards of Physical Therapy; demographic variables are reviewed for accuracy institutionally. Our primary outcome of first-time NPTE performance is preferable to ultimate outcome because the ultimate NPTE pass rate is high, leaving few failures and thereby compromising statistical power. Finally, we note that while adding a logistic regression to this study would facilitate connection to previous studies, we felt that it added only incrementally to linear regression and that the systematic Fisher exact test offered much greater perspective.4
Our study is limited by the lack of a recommended framework for programmatic review. Cook et al4 recently reported a possible association between undergraduate GPA and first-time pass rate; however, previous studies have shown much less robust prediction (R² < 0.4) between graduate GPA and NPTE outcomes than we observed here.6 In non-PT domains, for example, nursing, medicine, and psychology, the predictive power of undergraduate GPA is mixed,16–19 although comparing results between professions is complicated by the varying intervals between degree matriculation and licensure examination.
Regarding the present study, we acknowledge the limitations of sample size. We anticipate that this will be a common barrier for most programs wishing to implement this analysis and that the nature of the limitation will vary by program: some programs will have a limited number of retrospective cohorts available for analysis; others admit larger classes each year; and programs with a particularly high first-time pass rate will have difficulty deriving reasonable heuristics of NPTE performance due to lack of sample contrast. Furthermore, our study considers only those students who successfully completed the program. Students who departed without completing their doctoral degree were censored from this dataset due to incomplete course information and lack of NPTE results. Similarly, we suspect a reporting bias among students who elect not to share their NPTE scores with their alma mater. Thus, we recognize the inherent limitations these missing data impose on prospective monitoring for at-risk students.
We note further that while the primary outcome of our study is first-time NPTE pass rate, and our main dataset (graduating class of 2015) showed excellent prediction in this setting, this partly reflects a cohort with an adequate sample of students requiring additional attempts at the NPTE. In contrast, in the graduating class of 2016, all students but one passed the exam on their first attempt. This is an inherent confounder of this type of analysis. For cohorts with high pass rates, where contingency tables would have low cell values, a more appropriate outcome measure is the direct regression of each predictor against NPTE score. In our exploratory replication of model fit, we found fair agreement between the 2015 and 2016 cohorts for most predictors, suggesting that the regression approach is useful year-to-year.
Finally, we observe that the modeling performed here is univariate, meaning that a single variable (NPTE score or outcome) was regressed against a single predictor (course grade or GPA). While there is certain explanatory value in adding additional factors (demographics, other descriptors related to educational history, and past academic performance), it is simply not feasible to further stratify an analysis containing only 26 students. We speculate that adding multiple terms to a given regression may add robustness to our single-term models (for instance, regression of NPTE score against performance in 2 classes), but a multivariate regression study requires additional analytical techniques (collinearity analysis, feature selection, and possibly data transformation) that would far exceed the scope of this article and may decrease the accessibility of this method to a broad audience. Nevertheless, we believe there is excellent opportunity to expand this work.
Our analysis of a partial curriculum grade set has revealed several critical findings. First, we have determined that both individual foundational courses (range: 0.00 < R² < 0.60, Table 4) and curricular grade points (0.4 < R² < 0.7) have substantial ability to predict NPTE scores through simple linear regression without additional covariates. We note that the 2 GPAs calculated at the time of program entry show modest predictive utility: R² < 0.4. We further show that at-risk students can be identified by benchmarking performance within distributions of single course final grades, semester-wise GPAs, and cumulative GPAs (Table 6). Perhaps the most profound outcome of this study is that at-risk students can be identified with high likelihood as early as the first semester. In our case, completion of DPT508 with a grade below 83 points was associated with first-time NPTE failure (RR = 10.9, P < .001; Table 6), and a GPA < 3.0 at the end of the first summer semester yielded an outcome prediction of RR = 9.3 (P < .01).
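The first-semester early-warning rule described above reduces to a one-line check. In this sketch the function name is ours, and the cutoffs (DPT508 < 83, first-summer GPA < 3.0) are specific to this cohort's Table 6, not general guidance.

```python
def flag_at_risk(dpt508_grade, first_summer_gpa):
    """Early-warning flag from the thresholds this study found most
    predictive (Table 6). Cutoffs are cohort-specific assumptions,
    not general recommendations."""
    return dpt508_grade < 83 or first_summer_gpa < 3.0

# A student below either cutoff would be flagged for early intervention.
print(flag_at_risk(dpt508_grade=80, first_summer_gpa=3.5))  # True
print(flag_at_risk(dpt508_grade=90, first_summer_gpa=3.2))  # False
```

Any program adopting such a rule would first need to rederive both cutoffs from its own historical data, as the thresholds vary substantially by course and cohort.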
It is important to note that while our results are promising (many R² > 0.5 and significant RRs), substantial variance in NPTE performance remains unexplained. While an extended discussion of this matter is beyond the scope of this article, we suggest that one important variable not accounted for here may be time to exam completion. There is an obvious tradeoff between recency of academic training and depth of independent professional practice. We suggest that there may be substantial merit in further studying the association between NPTE performance and the timing of the exam attempt relative to graduation. We encourage others who wish to replicate this study to explore additional factors.
This work was inspired by the ambition to identify students at risk for failing the NPTE, with an interest in providing early intervention. Rapid and accurate identification of at-risk students is critically important to DPT programs. Early intervention provides the best opportunity for successful remediation, or for advising students who should consider withdrawal from the program, given the high cost of tuition and the potential long-term consequences of student loans.16 We feel it is prudent to consider every early performance marker as a candidate for prediction. Thus, while not a curriculum item per se, we considered GPAs at the time of program entry worthy of inclusion in this study. Given their poor yield (Tables 4 and 6), we can now assert, at least for this dataset, that preadmission GPA is not a useful predictor of NPTE performance and that screening efforts are better directed toward graduate courses and GPA thresholds.
Also noteworthy is the volatility of thresholds across the program: cutoffs within the program curricula range from 79% to 93%, and GPA thresholds range from 2.6 to 3.4, without any indication of a systematic trend. This variability makes programmatic policy making a complicated enterprise. For example, this study provides evidence that simplistic performance criteria, such as "must pass each class with a minimum grade of 80 points" or "must maintain a cumulative GPA of 3.0," may be suboptimal and impractical for their intended purpose of identifying truly at-risk students. It is not uncommon for 2 courses to have substantially different class averages or for a single course to vary year-to-year. One way to accommodate this variability is to consider thresholds that are specific to each course or based on a percentile. For instance, in DPT500, the optimal cutoff for predictive utility was found to be 84 points (Table 5). This divides the class into 14 supra-threshold versus 12 sub-threshold students; all 14 supra-threshold students passed the NPTE, whereas 8 of the 12 sub-threshold students failed. It may be that instead of adopting a fixed-grade threshold at 84 points, it is preferable to set the risk marker at the bottom 46% of the class (corresponding to 12 of 26 students). Clearly, policy making is optimized when using multiple datasets.
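A percentile-based risk marker, as proposed above, is straightforward to compute. This is a sketch under simplifying assumptions (distinct grades, no tie-breaking policy); the function name and example grades are ours.

```python
def percentile_cutoff(grades, fraction):
    """Highest grade still inside the bottom `fraction` of the class.
    A sketch only: a real policy would also need a tie-breaking rule
    for students sharing the cutoff grade."""
    ranked = sorted(grades)
    k = max(1, round(fraction * len(ranked)))  # size of the bottom group
    return ranked[k - 1]

# Illustrative class of 26 distinct grades; flag the bottom 12/26 (~46%),
# matching the sub-threshold group size discussed for DPT500.
class_grades = list(range(75, 101))
print(percentile_cutoff(class_grades, 12 / 26))  # 86
```

Unlike a fixed 84-point rule, this marker automatically tracks year-to-year drift in class averages, at the cost of flagging a fixed share of each cohort regardless of how that cohort actually performs.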
In this study, we present a simple, reproducible analysis of student academic records as predictors of NPTE performance. Because we demonstrate these analyses primarily in the setting of a single cohort from a single DPT program, specific conclusions about which classes or GPAs (or which thresholds therein) are most predictive are not supported. These would require intensive review of a complete dataset comprising many cohorts, with the caveat that results will likely vary year-to-year. However, we urge other programs to consider incorporating a similar data-driven approach to identifying specific classes or GPA criteria with predictive utility for their own program. Furthermore, there is considerable opportunity for others to test the year-to-year variability in these benchmarks and the impact of missing data. We encourage others to replicate our approach and to publicly disseminate their findings.
1. Jewell DV, Riddle DL. A method for predicting a student's risk for academic probation in a professional program in allied health. J Allied Health. 2005;34:17–23.
2. Kosmahl E. Factors related to physical therapist license examination scores. J Phys Ther Educ. 2005;19:52–56.
3. Mohr T, Ingram D, Hayes S, Du Z. Educational program characteristics and pass rates on the National physical therapy examination. J Phys Ther Educ. 2005;19:60–66.
4. Cook C, Engelhard C, Landry MD, McCallum C. Modifiable variables in physical therapy education programs associated with first-time and three-year National Physical Therapy Examination pass rates in the United States. J Educ Eval Health Prof. 2015;12:44.
5. Meiners KM, Rush D. Clinical performance and admission variables as predictors of passage of the National physical therapy examination. J Allied Health. 2017;46:164–170.
6. Dockter M. An analysis of physical therapy preadmission factors on academic success and success on the national licensing examination. J Phys Ther Educ. 2001;15:60–64.
8. Utzman RR, Riddle DL, Jewell DV. Use of demographic and quantitative admissions data to predict performance on the National Physical Therapy Examination. Phys Ther. 2007;87:1181–1193.
9. Adams CL, Glavin K, Hutchins K, Lee T, Zimmermann C. An evaluation of the internal reliability, construct validity, and predictive validity of the physical therapist clinical performance instrument (PT CPI). J Phys Ther Educ. 2008;22:42–50.
10. Vendrely AM. An investigation of the relationships among academic performance, clinical performance, critical thinking, and success on the physical therapy licensure examination. J Allied Health. 2007;36:e108–123.
11. Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med. 1986;5:421–433.
12. Brase CH, Brase CP, Kupresanin J. Student Solutions Manual: Understanding Basic Statistics. Boston: Brooks/Cole, Cengage Learning; 2013.
13. Hollman JH, Rindflesch AB, Youdas JW, Krause DA, Hellyer NJ, Kinlaw D. Retrospective analysis of the behavioral interview and other preadmission variables to predict licensure examination outcomes in physical therapy. J Allied Health. 2008;37:97–104.
14. Deyi BA, Kosinski AS, Snapinn SM. Power considerations when a continuous outcome variable is dichotomized. J Biopharm Stat. 1998;8:337–352.
15. Zhao L, Chen Y, Schaffner DW. Comparison of logistic regression and linear regression in modeling percentage data. Appl Environ Microbiol. 2001;67:2129–2135.
16. Donnon T, Paolucci EO, Violato C. The predictive validity of the MCAT for medical school performance and medical board licensing examinations: A meta-analysis of the published research. Acad Med. 2007;82:100–106.
17. Seldomridge LA, DiBartolo MC. Can success and failure be predicted for baccalaureate graduates on the computerized NCLEX-RN? J Prof Nurs. 2004;20:361–368.
18. Yu LM, Rinaldi SA, Templer DI, Colbert LA, Siscoe K, Van Patten K. Score on the Examination for Professional Practice in Psychology as a function of attributes of clinical psychology graduate programs. Psychol Sci. 1997;8:347–350.
19. Stroman L, Weil S, Butler K, McDonald C. The cost of a number: Can you afford to become a surgeon? Bull R Coll Surg Engl. 2015;97:107–111.
Keywords: Curriculum review; Exam; Prediction; Risk; Students
Copyright 2018 Education Section, APTA