Relationship Between Narrative Comments and Ratings for Entry-Level Performance on the Clinical Performance Instrument

A Call to Rethink the Clinical Performance Instrument

Wetherbee, Ellen, PT, DPT, MEd; Dupre, Anne-Marie, PT, DPT, MS, NCS; Feinn, Richard S., PhD; Roush, Susan, PhD, PT

Journal of Physical Therapy Education: December 2018 - Volume 32 - Issue 4 - p 333–343
doi: 10.1097/JTE.0000000000000060
Research Report

Introduction. Clinical instructors (CIs) use narratives to qualify anchor points for 18 performance criteria on the Clinical Performance Instrument: Version 2006 (CPI'06) when assessing students' performance during clinical experiences. Directors of Clinical Education (DCEs) review final narratives and anchor points on the CPI'06 to make sure that they are in alignment with each other. CIs and DCEs should have a mutual understanding of the terminology used on the CPI'06. The purpose of this study was to determine the level of agreement (concordance) between raters on whether examples of narratives supported the term “entry-level performance” (ELP), before and after focus group discussions.

Methods. Fifty-four CIs discussed the CPI'06 terminology. A pretest/posttest design was used with the intervention being focus group discussions. Participants compared the definition of ELP with sample narratives and agreed, disagreed, or remained undecided about whether the narratives supported ELP. Statistical tests were used to determine participants' concordance about whether the narratives supported ELP. The rationale participants gave for their decisions was analyzed using qualitative methodology.

Results. Participants' percentage of pairwise agreement about whether the narratives supported ELP was highest for safety (65.2–95.3%) and lowest for professional behaviors (34.4–44.0%) across testing conditions. Intraclass correlation for multiple raters using the ICC(2,1) model indicated that interrater reliability for absolute agreement on each test, across five performance criteria, was fair to poor. Qualitative assessment of participants' rationale for their decisions indicated inconsistencies in the interpretation of CPI'06 terminology.

Discussion and Conclusion. Agreement among professionals about the interpretation of the language used on the CPI'06 to ensure that students graduate with the requisite skills and behaviors is essential. This study indicates that there was a lack of participants' concordance with the interpretation of ELP on the CPI'06.

Ellen Wetherbee is the clinical associate professor and director of clinical education in the School of Health Sciences, Physical Therapy Program at Quinnipiac University, North Haven, CT 06473. Please address all correspondence to Ellen Wetherbee.

Anne-Marie Dupre is the clinical associate professor, assistant director of clinical education in the Department of Physical Therapy at the University of Rhode Island.

Richard S. Feinn is an associate professor of Medical Sciences in the School of Medicine at Quinnipiac University.

Susan Roush is a professor in the Department of Physical Therapy at the University of Rhode Island.

The authors declare no conflict of interest.

Received December 04, 2017

Accepted April 23, 2018

Students in physical therapist education programs (PTEPs) participate in full-time clinical experiences (CEs) where they are expected to demonstrate proficient clinical and behavioral skills. The clinical instructor (CI) is responsible for assessing each student's clinical performance. According to the 2016–2017 PTEPs Fact Sheets,1 individual PTEPs in the United States have contractual clinical-educational relationships, with a mean of 581 clinical facilities, which typically span the nation. Conversely, clinical facilities have contracts with multiple PTEPs. Consequently, CIs from multiple sites assess students from multiple PTEPs.

Given this variability in the relationship between PTEPs and clinical sites during CEs, it is beneficial for the physical therapy profession to have a standardized clinical performance assessment tool to record students' performance during full-time CEs. In the US, the most widely used student clinical assessment tool2 is the Clinical Performance Instrument (CPI): Version 2006 (CPI'06).3 The CPI'06 includes 18 performance criteria, describing behaviors and skills essential to professional practice. The CIs document a student's level of performance on each criterion by indicating a point on a continuum with six defined anchor points, ranging from “beginning performance” to “beyond entry-level performance (ELP).” In addition, CIs are required to provide narratives for each performance criterion to support the anchor point rating.

The CPI'06 has been found to have a valid conceptual framework and construct validity, the ability to demonstrate progress over the course of a CE, and the ability to differentiate between students on early and final CEs.4 There have been no studies, however, to demonstrate the consistency between the CI's rating of a student's performance and the interpretation of the corresponding narrative.

Directors of Clinical Education (DCEs) are responsible for determining if students pass or fail CEs. To make these determinations, multiple sources of information about a student's clinical performance during CEs are available including interviews during midpoint visits/phone calls, documentation of weekly goals, and other discussions that may occur. The most objective source of information that DCEs use to determine if students pass or fail CEs is from the CI's documentation about a student's performance, recorded on the CPI'06. English et al5 determined that 53% of surveyed PTEPs used CIs' narratives to assist in assigning students' grades, when they used the CPI, the predecessor to the CPI'06. For most of those programs that used narratives to assist in grading, the authors found that 74% used comments to “clarify the mark on the visual analog scale (VAS)”(p89) and 31.3% were significantly influenced by the narratives, regardless of the point on the VAS. Although there is no similar information regarding CPI'06, it is reasonable to assume that DCEs continue to use CIs' narratives in a similar manner and seek alignment between the narratives and rating points for each criterion.

The directions for CPI'06 recommend that CIs consider five performance dimensions (PDs) when writing narratives to support the rating points. These PDs are supervision/guidance, quality, complexity, consistency, and efficiency.3 Tsuda et al6 conducted a study on the narratives used on the original CPI. They found that the quality of narratives on completed evaluations ranged from global, nonspecific statements, “addressing 1 or none of the PDs,”(p59) to narratives that gave descriptive accounts of behaviors for selected clinical performance criteria. Narratives often suggested goals for practice, yet it was unclear how goals for practice related to a student's performance at the time of assessment. The five PDs remained consistent between the original and updated CPI'06 versions, with slight alterations in the definitions of each PD.

The DCEs in the New England Consortium of Clinical Education (NECCE) expressed concern about their ability to interpret CIs' narratives on the CPI'06. This concern primarily related to discrepancies between the narratives associated with the anchor point of ELP on completed evaluations. The DCEs are responsible for administering the final grade for CEs. An essential aspect of this grading process is that DCEs and CIs consistently use and understand the CPI'06 definitions and terminology describing ELP and the PDs. If CIs, who are responsible for recording students' performance, lack reliability when interpreting and using CPI'06 terminology, then DCEs will be inconsistent in their final grading practices. Research analyzing the interpretation of CIs' narratives may inform stakeholders about the ability of DCEs to interpret the narratives in the manner that CIs intend. Ultimately, stakeholders' confidence in the student assessment process will protect the integrity of the educational process and physical therapy profession. The purpose of this study, therefore, was to 1) describe the relationship between CIs' narratives and ratings of ELP on the CPI'06; 2) determine the effect of training on the relationship between CIs' narratives and ratings of ELP on the CPI'06; and 3) identify how CIs typically interpret the language used in narratives.

This was a mixed-method study. The quantitative aspect of the study used a pretest/posttest design with an intervention of focus group discussions. These focus group discussions occurred at a semiannual CI development workshop sponsored by the NECCE at no cost to the participants. The focus group discussions took 3 hours, during which participants reflected on various aspects of definitions and terminology provided in the CPI'06 instructions. The qualitative aspect of the research analyzed how CIs and Center Coordinators of Clinical Education (CCCEs) interpreted the typical language used in narratives for final CEs.

Participants for this study were recruited using snowball sampling.7 Initial contacts were made via email addresses obtained from the NECCE database. This database contains emails primarily for Center Coordinators of Clinical Education (CCCEs) at sites used by the NECCE PTEPs. From this initial contact, CCCEs were encouraged to invite other CIs to attend the workshop. This research was approved by the Quinnipiac University institutional review board. All those who attended the workshop signed a consent to participate in the research.

Instrument Design

The pretest consisted of five narratives describing students' clinical behaviors for five distinct CPI'06 performance criteria. The posttest consisted of 10 narratives. For statistical analysis, the posttest was divided into posttest 1, consisting of the five narratives identical to the pretest, and posttest 2, consisting of five unique narratives, within the same performance criteria as the pretest and posttest 1.

The pre- and posttest narratives were developed after reviewing completed CPI'06 evaluations that were no greater than 3 years old for students who had been on their final CE. After considering all 18 performance criteria, five were selected for this study. Four of the five are designated as “foundational elements in clinical practice,”3 ie, “safety,” “professional behaviors,” “communication,” and “clinical reasoning.” The fifth foundational element, ie, “accountability,” was excluded because the researchers observed that many of the narratives associated with “accountability” used language that was nearly the same as those used to describe “professional behaviors.” Other performance criteria considered essential for PTs included “examination” and “evaluation.” After extensive discussion, there was agreement among the researchers that, in their experience, the narratives regarding “evaluation” were similar to those used to describe “clinical reasoning.” Therefore, the fifth criterion included in the study was “examination.”

Next, one to two narratives, associated with an anchor of ELP, were identified for each of the performance criteria. These narratives were assessed to be typical examples used to describe students' performance in the identified criteria. A total of 10 narratives, two for each of the selected CPI'06 performance criteria (“safety,” “communication,” “examination,” “clinical reasoning,” and “professional behavior”), were chosen for the pre- and posttests and modified to eliminate identifying information.

The intervention in this study was focus group discussions that occurred in the context of a semiannual CI development workshop sponsored by the NECCE. The pretest was completed at the beginning of the workshop after consent was obtained. Participants were instructed to read the narratives and indicate if they agreed, disagreed, or were undecided if the narratives met ELP for each of the five performance criteria. Participants were encouraged to provide explanations, for each of the narratives, to support their reasoning. After completing the pretest, participants were placed in groups of five or six CIs/CCCEs from mixed practice settings. Two experienced DCEs were also part of each group, one serving as facilitator and the other as scribe. Neither of these individuals contributed their own ideas to the discussion. The facilitator ensured that all focus group members had equal opportunities to contribute to the conversation and that the group remained focused on responding to specific questions. The scribe recorded comments on a standardized form.

Immediately after the pretest, participants discussed how they arrived at their decisions about narratives on the pretest. Then, the entire group listened to a 30-minute review of CPI'06 terminology and definitions for ELP and the PDs. After the review, the focus group participants responded to questions about the terminology associated with the CPI'06's definition of ELP and the 5 PDs. Table 1 provides a list of the questions that guided these focus group discussions.

Table 1

Immediately after focus group discussions, participants completed the posttest that comprised the 10 narratives (five repeated from the pretest and five that were previously unseen). Once again, participants were instructed to read the narratives and indicate if they agreed, disagreed, or were undecided if the narratives represented a student who met ELP for each performance criterion and to provide comments for each category to support how they made their decision. The hypothesis was that, given the opportunity to revisit the definitions and terminology used in CPI'06 through focus group discussions, participants would demonstrate greater concordance in making determinations about whether the narratives supported the definition of ELP.

Data Analysis

Both qualitative and quantitative data were collected. When looking at the quantitative data, it should be noted that this article considers the concept of “agreement” in two ways. First, participants were asked to rate their level of agreement about whether the narratives supported ELP on a 3-point ordinal scale (agree, undecided, or disagree). Second, agreement is also a descriptive statistic that describes “… how often test-retest scores agree.”8(p.598) This agreement statistic, called the coefficient of agreement, is calculated as the number of exact agreements/number of possible agreements. The agreement statistic is also known as concordance and this latter term will be used for this statistic. As an example of this concept, consider participants' data on pretest if it was distributed as follows: 33% endorsed “agree,” 33% endorsed “disagree,” and 33% endorsed “undecided” that the narratives supported ELP, whereas at posttest, the data changed, indicating that 25% endorsed “agree,” 60% endorsed “disagree,” and 15% endorsed “undecided.” In this example, the data would be said to show greater concordance at posttest. In summary, the term agreement relates to participants' ratings of ELP and concordance refers to the descriptive statistics.
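
The coefficient of agreement described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that concordance is computed pairwise (every rater compared with every other rater) and using hypothetical counts for 60 raters chosen only to match the example percentages; the function name and the counts are not taken from the study.

```python
from collections import Counter
from math import comb


def pairwise_concordance(ratings):
    """Share of rater pairs that gave the same rating to one narrative.

    Exact agreements divided by possible agreements, counted over
    every unordered pair of raters.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two raters")
    counts = Counter(ratings)
    # A category chosen by c raters contributes c-choose-2 agreeing pairs.
    agreeing_pairs = sum(comb(c, 2) for c in counts.values())
    return agreeing_pairs / comb(n, 2)


# Hypothetical group of 60 raters (illustration only, not study data).
pretest = ["agree"] * 20 + ["disagree"] * 20 + ["undecided"] * 20
posttest = ["agree"] * 15 + ["disagree"] * 36 + ["undecided"] * 9

print(round(pairwise_concordance(pretest), 3))   # 0.322
print(round(pairwise_concordance(posttest), 3))  # 0.436
```

Under these assumptions, the posttest distribution yields the larger value because more raters share a single category, which is the sense in which the example data show greater concordance at posttest.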

Quantitative data included frequencies of participants who agreed, disagreed, or were undecided that the narratives supported ELP on the pre- and posttests. In addition, the percentage of concordance between participants' responses was calculated. The F-test for equality of variance was used to compare pretest with posttest concordance, and the ICC(2,1) model was used to assess reliability for each testing occasion.

Participants' comments about their decisions on pre- and posttests were analyzed using qualitative methods. After an iterative analysis as described by Robinson Wolf,9 each reviewer independently read the comments on each participant's pretests and posttests and identified codes representing common ideas. Then, all researchers reviewed the comments and codes to determine a final set of themes about why participants agreed, disagreed, or remained undecided about whether the narrative comments supported ELP.

Fifty-four CIs/CCCEs attended this workshop, representing five states (Connecticut, Massachusetts, Rhode Island, New Hampshire, and Vermont) and five practice settings. Most participants were from acute care and outpatient settings. Focus group discussions consisted of nine groups, with each group having participants from mixed practice settings. The numbers of participants providing data at the pretest, posttest 1, and posttest 2, by practice setting, are given in Table 2. Paired data were available for 43 participants. The remaining 11 participants either left early or arrived late, making it impossible to pair their tests for analysis.

Table 2

The frequencies and percentages of participants who agreed, disagreed, or remained undecided about whether each of the narratives supported ELP, relative to their performance criteria, are presented in Table 3. The percentage of pairwise concordances, comparing the response of each participant with the responses of all the other participants, is shown in Table 4. Participants' concordance for the performance criterion of “safety” was highest, ranging from 65.2% to 95.3% on pre- and posttests. The performance criterion of “professional behavior” demonstrated the lowest level of concordance among participants, ranging from 34.4% to 44.0% on pre- and posttests.

Table 3-a

Table 3-b

Table 3-c

Table 3-d

Table 4

An F-test for equality of variances was used to determine whether concordance differed from pretest to posttests. This test was conducted to provide an indication as to whether the intervention of focus group discussions led to greater concordance in determining ELP. A high concordance would indicate that participants gave the same rating and the variance would be small, whereas a low concordance would indicate that participants gave different ratings, leading to a larger variance. It should be noted that this comparison was made only between pretest and posttest 1. Because posttest 2 contained unique narrative comments, it was not possible to compare the results of this test with the pretest. The results are illustrated in Table 5. A significant F value for the “clinical reasoning” criterion indicated a statistically significant difference between pretest and posttest 1 concordance ratings, with improved concordance when participants interpreted narrative comments after focus group discussions. “Clinical reasoning” was the only performance criterion for which there was a significant difference from pretest to posttest 1.
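
The logic of this variance comparison can be illustrated with a short Python sketch. It assumes a numeric coding of the 3-point scale (agree = 1, undecided = 2, disagree = 3) and uses hypothetical ratings from 12 raters; the article does not report the coding or the software used, and the names below are illustrative only. The sketch computes just the F ratio and its degrees of freedom; a P value would be read from the F distribution.

```python
from statistics import variance

# Assumed coding of the ordinal scale (not reported in the article).
CODES = {"agree": 1, "undecided": 2, "disagree": 3}


def variance_ratio(pretest, posttest):
    """Return (F, df_pre, df_post), with F = pretest variance / posttest variance.

    F above 1 means ratings were less spread out (more concordant)
    at posttest than at pretest.
    """
    pre = [CODES[r] for r in pretest]
    post = [CODES[r] for r in posttest]
    var_post = variance(post)  # sample variance, n - 1 denominator
    if var_post == 0:
        raise ValueError("posttest ratings are identical; F is undefined")
    return variance(pre) / var_post, len(pre) - 1, len(post) - 1


# Hypothetical ratings from 12 raters (illustration only, not study data).
pretest = ["agree"] * 4 + ["undecided"] * 4 + ["disagree"] * 4
posttest = ["agree"] * 2 + ["undecided"] * 1 + ["disagree"] * 9

f_value, df_pre, df_post = variance_ratio(pretest, posttest)
print(f"F({df_pre}, {df_post}) = {f_value:.3f}")
```

In this made-up case the ratings cluster on "disagree" at posttest, so the posttest variance is smaller and the ratio exceeds 1, mirroring the direction of change reported for "clinical reasoning."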

Table 5

Interrater reliability for absolute agreement on each pre- and posttest for the five performance criteria was analyzed via intraclass correlation for multiple raters using the ICC(2,1) model. The results of this analysis are presented in Table 6. The pretest and posttest 2 showed poor reliability (pretest: 0.29; posttest 2: 0.33), whereas the posttest 1 reliability was fair (posttest 1: 0.48). This is based on the parameters for ICC values of <0.40 indicating poor interrater reliability and 0.40–0.59 indicating fair interrater reliability.10
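
For readers unfamiliar with ICC(2,1), the sketch below computes it from two-way ANOVA mean squares using the standard formula for the two-way random-effects, absolute-agreement, single-rater model: ICC(2,1) = (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n), where n is the number of rated targets and k the number of raters. It assumes that the narratives are the rated targets, that participants are the raters, and that ratings are coded numerically; the function names and the small ratings matrix are hypothetical and do not represent the study's data or software.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a matrix with one row per target and one column per rater."""
    n = len(ratings)      # number of rated targets (e.g., narratives)
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-target mean square (MSR)
    ms_cols = ss_cols / (k - 1)                  # between-rater mean square (MSC)
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square (MSE)

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ratings do not vary; ICC is undefined")
    return (ms_rows - ms_error) / denominator


def interpret(icc):
    """Labels for the two ranges cited in the text; higher ranges are not given there."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    return "above fair"


# Hypothetical codes (1 = agree, 2 = undecided, 3 = disagree):
# five narratives (rows) rated by four raters (columns).
example = [
    [1, 1, 1, 2],
    [1, 2, 3, 1],
    [3, 3, 2, 3],
    [1, 1, 2, 1],
    [2, 3, 1, 2],
]
value = icc_2_1(example)
print(f"ICC(2,1) = {value:.2f} ({interpret(value)})")
```

Because the between-rater mean square appears in the denominator, this form penalizes raters who are systematically more lenient or strict than others, which is why it is described as measuring absolute agreement rather than consistency alone.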

Table 6

In addition to the quantitative analysis, participants' comments supporting their decisions on the pre- and posttests were qualitatively analyzed to determine whether there were common themes as to why participants agreed, disagreed, or remained undecided about whether the narratives supported an anchor point of ELP. The qualitative analysis was performed within each response category. Participants who “agreed” that narratives supported ELP made allowances for the complexity of patients and/or types of settings or noted positive trends in the student's performance; “disagreed” that narratives supported ELP when they determined that the exact definition of ELP, as defined on the CPI'06, was not reported in the narrative; and remained “undecided” when they questioned the need for more specific information about a student's performance. Table 3 provides the narratives used for pre- and posttests, the frequencies and percentages of participants' decisions about whether the narratives supported ELP, and samples of participants' remarks supporting their decisions.

Clinical Instructors' Remarks Indicating Agreement With Entry-Level Performance

The CIs who agreed that narratives supported ELP based their decisions on the complexity of patients and/or setting, or on their determination that the student demonstrated progress or showed potential. Examples (cited by performance criterion) for each of these situations included:

Complexity of Patient/Setting

  • Examination: “Student demonstrates competency in basic skills and should not be expected, at entry-level, to be 100% competent with highly complex patients.”
  • Clinical reasoning: “Assist for new or complex patient would likely be needed for entry level PT.”
  • Communication: “Pediatric [settings] can be complex and would not expect 100% [from the student].”

Student Demonstrated Progress or Showed Potential

  • Communication: “The fact that they are always willing and looking for feedback is huge!”
  • Examination: “Acceptable new grad hire because of capability to demonstrate improvement.”

Clinical Instructors' Remarks Indicating Disagreement With Entry-Level Performance

Those participants who disagreed that the narratives supported ELP indicated that the exact definition of ELP on the CPI'06 was not met:

  • Communication: “Still some issues ie, pathology 25% assistance and interrupt patient <10%.”
  • Clinical Reasoning: “Not as it applies to definition of entry level-needing assistance 25% of the time is not consistent with definition.”
  • Examination: “At entry-level, the student should not need guidance and should not miss key information.”

Clinical Instructors' Remarks Indicating Undecided About Entry-Level Performance

When participants were undecided, they often indicated the need for more specific information, such as in the following:

  • Professional Behavior: “Not sure what the defensiveness comment is about-is the student defensive? If so, when? What behaviors are exhibited? Professional behavior is much more than showing up on time and dressing appropriately.”
  • Safety: “Need more information on minimal guidance, what percent, and/or what assistance is needed (basic or complex)?” (Linked to the narrative comment about the student's performance in the intensive care unit [ICU]).
  • Clinical Reasoning: “Specific examples of what patients the student is unable to determine how her choices impact long term goals [is needed]. Seems a little vague.”

Clinical Instructors' Lack of Agreement About the Level of Independence

It should also be noted that participants were not always in agreement about how independent students need to be in complex situations. This is exemplified by one participant who agreed that the student in the narrative associated with the safety performance criterion demonstrated ELP as follows:

“Highly complex ICU patients requiring guidance is to be expected,”

whereas on the same posttest, another participant who disagreed that the student met ELP stated:

“Continues to require minimal guidance on highly complicated patients in the ICU.”

As the researchers reviewed participants' comments more closely, it became evident that there were inconsistencies in how participants interpreted the actual language used in the narratives. For example, participants interpreted the use of the word “defensive” when used in the narrative for the professionalism performance criterion as noted below:

“I would think defensiveness will decrease as confidence increases with experience.” (Agreed with ELP)

“Being defensive is not an entry-level skill.” (Disagreed with ELP)

“Elaborate on defensive behaviors.” (Undecided about ELP)

Similarly, participants' decisions related to the term “consultation” in conjunction with the safety performance criterion were inconsistent:

“Consulting equals self-assessment.” (Agreed with ELP)

“Student requires occasional consultation.” (Disagreed with ELP)

In particular, there was confusion about the use of the words “guidance” and “supervision,” especially related to the student's level of independence in complex environments or with complex patients. For example, a participant made the following comment about the narrative associated with the examination performance criterion:

“Why the language of ‘supervision?’ Maybe the CI meant ‘consult.’ I, myself, wasn't aware of the distinction really until today.”

Similarly, the following comment appeared on the narrative associated with the clinical reasoning performance criterion:

“Sometimes ‘requires guidance’ is not specific enough for me to determine.”

The DCEs often have multiple sources of data about students' clinical performance (eg, midpoint check-ins, weekly goals) that can be triangulated with the CPI'06 to determine final grades. However, the quality and quantity of these other sources of data can be variable. The most objective data about a student's performance are the summative documentation on the CPI'06. The DCEs review the final evaluation on the CPI'06 and seek alignment between the anchor point and the narratives associated with each performance criterion before assigning a final grade for the CE. If CIs demonstrate a lack of consistency in the use and interpretation of the language associated with CPI'06, there will be a lack of reliability in DCEs' interpretation of the intended meaning of the narratives and their relationship with ELP. When this is the case, the value of the narratives in this triangulation process is greatly diminished. Because DCEs assign final grades for CEs, but are not witnesses to the students' clinical performance, it is essential that all users, particularly DCEs and CIs, interpret the terminology and definitions from the CPI'06 reliably as a means to ensure that entry-level graduates have demonstrated a “core set of clinical attributes.”3

This study provides data on the level of concordance among research participants' interpretation of narratives to support a rating of ELP, before and after focus group discussions. The overall participant interrater reliability to interpret narratives was found to be only poor to fair.10 Concordance between participants' decisions about whether the narratives support ELP was influenced by differing opinions regarding the level of independence students should demonstrate, particularly when considering complexity and setting and interpretation of the terminology associated with CPI'06. Of note is that the overall reliability did not improve posttest, despite the opportunity that participants had to review and discuss the definitions of the terminology and directions for use of the CPI'06 through the 3-hour focus group discussions.

The only statistically significant change from pretest to posttest 1 was for “clinical reasoning.” The authors believe that a significant change in concordance between pretest and posttest 1, from agreement to disagreement that the student met ELP, may be related to the focus group discussions. It should be noted that the word “guidance” was used in the narrative describing “clinical reasoning” for pre- and posttests. One of the questions posed to participants during these discussions was to consider the use of the words, “supervision,” “guidance,” and “consultation.” After focus group discussions, participants associated the word “guidance” with poorer performance, resulting in a greater number of participants choosing “disagree” with ELP. It seems that discussion can influence users' interpretation of these terms and DCEs should consider this as a teaching point when discussing the use of the CPI'06 with CIs.

Participants' concordance regarding the narratives was highest for the “safety” performance criterion throughout testing conditions. Greater concordance among participants' decisions about “safety” may be easier to achieve, versus the other performance criteria tested in this study, because of the clear risk of harm to patients. However, related to “safety,” at least two participants held different opinions about how independent the student should be in a complex environment such as the ICU. This indicates that the profession needs to have greater consistency when interpreting the level of independence required for ELP, particularly in specific settings.

Most participants agreed that the narratives for “examination” supported ELP on pretest (78%) and posttest 1 (72.9%), evidently because the student performed basic skills 100% of the time, despite the need for assistance with complex patients. This contrasts with posttest 2 for the “examination” performance criterion. Posttests 1 and 2 were both completed directly after focus group conversations; however, posttest 2 contained unique narratives. The narratives for posttests 1 and 2 for “examination” used similar language regarding the amount of assistance that the student required, except for the notation that the student missed “key” information in posttest 2. The authors believe that use of the word “key” in the posttest 2 narrative for “examination” influenced participants' decisions, resulting in less concordance about how they should interpret the narrative in relationship to ELP. Data indicated a nearly equal split, with approximately 30% of participants each choosing agree, disagree, or undecided for the “examination” narrative in posttest 2. This indicates that specific words trigger inconsistent responses among readers. Similar to the use of the words “guidance,” “supervision,” and “consultation” referenced in the ICU situation above, CPI'06 users must be clear about the implications intended by the narrative. In addition, users of the CPI'06 need to further consider and agree on the expected level of complexity, independence, and type of setting in which students need to perform at ELP.

In contrast to the “clinical reasoning” and “examination” performance criteria, there was less concordance among participants' assessment of ELP related to skills in the performance criteria of “professional behaviors” and “communication.” Qualitative comments by participants indicated that their decisions were influenced by seeing signs of “growth” or “improvement.” In addition, the complexity of the setting influenced participants' decisions.

The research indicates that CIs can be influenced by their personal and professional judgment when assessing students' performance11,12, as well as by their previous experience with students and comparisons to practicing clinicians.11 Another model of student evaluation for CE is offered by Thompson et al,13 who state that professionals have a good idea about what competence and incompetence look like,(p136) and by Jette et al,14 who found that CIs relied on a “gut feel”(p.838) when determining students' readiness for clinical practice. Participants in this research may actually be using this “gestalt” approach by making decisions based on a combination of inferences they made from the narratives and their perceptions about the complexity of the clinical environment.

When CIs complete the CPI'06 for students on final CEs, they need to rate students on 18 performance criteria. For each of these criteria, they need to consider if the student meets the anchor point of ELP and write narratives that support and elaborate on students' abilities in the PDs. Given the lack of rater consistency in interpreting the narratives found in this study, the meaning of these narratives is in question.

It is not clear whether DCEs across PTEPs expect students to meet ELP for every performance criterion on the CPI'06. However, the Commission on Accreditation in Physical Therapy Education expects PTEPs to demonstrate that students meet ELP in all areas of practice. One feature that may be of benefit to incorporate on the CPI'06 is a “Global Rating of Student Clinical Competence,” similar to the scale used on the Clinical Internship Evaluation Tool,15 developed by Fitzgerald et al to assess student PTs on CEs. This would allow the rater to consider the overall performance of the student and make an overarching statement about whether a student is a competent clinician, offering greater clarity to DCEs.

Researchers in this study hypothesized that allowing participants to discuss and review terminology and definitions used in the CPI'06 would result in greater concordance about whether the narratives supported an anchor point of ELP. Given the overall lack of interrater reliability in participants' interpretation of narratives both before and immediately after focus group discussions, this study's intervention did not have a significant impact on interrater reliability. These findings are similar to those of Cook et al,16 who found that a half-day workshop did not improve the reliability or accuracy of raters, compared with a control group, in assessing six clinical skills performed by medical students using the Mini-CEX assessment tool. It is not clear that further training on use of the CPI'06 will result in improved interrater reliability in interpreting narratives; however, further research is needed.

Based on the findings in this research, the definitions and terminology used to define ELP need to be reviewed to ensure that all users are consistent in their interpretation of students' performance. In fact, the definitions of the terminology used in the CPI'06 may be overly complex, making it difficult for CIs to express their judgment about students' clinical competence. In addition, although users of the CPI'06 become certified to use the instrument based on a tutorial, this research indicates that 3 hours of discussion related to the CPI'06 did not create greater reliability in users' ability to interpret narratives, lending further evidence that the CPI'06 needs to be reevaluated.

Limitations of this study include the small sample size and limited practice diversity, as participants came from only 1 region in the US. In addition, participants did not have context in terms of patient setting when they read narratives. It should be noted, however, that passing the CPI'06 tutorial requires assigning an anchor point on performance criteria, based on example narratives, similar to the process used in this study.

Finally, DCEs make the final decisions about grading the CPI'06, based on their interpretation of how well the CI's narrative comments support the anchor point of ELP. We acknowledge that this research did not determine the reliability among DCEs' interpretations of narrative comments. We believed that it was necessary to first explore the reliability of CIs' use and interpretation of CPI'06 definitions and terminology because CIs are the ones who complete the CPI'06. If CIs are inconsistent in their use and interpretation of CPI'06 language, then this could have an impact on the reliability of grading decisions made by DCEs. Further study is recommended to determine the interrater reliability of DCEs' interpretation of narrative comments in relationship to ELP.


It is essential for the physical therapy profession and the public to be assured that student PTs demonstrate the essential skills and behaviors for professional practice. Therefore, there needs to be agreement among professionals about how to interpret clinical assessment tools used during students' CEs. This study indicates that participants lacked concordance about whether sample narratives supported the definition of ELP on the CPI'06. It is not known whether further training would increase CPI'06 users' ability to consistently interpret these narratives. Further study is indicated to determine whether the CPI'06 needs modification to enhance raters' consistency in interpreting narratives in relationship to the anchor point of ELP. In addition, research is needed to determine the most effective means to educate CIs on the use of the CPI'06.


1. 2016-2017 Physical Therapist Education Programs Fact Sheets. Alexandria, VA: Commission on Accreditation in Physical Therapy Education; 2016.
2. Awarski G, Ellis B. Updates on clinical performance instrument. Presented at: As part of Clinical Education Special Interest Group Meeting at APTA Combined Sections Meeting; February 18, 2017; San Antonio, TX.
3. Physical Therapist Clinical Performance Instrument. Alexandria, VA: American Physical Therapy Association; 2006.
4. Roach KE, Frost JS, Francis NJ, Giles S, Nordrum JT, Delitto A. Validation of the revised physical therapist clinical performance instrument (PT CPI): Version 2006. Phys Ther. 2012;92:416–428.
5. English ML, Wurth RO, Ponsler M, Milam A. Use of the physical therapist clinical performance instrument as a grading tool as reported by academic Coordinators of clinical education. J Phys Ther Educ. 2004;87–92.
6. Tsuda H, Low S, Vlad G. A description of comments written by clinical instructors on the Clinical Performance Instrument. J Phys Ther Educ. 2007;7:56–62.
7. Sadler GR, Lee HC, Lim RS, Fullerton J. Recruitment of hard-to-reach population subgroups via adaptations of the snowball sampling strategy. Nurs Health Sci. 2010;12:369–374.
8. Portney LG, Watkins MP. Foundations in Clinical Research: Applications to Practice. 3rd ed. Philadelphia, Pennsylvania: Prentice-Hall Inc; 2015.
9. Robinson Wolf Z. Ethnography: The method. In: Munhall PL. Nursing Research: A Qualitative Perspective. 4th ed. London, United Kingdom: Jones and Bartlett Publishers; 2007.
10. Cicchetti DV, Sparrow SA. Developing criteria for establishing inter-rater reliability of specific items: Application to assessment of adaptive behavior. Am J Ment Defic. 1981;86:127–137.
11. Alexander HA. Physiotherapy student clinical education: The influence of subjective judgments in observational assessment. Assess Eval Higher Educ. 1996;21:357–367.
12. Cross V. Begging to differ: Clinicians' and academics' views on desirable attributes for physiotherapy students on clinical placements. Assess Eval Higher Educ. 1998;23:295–311.
13. Thompson GA, Moss R, Applegate B. Using performance assessments to determine competence in clinical athletic training education: How valid are our assessments? Athl Train Educ J. 2014;9:135–141.
14. Jette D, Bertoni A, Coots R, Johnson H, McLaughlin C. Clinical instructors' perceptions of behaviors that comprise entry-level clinical performance in physical therapist students: A qualitative study. Phys Ther. 2007;87:833–843.
15. Fitzgerald L, Delitto A, Irrgang JJ. Validation of the clinical internship evaluation tool. Phys Ther. 2007;87:844–860.
16. Cook DA, Dupras DM, Beckman TJ, Thomas KG, Pankratz S. Effect of rater training on reliability and accuracy of mini-cex scores: A randomized, controlled trial. J Gen Intern Med. 2009;24:74–79.

Clinical performance instrument; Entry-level performance; Narrative comments

Copyright 2018 Education Section, APTA