The Pediatric Toronto Extremity Salvage Score (pTESS): Validation of a Self-reported Functional Outcomes Tool for Children with Extremity Tumors : Clinical Orthopaedics and Related Research®


Piscione, Janine BA, BScPT, MSc; Barden, Wendy PScPT, MSc; Barry, Janie MSc; Malkin, Alexandra MSc; Roy, Trisha MD, PhD; Sueyoshi, Tyki BSc; Mazil, Karen BN; Salomon, Steven MSc; Dandachli, Firas MD, MSc; Griffin, Anthony MSc; Saint-Yves, Hugo MSc; Giuliano, Pina; Gupta, Abha MD, MSc; Ferguson, Peter MD, MSc; Scheinemann, Katrin MD; Ghert, Michelle MD; Turcotte, Robert E. MD; Lafay-Cousin, Lucie MD, MSc; Werier, Joel MD; Strahlendorf, Caron MB, BCh; Isler, Marc MD; Mottard, Sophie MD; Afzal, Samina MBBS; Anderson, Megan E. MD; Hopyan, Sevan MD, PhD

Clinical Orthopaedics and Related Research 477(9):p 2127-2141, September 2019. | DOI: 10.1097/CORR.0000000000000756



Bone tumors in pediatric patients, although relatively rare, often result in significant disability in young, skeletally immature patients who survive their disease. In those younger than 20 years, the two most common primary malignant tumors of bone are osteosarcoma and Ewing’s sarcoma, with annual incidences in the United States of 8.7 and 2.9 per million persons, respectively [25]. As per Surveillance, Epidemiology, and End Results data in the United States [25], 57% of malignant bone tumors arise in the long bones of the lower extremity, whereas approximately 13% develop in the long bones of the upper extremity. Combined systemic and local management of these tumors results in 60% to 70% event-free survival at 5 years [41]. Local control is often achieved by surgery and, for any individual tumor, multiple surgical approaches or reconstructive options are often available. Despite the importance of defining lifelong functional alterations that will result from various surgical approaches, very few tools are available to measure those changes, especially in the pediatric population [6,22,26,28,43,56].

It was previously thought that children could not reliably report their own function, so parents and/or clinicians completed assessments on their behalf. However, children have since been shown to be capable of reporting their own physical disability [40,58]. Although these patients receive their diagnosis at a median age of 14 to 16 years [52], the majority of reports examining physical function relate to adult survivors of pediatric bone tumors [1,3,7,10,22,24,28,42,51,54]. A 2012 systematic review examined functional ability and physical activity after surgical resection for sarcoma and noted that most studies included participants with a large age range and thus may have overlooked issues specifically relating to the pediatric population [3]. To date, few studies have measured physical function in children or adolescents at the time of, or shortly after, treatment [46].

To describe physical function after the treatment of a bone tumor, the Musculoskeletal Tumor Society Rating Scale [17,18] was developed and became widely used. However, it is a subjective measure that is based solely on the clinician’s report, and Musculoskeletal Tumor Society Rating Scale scores do not correlate with objective measures of function in adolescents [37]. The Activities Scale for Kids was identified as one potential option but was found to have a high “not applicable” rate and high ceiling effects [47]. The Toronto Extremity Salvage Score (TESS) has also been identified as a potential option for pediatric patients with bone tumors, but it has been questioned whether an adult measure should be applied to children [46]. The TESS was developed by Davis et al. [13] as the first self-reported measure of physical function specifically for patients with sarcoma. Although validated for patients between 12 and 85 years old, the tool was essentially designed for adults, and the mean age of participants during development was 41 years [13]. Many TESS items are not applicable to children (for example, performing work duties, sexual activity, and driving) and adult measures may lack sensitivity for children with respect to skill level, type and frequency of activities performed, and the relative importance of activities [35,57].

A pilot study in Australia preliminarily examined unpublished modifications to the TESS based on clinical judgment and informal discussions with children (Dr. M. Clayer, personal communication, March 2009). Because these modifications were made in Australia, the need for cross-cultural adaptation and further modification had to be examined [2]. Before such a tool is used in clinical practice or research, its measurement properties must be explicitly validated.

We therefore asked: (1) What is the best format and content for new upper- and lower-extremity measures of physical function in the pediatric population? (2) Do the new measures exhibit floor and/or ceiling effects, internal consistency, and test-retest reliability? (3) Are the new measures valid?

Patients and Methods

Eligible participants were between 7 and 17.9 years old and had a malignant or benign aggressive bone tumor of the upper or lower extremity. Children both receiving and no longer receiving chemotherapy were recruited. Participants were eligible if tumor resection had been performed between 3 months and 10 years before participation. Participants were excluded if they had any form of cognitive impairment, could not read or write in English or French, had local or systemic disease recurrence, or had preexisting neuromusculoskeletal comorbidities influencing physical function. Participants who underwent an amputation because of failure of previous limb salvage surgery were categorized as having had an amputation.

The study was divided into three distinct phases:

Phase 1: Format and Content Development of the pTESS

The draft questionnaire developed in Australia was initially evaluated using a qualitative research method called cognitive debriefing. This method uses structured one-on-one interviews to test how individuals comprehend and answer self-reported questionnaires [31]. Participants read and complete the draft questionnaire, and the interviewer then asks followup probe questions to obtain feedback on format, response options, important omissions or inappropriate inclusions, and ease of use. These qualitative interviews were conducted with consecutive children meeting the inclusion criteria who attended the orthopaedic oncology clinic at the Hospital for Sick Children in Toronto between September 2010 and February 2011. Five had upper extremity tumors and 12 had leg tumors. In the initial round of interviews, we interviewed four children with leg tumors and two children with arm tumors. Their feedback was used to revise the draft questionnaires, and new versions of the tools were created. Three more interviews (two children with leg tumors and one with an arm tumor) were conducted to further modify the measures. This cycle of interviews followed by modifications continued until participants recommended no further changes, at which point the two pTESS questionnaires were finalized (see Appendices, Supplemental Digital Content 1 and 2). Seventeen children and adolescents participated in the first phase of the study, similar to the number of participants interviewed to develop the adult TESS [13]. In the leg group, two of the 12 participants were boys; their ages at the time of assessment ranged from 8 to 17 years (mean = 13 years, SD = 2.4), and the mean time since tumor resection was 2.5 years (range, 0.1-6 years). In the arm group, two of the five participants were boys, with ages ranging from 8 to 15 years (mean = 12 years, SD = 3.1) and a mean time since surgery of 3 years (range, 0.7-6 years; SD = 1.8).

Phase 2: Translation

The pTESS questionnaires were translated into French using the framework by Beaton et al. [2]. Two bilingual individuals at the Hospital for Sick Children performed independent forward translations from English to French, leading to the first French consensus version. Next, two French native speakers at a collaborating center in Montréal independently back-translated the French version into English. All four translators reviewed the back translations, and the final French consensus versions were confirmed (see Appendices, Supplemental Digital Content 3 and 4).

Phase 3: Testing of Measurement Properties

The primary study site was the Hospital for Sick Children in Toronto with additional recruitment at 10 other North American centers. Because of varied staffing and ethics procedures, some centers mailed the initial questionnaires during Phase 3, whereas others gave them to eligible participants in the clinic or during inpatient stays. However, all participants were instructed to complete the questionnaires at home. Between March 2012 and February 2017, 181 participants at 10 centers across North America were deemed eligible, and 122 (67%) consented and completed at least the first questionnaire (Fig. 1). There were no differences between participants (n = 122) and nonparticipants (n = 59) in terms of age (participants: mean = 13 years, nonparticipants: mean = 13 years; p = 0.6) or gender (participants: 62% were boys, nonparticipants: 38% were boys; p = 0.1).

Fig. 1:
A Consolidated Standards of Reporting Trials flow diagram is shown for participants in Phase 3.

The study design was cross-sectional, except for the test-retest reliability component, which was longitudinal. To assess test-retest reliability, all participants completed the limb-appropriate measure on two occasions approximately 2 to 4 weeks apart. We postulated that function would not change during this period, but that the interval would be long enough for participants to forget their previous responses [58]. After completing the pTESS the first time (T1), participants were instructed to return the completed measure to the investigators. Participants were then notified to complete the pTESS again (T2). All participants completed a form asking whether their function had changed between T1 and T2. When the first set of questionnaires was received, medical and surgical data were abstracted through structured record review.

Outcome Measures


The pTESS was designed to measure the difficulty patients have performing routine daily activities. There is both an upper-extremity version (pTESS-Arm Questionnaire) and a lower-extremity version (pTESS-Leg Questionnaire). Each item is answered using a 5-point ordinal scale, and all item responses are aggregated into a single summary score. Five points are allotted for the first response option (“not at all hard”), 4 points for the second option, and so on, down to 1 point for the fifth option (“too hard. I can’t do this”). Each item also has a “not applicable” option, phrased as “I do not do this.” This not applicable (NA) option is not given a numeric score and is treated as invalid for scoring. Summary scores indicate physical function ranging from 0 (lowest) to 100 (highest) and were calculated with the equation:

$$\text{Summary score} = \frac{\sum_{i=1}^{n} x_i - n}{4n} \times 100$$

where x_i is the score (1 to 5) of the ith valid item and n is the number of valid (non-NA) items.
Summary scores were not calculated if > 25% of the responses were not completed (that is, the patient selected “I do not do this,” the NA option). The pTESS also contains two visual analog scale (VAS) items, each scored on a 10-cm line. The first reads “DOING all the things I want to do is:” and is rated from “easy” (0) to “very very hard” (10). The second reads “How do you FEEL about what you can do?” and is rated from “very bad” (0) to “very good” (10).
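The scoring rule above can be sketched in Python. This is a minimal, hypothetical implementation, not the authors' scoring software. It assumes responses are coded as integers 1 to 5 (5 = “not at all hard”), the string "NA" for “I do not do this,” and None for a blank item, and that blanks and NA answers both count toward the 25% limit. The rescaling maps the sum of valid item points from its possible range of n to 5n onto 0 to 100, assuming the usual TESS-style min-max rescaling.

```python
from typing import Optional, Sequence, Union

# Hypothetical response coding (an assumption, not the published scoring key):
#   int 1..5 -> item points (5 = "not at all hard", 1 = "too hard. I can't do this")
#   "NA"     -> "I do not do this" (not scored)
#   None     -> item left blank
NA = "NA"
Response = Union[int, str, None]


def ptess_summary(responses: Sequence[Response],
                  max_invalid_fraction: float = 0.25) -> Optional[float]:
    """Return a 0-100 summary score, or None if more than 25% of items are invalid."""
    if not responses:
        return None
    valid = [r for r in responses
             if isinstance(r, int) and not isinstance(r, bool) and 1 <= r <= 5]
    n_invalid = len(responses) - len(valid)
    # Assumption: blanks and NA answers both count toward the 25% limit.
    if not valid or n_invalid / len(responses) > max_invalid_fraction:
        return None
    n = len(valid)
    # Each valid item contributes 1..5 points, so the raw sum lies in [n, 5n];
    # subtract the minimum and divide by the range (4n) to rescale onto 0-100.
    return (sum(valid) - n) / (4 * n) * 100


if __name__ == "__main__":
    print(ptess_summary([5, 4, 3, NA]))   # one NA in four items is exactly 25% -> 75.0
    print(ptess_summary([5, NA, NA, 3]))  # 50% invalid -> None
```

Because the denominator uses only the valid items, an NA answer neither raises nor lowers the score; it only shrinks the set of items the score is based on, which is why a cap on NA responses is needed.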

pTESS-Leg Questionnaire

Of the 181 eligible participants, 154 pTESS-Leg questionnaires were provided to children and adolescents, and 95 (62%) were returned with completed data. Two participants who were still receiving chemotherapy did not have usable data because they selected the NA option for > 25% of items at T1; therefore, their summary scores were not calculated. Of the remaining 93 participants, 73 (79%) completed the questionnaire at T2. However, two participants selected the NA option for > 25% of items at T2, so only 71 T2 scores were usable (Fig. 1). Forty-four of the 93 (47%) participants were boys, and the mean age at T1 was 13 years (range, 7-17 years; SD = 2.8). The mean time since tumor resection was 3 years (range, 0.3-9.4 years; SD = 2.2) (Table 1).

Table 1. Sample characteristics in Phase 3
Table 1-A. Sample characteristics in Phase 3

pTESS-Arm Questionnaire

Twenty-seven pTESS-Arm questionnaires were provided, and all were returned with complete data at T1; 20 of 27 (74%) were completed at T2. One participant selected the NA option for > 25% of items at T2, so only 19 T2 scores were usable (Fig. 1). Nine of 27 (33%) participants were boys, and the mean age at T1 was 13 years (range, 7-17 years; SD = 3.3). The mean time since tumor resection surgery was 3 years (range, 0.3-7.3 years; SD = 2.3).


The adult TESS [13] was provided to participants who were 16 or 17 years old at the time of the study.

Statistical Analysis

Item content was examined in two ways: the frequency of NA responses for each item and item-to-item correlations. Floor and ceiling effects were also examined. As in the development of the adult TESS, an a priori criterion of 30% NA responses was used for eliminating items [13]. A correlation matrix shows how each item relates to the other items [19]. Individual interitem correlations between 0.30 and 0.70 are desirable: items correlating below 0.30 are insufficiently related to the other items, whereas those above 0.70 may be redundant. A corrected average interitem correlation of r = 0.40 is suggested for a measure [19]. The tool was considered to meet accepted standards of measurement if < 15% of respondents obtained the absolute maximum or minimum possible summary score [39].
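The item-level checks described above can be sketched as follows. The function names, the "NA" code, and the strict > 30% cutoff are illustrative assumptions. The interitem correlations assume complete numeric data (for example, after imputation), and an item with no variability makes its correlation undefined.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

NA = "NA"  # hypothetical code for "I do not do this"


def na_fraction(column):
    """Share of respondents who answered "I do not do this" for one item."""
    return sum(r == NA for r in column) / len(column)


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either item is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(items):
    """Pairwise interitem correlations keyed by (i, j) with i < j."""
    return {(i, j): pearson(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}


def mean_item_correlations(items):
    """For each item, its mean correlation with every other item."""
    r = correlation_matrix(items)
    k = len(items)
    return [mean(r[min(i, j), max(i, j)] for j in range(k) if j != i)
            for i in range(k)]


def average_interitem_correlation(items):
    return mean(correlation_matrix(items).values())


def floor_ceiling(scores, lowest=0, highest=100):
    """Fractions of summary scores at the scale minimum and maximum."""
    n = len(scores)
    return (sum(s == lowest for s in scores) / n,
            sum(s == highest for s in scores) / n)


def screen(raw_items, scores, na_cutoff=0.30, fc_cutoff=0.15):
    """Items failing the NA criterion, and whether a floor or ceiling effect is present."""
    drop = [i for i, col in enumerate(raw_items) if na_fraction(col) > na_cutoff]
    floor, ceiling = floor_ceiling(scores)
    return drop, (floor >= fc_cutoff or ceiling >= fc_cutoff)
```

A low value from `mean_item_correlations` corresponds to the kind of item-level signal used to flag a weakly related item, while `average_interitem_correlation` summarizes the whole matrix against the suggested 0.40.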

At T1 and T2, 11 (0.4%) and 16 (0.7%) item responses were missing, and 105 (3.8%) and 101 (4.7%) were rated not applicable, respectively. As stated earlier, the summary score calculation omits any items coded as “not applicable”; however, excluding these responses from the evaluation of internal consistency would have discarded a substantial amount of information. Missing responses were therefore imputed using the item’s median score. To test whether NA responses affected the evaluation of internal consistency, we recoded them in three ways (as 0, as “not at all hard,” and as “too hard. I can’t do this”) and checked whether the result changed. Internal consistency was evaluated by calculating Cronbach’s α from item-level data (not summary scores) under each coding, and the results were compared. An α > 0.90 is recommended for comparing groups in clinical research [44]. Test-retest reliability was analyzed using the one-way random effects intraclass correlation coefficient (ICC) [48]. Only scores for respondents reporting no change between T1 and T2 were included in this analysis.
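A sketch of the reliability calculations follows. It assumes item data arranged as one list per item (respondents in the same order), with "NA" and None codes as hypothetical stand-ins for “I do not do this” and blank items. The three NA codings in the usage example (0, 5 for “not at all hard,” and 1 for “too hard. I can’t do this” under the point scheme used for summary scores) are assumptions. The ICC shown is the one-way random effects, single-measure form, ICC(1,1), computed from one-way ANOVA mean squares.

```python
from statistics import median, variance

NA = "NA"  # hypothetical code for "I do not do this"


def impute_median(column):
    """Replace blank (None) responses with the median of the item's numeric responses."""
    observed = [r for r in column if r is not None and r != NA]
    fill = median(observed)
    return [fill if r is None else r for r in column]


def recode_na(column, value):
    """Recode "I do not do this" as a number so the item can enter Cronbach's alpha."""
    return [value if r == NA else r for r in column]


def cronbach_alpha(items):
    """items: k lists of numeric scores, one per item, same respondents in order.

    Raises ZeroDivisionError if total scores have zero variance.
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def icc_oneway(pairs):
    """ICC(1,1): one-way random effects, single measure, for (T1, T2) score pairs."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / 2 for a, b in pairs]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    raw = [[5, 4, NA, 3], [5, None, 2, 3], [4, 4, 3, 2]]  # made-up item columns
    for na_code in (0, 5, 1):  # hypothetical stand-ins for the three NA codings
        items = [recode_na(impute_median(col), na_code) for col in raw]
        print(na_code, round(cronbach_alpha(items), 3))
    print(round(icc_oneway([(70, 72), (85, 83), (60, 61), (95, 95)]), 3))
```

Imputing from genuine responses before recoding keeps the recoded NA values from influencing the median used to fill blanks.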

Validity, or the degree to which data measure what they are intended to measure, was evaluated using three strategies [20], and T1 summary scores were used in all validity analyses. First, known-group validity is the ability of a measure to discriminate between two distinct groups. An independent t-test or Mann-Whitney U test was used to compare pTESS scores between respondents who used gait aids and/or braces and those who did not. Second, construct validity involves forming a priori theories about the attribute of interest and then assessing the extent to which the measure provides results consistent with those theories. Pearson or Spearman correlation coefficients (depending on sample size and distribution) were used to evaluate construct validity. The following a priori hypotheses were evaluated: (1) children who were still receiving chemotherapy, or had completed therapy recently (within 30 days), would have lower pTESS scores than those who had been off chemotherapy longer; (2) children who had recently undergone tumor resection would have lower pTESS scores than those with longer postoperative times; and (3) children with lower global self-ratings of physical function on the VAS would have lower pTESS scores. Third, criterion validity was evaluated, that is, the correlation of the scale with another measure or “gold standard” accepted in the field [55]. Pearson correlation coefficients were used to evaluate whether adolescents with lower scores on the adult TESS (the gold standard) had lower scores on the pTESS. According to Cohen, a correlation of ≥ 0.5 represents a strong (large) correlation, and 0.30 to 0.49 represents a moderate correlation [9].
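The validity statistics above can be sketched with standard-library Python. This is illustrative only. `spearman` computes Pearson's r on average ranks, so ties are handled. `cohen_label` applies the Cohen thresholds quoted above (anything below 0.30 is simply labeled weak here). `welch_t` is the unequal-variance form of the independent t-test, which is an assumption because the text does not say which variant was used. P values would come from a statistics library, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, `scipy.stats.mannwhitneyu`, or `scipy.stats.spearmanr`.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r):
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    return "weak"  # below the moderate range; not graded further in the text


def welch_t(a, b):
    """Welch's t statistic and Satterthwaite degrees of freedom for two independent groups."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For a known-group comparison, `a` would hold the summary scores of aid or brace users and `b` those of nonusers; for construct validity, `x` and `y` would pair each participant's summary score with, for example, time since surgery.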

Finally, a brief examination of face and content validity was performed with data from the 16- and 17-year-old participants. Face validity is the extent to which a measure is phrased in a suitable way, has appropriate response options, and is aimed at the right concept. Content validity is the extent to which the measure includes important items and does not contain inappropriate inclusions. Participants who received the adult TESS were provided with a brief form asking for feedback and thoughts on the pTESS versus TESS. The form asked which questionnaire: (1) had more items that were important; (2) was meant for their age group; and (3) was liked better overall.

The sample size was based on ensuring sufficient power for test-retest reliability using the method described by Donner and Eliasziw [16]. Thirty-five analyzable patients were required to test a null ICC of 0.80 against a true ICC of 0.90, with α = 0.05 and 80% power.


Results

Phase 1: Format and Content Development for pTESS

In the upper-extremity version, cognitive interviews demonstrated the need to remove two items and modify 10 items. In the lower-extremity version, one item was removed, 11 were modified, and two new items were added. Recommended changes to make the measure more “child-friendly” included completely revising the response options (for example, “extremely difficult” was changed to “very hard”), changing the title of each version of the questionnaire (“upper extremity” became “arm” and “lower extremity” became “leg”), and rewording the overall instructions for completing the tool. The order of response options was also reversed: in the TESS, the first response option is “impossible to do,” whereas in the pTESS, the first response option is “not at all hard,” and the options descend to the greatest difficulty (“too hard. I can’t do this”). Finally, based on participant feedback, two VAS items were added to capture overall subjective assessments of function. The revised questionnaires were used for Phases 2 and 3 of this study.

Phase 2: Translation

The translators did not encounter any difficulties with the translation process, and the French-language pTESS questionnaires were used for French-speaking participants in Phase 3.

Phase 3: Testing of Measurement Properties

Refinements to the pTESS-Leg Questionnaire

Two items on the pTESS-Leg, “playing in the sand” and “playing games like Twister,” were rated NA at an average frequency of 40% and 33% at T1 and T2, respectively. Therefore, using the 30% criterion, these two items were eliminated. The average interitem correlation coefficient for the pTESS-Leg at T1 was 0.42 (SD = 0.08). On further examination, “sitting” had an extremely low mean correlation with the other items (r = 0.11, SD = 0.09), well below the accepted standard of 0.40. Additionally, 94% of respondents rated sitting as “not at all hard,” and the remaining 6% rated it “a little bit hard.” Therefore, given its weak relation to the other items and lack of variability in scoring, “sitting” was also eliminated from the questionnaire. All further analyses were performed on the 30-item pTESS-Leg (Table 2).

Table 2. pTESS items

Nine respondents completed the pTESS-Leg in French. Summary scores were compared between English-language respondents (T1: n = 84, mean = 76, SD = 20; T2: n = 65, mean = 80, SD = 16) and French-language respondents (T1: n = 9, mean = 97, SD = 15; T2: n = 6, mean = 95, SD = 13), with no differences between them at T1 (p = 0.4) or T2 (p = 0.9). Therefore, all further analyses combined English- and French-speaking respondents, unless otherwise stated.

Refinements to the pTESS-Arm Questionnaire

One item on the pTESS-Arm, “digging in a sandbox,” was rated NA at an average frequency of 44% and 55% at T1 and T2, respectively, and was therefore removed under the 30% criterion. The average interitem correlation for the pTESS-Arm at T1 was 0.33 (SD = 0.18). Two items had very low mean correlations with the other items: “coloring a picture” (r = 0.18, SD = 0.16) and “going to school every day, all day” (r = 0.18, SD = 0.13). Additionally, only five items had mean correlations > 0.40. However, because of the small sample for the pTESS-Arm, all remaining 27 items were retained (Table 2). Only one participant completed the pTESS-Arm in French, and these data were combined with the English-language data for all further analyses.

Floor/Ceiling Effects, Internal Consistency, and Test-retest Reliability

pTESS-Leg Questionnaire

There were no floor effects for the pTESS-Leg, and only seven (8%) and five (7%) respondents scored at the ceiling at T1 and T2, respectively. Both administrations were below the 15% a priori threshold, tending to confirm the absence of ceiling effects on the pTESS-Leg. The overall mean pTESS-Leg summary scores at T1 and T2 were 77 (range, 23-100; SD = 19) and 80 (range, 29-100; SD = 15), respectively. pTESS summary scores were also examined for variability and distribution with respect to tumor histology and surgical procedure. Participants with malignant tumors demonstrated greater variability and overall lower summary scores (T1: mean = 76, SD = 19; T2: mean = 80, SD = 15) than did those with benign aggressive tumors (T1 mean = 88, SD = 17; T2 mean = 82, SD = 1) (Fig. 2). The distribution of pTESS-Leg scores based on surgical resection was also examined. The mean pTESS scores for the limb salvage group were 76 (SD = 20) and 78 (SD = 16) at T1 and T2, respectively (Fig. 3). When the three ablative groups (below-knee amputation, above-knee amputation, and rotationplasty) were grouped together, the mean scores at T1 and T2 were 77 (SD = 17) and 79 (SD = 15), respectively.

Fig. 2:
This boxplot illustrates the distribution of pTESS Summary Scores for the pTESS-Leg and pTESS-Arm questionnaires at Time 1 (T1) and Time 2 (T2) based on tumor histology.
Fig. 3:
This boxplot illustrates the distribution of pTESS-Leg summary scores at Time 1 (T1) and Time 2 (T2) based on surgical procedure.

The pTESS-Leg demonstrated a high level of internal consistency, with a Cronbach’s α of 0.95. As described earlier, three different codings were performed, and Cronbach’s α remained high (α = 0.95 for all).

The scores for participants who reported no change in function between T1 and T2 (n = 48) showed a high level of concordance, with an ICC of 0.94 (95% CI, 0.90-0.97; p < 0.001). The ICC calculation was repeated without the French-speaking respondents (n = 4), and the ICC remained high at 0.95 (95% CI, 0.90-0.97; p < 0.001). The sample of French-speaking respondents was too small to calculate an ICC. However, for the French-speaking participants who reported no change in function (n = 4), the mean difference between T1 and T2 was small (mean difference = 2.7; SD = 3.2; 95% CI, -2.4 to 7.8; p = 0.2).
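As a worked check of the French-language figures, the reported interval is reproduced by a standard paired t interval with n = 4 and n − 1 = 3 degrees of freedom. That the authors computed it this way is an assumption; the critical value t(0.975, 3) ≈ 3.182 is taken from standard t tables.

```latex
\begin{aligned}
\mathrm{SE} &= \frac{s_d}{\sqrt{n}} = \frac{3.2}{\sqrt{4}} = 1.6 \\
95\%\ \mathrm{CI} &= \bar{d} \pm t_{0.975,\,3}\,\mathrm{SE} = 2.7 \pm 3.182 \times 1.6 = 2.7 \pm 5.09 \approx (-2.4,\ 7.8) \\
t &= \frac{\bar{d}}{\mathrm{SE}} = \frac{2.7}{1.6} \approx 1.69, \qquad df = 3 \;\Rightarrow\; p \approx 0.19
\end{aligned}
```

The two-sided p of about 0.19 rounds to the reported p = 0.2, and the interval spans zero, consistent with no detectable systematic change between administrations.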

pTESS-Arm Questionnaire

There were no floor effects for the pTESS-Arm, and only three (11%) and one (5%) respondents scored at the ceiling at T1 and T2, respectively. Both administrations were below the 15% threshold for the standards of measurement, tending to confirm the absence of floor or ceiling effects for the pTESS-Arm. The overall mean pTESS-Arm summary scores at T1 and T2 were 81 (range, 38-100; SD = 18) and 73 (range, 33-100; SD = 23), respectively. Participants with malignant tumors demonstrated greater variability and overall lower summary scores (T1: mean = 76, SD = 20; T2: mean = 72, SD = 24) than did those with aggressive benign tumors (T1 mean = 92, SD = 9; T2 mean = 81, SD = 8) (Fig. 2).

Internal consistency of the pTESS-Arm was high: Cronbach’s α remained high under all three codings (α ranging from 0.92 to 0.94).

Scores for participants who reported no change in function between T1 and T2 (n = 13) showed a high level of concordance, with an ICC of 0.86 (95% CI, 0.61-0.96; p < 0.001).

Known-group, Construct, Criterion, Face, and Content Validity

pTESS-Leg Questionnaire

Known-group validity analyses were used to compare responses from participants who required a gait aid and/or brace (n = 51) for ambulation with those not requiring any aid or brace (n = 42). The group of participants requiring a brace or aid demonstrated significantly lower pTESS scores (mean = 68; SD = 21) than did those who did not use a brace or aid (mean = 87; SD = 11; p < 0.001) (Fig. 4A).

Fig. 4:
The boxplot and scatterplots display pTESS-Leg summary scores as a function of (A) gait aid use; (B) time since last chemotherapy treatment; (C) time since tumor resection; (D) VAS scores for “Doing all the things I want to do”; and (E) adult TESS lower-extremity summary score.

Construct validity analyses were used to examine the ability of the pTESS-Leg to assess the aforementioned a priori theories regarding chemotherapy, surgery, and overall function. Time off chemotherapy correlated moderately with higher pTESS-Leg scores (r = 0.4; p < 0.001; 95% CI, 0.2-0.5). Second, the time since tumor resection surgery also correlated moderately with higher pTESS scores (r = 0.4; p < 0.001; 95% CI, 0.2-0.6). Finally, a higher VAS rating (that is, it was harder for the participant to do all the things they wanted to do) strongly correlated with lower pTESS scores (r = -0.7; p < 0.001; 95% CI, -0.8 to -0.5) (Fig. 4B-D). These results concur with the a priori hypotheses described earlier.

Criterion validity analyses demonstrated a very strong correlation between adult TESS scores and the pTESS scores for adolescent participants (r = 0.97; p < 0.001; 95% CI, 0.8-1.0) (Fig. 4E).

All analyses were repeated with the English-only and French-only respondents. The group of English-only participants demonstrated the same results as the entire group. In the French-only (n = 9) group, because of the small sample size, Spearman correlations were performed: (1) time from chemotherapy (r = 0.9; p < 0.001; 95% CI, 0.6-1.3); (2) time from surgery (r = 0.7; p = 0.03; 95% CI, 0.1-1.4); and (3) VAS rating (r = -0.7; p = 0.02; 95% CI, -1.4 to -0.1). Only one French-speaking participant completed the adult TESS, so no analyses were performed.

In examining the face and content validity of the pTESS-Leg versus the TESS in older adolescent participants, 72% (13 of 18) liked the adult TESS better than the pTESS overall. Fifty-five percent (11 of 20) found that the TESS was meant more for their age group whereas only four of 20 (20%) felt the pTESS was more suited to their age (five of 20 responded “no difference”). When asked which questionnaire had more items that were important to them, most adolescents (11 of 20 or 55%) responded no difference whereas five of 20 chose the TESS and four of 20 chose the pTESS.

pTESS-Arm Questionnaire

In the examination of known-group validity, five respondents used an upper-extremity brace (Fig. 5A). pTESS-Arm summary scores did not differ significantly between brace users and participants not using a brace, although mean scores were 10 points lower for brace users (n = 5; mean = 73; SD = 11) than for participants with no brace (n = 22; mean = 83; SD = 19; p = 0.13).

Fig. 5:
The boxplot and scatterplots display pTESS-Arm summary scores as a function of (A) gait aid use; (B) time since the last chemotherapy treatment; (C) time since tumor resection; (D) VAS scores for “Doing all the things I want to do”; and (E) adult TESS upper-extremity summary score.

For construct validity, pTESS-Arm summary scores did not correlate with the length of time since the last chemotherapy session (r = 0.1; p = 0.8; 95% CI, -0.1 to 0.08) or the length of time since tumor resection (r = 0.2; p = 0.4; 95% CI, -0.05 to 0.1) (Fig. 5B-C). A strong negative correlation was observed between the VAS rating and the pTESS-Arm summary scores (r = -0.8; p < 0.001; 95% CI, -1.0 to -0.5) (Fig. 5D). In the evaluation of criterion validity, adult TESS scores also correlated very strongly with pTESS-Arm scores for older adolescent participants (r = 0.9; p = 0.007; 95% CI, 1.0-3.7) (Fig. 5E).

In the examination of face and content validity, of the eight participants who also completed the adult TESS, five preferred the TESS to the pTESS-Arm overall. Five of the eight found that the TESS was meant more for their age group (one felt there was no difference). When asked which questionnaire had more items that were important to them, four chose the TESS and the other four reported no difference.


Discussion

Largely because of a lack of standardized tools, no studies have examined the physical function of children and adolescents with bone tumors using a tool specifically designed for pediatrics. Although some studies have examined children and adolescents with bone tumors, they included adults in the study population and used the adult TESS and/or the Musculoskeletal Tumor Society Rating Scale to evaluate function [1,4,23,27,50,51]. For a tool to be useful in clinical settings, it must be shown to be reliable and valid within that specific population [36,38,55]. This study describes the content development of the pTESS, a tool specifically designed for pediatric patients with bone tumors. The pTESS demonstrated excellent internal consistency, no floor or ceiling effects, very high test-retest reliability, and strong preliminary validity.

Our study was subject to several limitations. The Phase 1 interview sample consisted of only 17 participants. This may seem small, but similar numbers have been used in the early interview stages of developing other pediatric self-reported measures, such as the Activities Scale for Kids (n = 20) [58] and the Kids’ Immune Thrombocytopenic Purpura Tools (n = 12) [33], as well as in the early development of the adult TESS (n = 16) [11]. The rounds of interviews continued until no new information was obtained. Although the sample was not balanced by gender, the scale items are not gender-specific (that is, they relate to general activities of daily living, school, and play) and therefore should not be influenced by gender. Regarding the sample size in Phase 3, of the 122 respondents at T1, 18 (15%) had malignant tumors of the arm, which is similar to the proportion of malignant bone tumors arising in the upper extremity reported previously (13%) [25]. The implications of this small upper-extremity sample for reliability and validity are described below. Additionally, in Phase 3, there was attrition because the study relied primarily on mailed questionnaires. Response rates for mailed surveys or questionnaires are often as low as 15% to 20% [15], but previous studies of adolescent oncology patients that used mailed questionnaires yielded response rates of 50% [29,32]. With the arm and leg groups combined, our overall response rate at T1 was 67%, which is considered very good for mailed measures [14,15].

There were also limitations regarding recruitment, environment, and reporting time. Recruitment formats differed slightly among the collaborating sites; therefore, response bias may have influenced participation. Although all participants were reassured that declining to participate would not affect their care, participation may have been higher when initial contact was made by phone or in clinic rather than solely by mail. However, this difference should not have influenced the way participants responded to the tool. Both environment and reporting time may have influenced the number of participants reporting change between T1 and T2. All participants were instructed to complete the questionnaires at home, but there was no guarantee that all children followed these instructions when initially approached in a clinic or inpatient setting. If instructions were not followed, completing the questionnaires in different environments could explain the higher than anticipated number of participants who reported change between T1 and T2. Similarly, all participants were asked to return completed T1 forms by mail, after which investigators asked them to complete and return the T2 forms, also by mail. Unfortunately, despite every attempt to follow the methods described by Dillman [14,15], the interval between the two administrations varied within the sample. This variability may also have increased the number of participants who reported change between administrations, resulting in a smaller sample than desired for test-retest reliability. Despite this high change rate, test-retest reliability was high, but future studies could consider using an online version to minimize the variability in time to return T2 [8].

A final limitation is that responsiveness to change in function over time for individual patients was not evaluated in this study. Given the results for known-group validity, we anticipate that the pTESS may be responsive to longitudinal change, although further evaluation is needed to determine what difference in pTESS scores represents meaningful change. The results presented here support the use of the pTESS in cross-sectional designs.

Content Development

Questionnaire content was modified based on feedback from young patients, because children are able to make unique and important contributions to questionnaire development [58]. Using cognitive interviews and item evaluation, we modified the content, format, and response options of the adult version to make the questionnaire more “child-friendly.”

From a clinical perspective, the distribution of pTESS scores is fairly intuitive. The overall summary scores of children with benign aggressive tumors were higher than those of children with malignant tumors. Surgical resection for osteosarcoma or Ewing’s sarcoma is often more complex than for benign lesions, and adjuvant therapies can make functional recovery more difficult. The variability of summary scores by surgical procedure was also anticipated based on clinical experience. The above-knee amputation group had lower mean pTESS-Leg summary scores than the below-knee amputation group, a finding consistent with sarcoma studies using the TESS [1,12,22,23]. Prior studies have often grouped ablative procedures together and compared them with salvage procedures [1,4,42,51]. In this study, variability was lower within the amputation/rotationplasty groups than within the limb salvage group, but the mean scores for both groups were similar. This pattern has also been observed in two studies using the TESS [42,51]. Finally, patients who underwent fibular resection had the highest mean summary scores of all surgical groups, possibly because no skeletal reconstruction is required.

The TESS and pTESS use “degree of difficulty” as the concept of measurement. However, there is some debate regarding which concept is best suited to this pediatric population. Although children have been shown to make successful judgments about concrete tasks using Likert scales, research suggests they may understand “frequency of activity” better than “degree of difficulty” [40,58]. In contrast, children and adolescents with bone tumors used the text response “I can’t” when asked to respond to challenging frequency-based items [47]. We suggest that this pediatric population relates more to the concept of “can’t” because of issues such as limited strength or flexibility, weightbearing restrictions imposed by health care providers or caregivers, or fear of injury or falling [47].

Floor/Ceiling Effects, Internal Consistency, and Test-Retest Reliability

There were no floor effects in either version of the pTESS, and very few participants scored at the ceiling. For the pTESS-Leg and -Arm, an average of 7% and 2% of participants scored at the ceiling, respectively. Although these ceiling effects are lower than those of some other pediatric self-reported measures [47], they are slightly higher than those reported for the adult TESS [12]. Children with a cancer diagnosis may underreport or deny difficulties with physical function [45]. Additionally, they may adjust to their physical limitations and tend to remain positive about their surgical outcomes and level of function [29], which could account for the higher scores. Both versions of the pTESS demonstrated very high internal consistency, similar to that of the adult TESS [13]. Test-retest reliability was excellent for the pTESS-Leg and, again, similar to that of the TESS [13]. The recommended sample size to provide sufficient power for estimating the ICC was 35. Unfortunately, this sample was not achieved for the pTESS-Arm (n = 13), so its reliability analysis was underpowered. Despite this limitation, the ICC was 0.86, which is considered “good” by measurement standards [39,55].
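The floor/ceiling percentages and test-retest ICCs discussed above follow standard definitions. As a minimal sketch, assuming summary scores on a 0-to-100 scale and a two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout-Fleiss notation; this section does not state which ICC form the study used), they could be computed as follows. The function names `floor_ceiling` and `icc_2_1` are hypothetical, and the example scores are invented, not study data.

```python
from statistics import mean


def floor_ceiling(scores, low=0.0, high=100.0):
    """Return (% at floor, % at ceiling), assuming a 0-100 summary score."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s <= low) / n
    ceil_pct = 100.0 * sum(1 for s in scores if s >= high) / n
    return floor_pct, ceil_pct


def icc_2_1(t1, t2):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Built from a subjects x occasions ANOVA with two occasions (T1, T2).
    Assumes at least two subjects whose scores are not all identical.
    """
    data = [list(pair) for pair in zip(t1, t2)]
    n, k = len(data), 2
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]                      # per subject
    col_means = [mean(row[j] for row in data) for j in range(k)]  # per occasion

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-occasions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


print(floor_ceiling([0, 50, 100, 100]))                # (25.0, 50.0)
print(icc_2_1([50, 60, 70, 80], [60, 70, 80, 90]))     # about 0.77
```

Because this form measures absolute agreement, a uniform shift between administrations lowers it even when every patient keeps the same rank: in the invented data, raising every T2 score by 10 points gives about 0.77 rather than 1.0. This is one way in which variable intervals between T1 and T2 could affect test-retest estimates.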

Known-group, Construct, Criterion, Face, and Content Validity

The pTESS-Leg demonstrated known-group validity (gait aids versus none), construct validity (time since chemotherapy, time since surgery, and VAS), and criterion validity (adult TESS). Saraiva et al. [53] found no correlation between the postoperative period and TESS score, whereas we found a moderate correlation between time since surgery and pTESS-Leg scores. This discrepancy may be explained by the difference in the followup periods of the cohorts: Saraiva et al. had a median followup period of 10 years, whereas our mean time since surgery was only 2.7 years. Some studies have suggested that the greatest improvement in quality of life and functional ability in children with bone sarcoma occurs after 2 years postoperatively [5,21]; because our cohort was assessed while function may still have been changing, greater variability in scores was observed in our sample.

Because of the very small sample of participants with upper-extremity tumors, some of the validity analyses were underpowered. There was no statistically significant difference between upper-extremity participants using a brace (pTESS-Arm mean score = 73) and those not using one (mean = 83). Although there are no criteria for a clinically important difference for the pTESS, a 10-point difference was shown to be an important change for the adult TESS in older patients [30]. Therefore, the 10-point difference in means observed here could indicate a difference in function, which would require a larger sample to test. The pTESS-Arm did not correlate with the time since surgery or chemotherapy. Although no previous studies are available to guide expected correlations for the pTESS, the sample (n = 27) may have been underpowered to detect weaker correlations, considering that only moderate correlations were observed with the pTESS-Leg. The pTESS-Arm correlated strongly with the VAS rating and demonstrated strong criterion validity through its correlation with the adult TESS, despite the small sample. Overall, the pTESS-Arm demonstrated reasonable validity, but this should be confirmed in followup studies with larger samples.
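The point that a sample of 27 may be underpowered to detect weaker correlations can be made concrete with the standard Fisher z approximation for testing a correlation against zero. The sketch below assumes a two-sided α of 0.05 and 80% power; these conventions and the function names are illustrative assumptions, not the study’s reported power calculation.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """z(1 - alpha/2) + z(power) for a two-sided test."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a population correlation r (Fisher z)."""
    return math.ceil((_z_sum(alpha, power) / math.atanh(r)) ** 2 + 3)


def detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with n participants (same approximation)."""
    return math.tanh(_z_sum(alpha, power) / math.sqrt(n - 3))


print(round(detectable_r(27), 2))   # 0.52
print(n_for_correlation(0.3))       # 85
```

Under these assumptions, 27 participants would reliably detect only correlations of about 0.52 or larger, whereas a correlation of 0.3 would require roughly 85 participants.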

In terms of face and content validity, the subgroup of older adolescents provided valuable feedback on the pTESS compared with the TESS. We gave the adult TESS only to participants 16 years and older because it includes items such as driving and working, activities that generally require a minimum age of 16 years. The majority of these participants (72% with leg tumors and 62% with arm tumors) preferred the adult TESS to the pTESS overall. Many offered comments such as “it applied to my age more,” “felt more for teenagers,” and “asked questions I can relate to more.” Research suggests that the hardships of a cancer diagnosis in adolescence may lead to a feeling of growing up faster and maturing earlier [34,49], which could explain the preference for the TESS. In contrast, some participants chose the pTESS, with comments such as “I don’t do adult activities at the moment,” “as a teenager, the questions suit me better,” and “there were more activities I do.” Although these participants were in the minority, their responses should not be discounted. Since its development, the TESS has become the international reference standard for patient-evaluated physical function in adults with sarcoma, so the strong correlation between the pTESS and the adult TESS in adolescent participants is noteworthy. Adolescent preferences for an adult versus a pediatric questionnaire have not previously been investigated in this population. A more in-depth examination of teenagers’ preferences, and perhaps of how to transition patients from the pTESS to the TESS, would be a valuable future endeavor.


The pTESS is the first self-reported measure of physical function developed specifically for children and adolescents with bone tumors. Patient interviews were used to determine the content of the measure to ensure that the tool is relevant to this population. This study demonstrates the absence of floor effects, minimal ceiling effects, strong internal consistency, and good to excellent test-retest reliability for the pTESS. The pTESS-Leg demonstrated validity through multiple analyses. The pTESS-Arm analyses, although underpowered, demonstrated criterion validity and trends in construct validity. Based on these findings, the pTESS can be used for the cross-sectional evaluation of physical function in young patients with bone tumors. However, its responsiveness (the ability to detect change over time) needs to be evaluated, and the validity of the pTESS-Arm should be further examined in a larger sample.


We thank Aileen Davis for scientific advice; Derek Stephens for statistical analysis; Mark Clayer for his modified pediatric version of the TESS; Deirdre Tetzlaff, Krista Johnston, Manahil Naqvi, Jennifer Sargeant, Heather Cosgrove, Ellis Prather, and Colleen Fitzgerald for patient recruitment and logistic assistance; Renée Haldenby and Degen Southmayd for translations in Toronto; Janie Barry and Hugo Saint-Yves for translations in Montréal; and Bonnie Louie for preparation of patient packages.


1. Aksnes L, Bauer H, Jebsen N, Follerås G, Allert C, Haugen G, Hall K. Limb-sparing surgery preserves more function than amputation: a Scandinavian sarcoma group study of 118 patients. J Bone Joint Surg Br. 2008;90:786-794.
2. Beaton D, Bombardier C, Guillemin F, Bosi Ferraz M. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine (Phila Pa 1976). 2000;25:3186-3191.
3. Bekkering W, Vliet Vlieland T, Fiocco M, Koopman H, Schoones J, Nelissen R, Taminiau A. Quality of life, functional ability and physical activity after different surgical interventions for bone cancer of the leg: a systematic review. Surg Oncol. 2012;21:e39-47.
4. Bekkering W, Vliet Vlieland T, Koopman H, Schaap G, Bart Schreuder H, Beishuizen A, Jutte P, Hoogerbrugge P, Anninga J, Nelissen R, Taminiau A. Functional ability and physical activity in children and young adults after limb-salvage or ablative surgery for lower extremity bone tumors. J Surg Oncol. 2011;103:276-282.
5. Bekkering W, Vliet Vlieland T, Koopman H, Schaap G, Beishuizen A, Anninga J, Wolterbeek R, Nelissen R, Taminiau A. A prospective study on quality of life and functional outcome in children and adolescents after malignant bone tumor surgery. Pediatr Blood Cancer. 2012;58:978-985.
6. Capanna R, Ruggieri P, Biagini R, Ferraro A, DeCristofaro R, McDonald D, Campanacci M. The effect of quadriceps excision on functional results after distal femoral resection and prosthetic replacement of bone tumors. Clin Orthop Relat Res. 1991;267:186-196.
7. Carty C, Dickinson I, Watts M, Crawford R, Steadman P. Impairment and disability following limb salvage procedures for bone sarcoma. Knee. 2009;16:405-408.
8. Clayer M, Davis AM. Can the Toronto Extremity Salvage Score produce reliable results when used online? Clin Orthop Relat Res. 2011;469:1750-1756.
9. Cohen J. Statistical Power Analysis for the Behavioural Sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
10. Davidge K, Wunder J, Tomlinson G, Wong R, Lipa J, Davis AM. Function and health status outcomes following soft tissue reconstruction for limb preservation in extremity soft tissue sarcoma. Ann Surg Oncol. 2010;17:1052-1062.
11. Davis AM. Limb Salvage Procedures for Bone and Soft Tissue Sarcoma: Development of a Measure of Physical Function. Master of Science thesis. Toronto: Graduate Department of Community Health, University of Toronto; 1994.
12. Davis AM, Bell RS, Badley EM, Yoshida K, Williams JI. Evaluating functional outcome in patients with lower extremity sarcoma. Clin Orthop Relat Res. 1999;358:90-100.
13. Davis AM, Wright JG, Williams JI, Bombardier C, Griffin AM, Bell RS. Development of a measure of physical function for patients with bone and soft tissue sarcoma. Qual Life Res. 1996;5:508-516.
14. Dillman DA. Mail and Internet Surveys: The Tailored Design Method. New York, NY: John Wiley; 2000.
15. Dillman DA. Mail and Other Self-administered Questionnaires. Toronto, Ontario, Canada: Academic Press; 1983.
16. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441-448.
17. Enneking WF. Modification of the system for functional evaluation of surgical management of Musculoskeletal Tumors. In: Enneking WF, ed. Limb Salvage in Musculoskeletal Oncology. New York, NY: Churchill-Livingstone; 1987:626.
18. Enneking WF, Dunham W, Gebhardt MC, Malawer M, Pritchard DJ. A system for the functional evaluation of reconstructive procedures after surgical treatment of tumors of the musculoskeletal system. Clin Orthop Relat Res. 1993;286:241-246.
19. Ferketich S. Focus on psychometrics: aspects of item analysis. Res Nurs Health. 1991;14:165-168.
20. Fletcher RH, Fletcher SW, Wagner EH. Clinical Epidemiology: The Essentials. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 1988.
21. Frances J, Morris C, Arkader A, Nikolic Z, Healey J. What is quality of life in children with bone sarcoma? Clin Orthop Relat Res. 2007;459:34-39.
22. Furtado S, Grimer R, Cool P, Murray S, Briggs T, Fulton J, Grant K, Gerrand C. Physical functioning, pain and quality of life after amputation for musculoskeletal tumours. Bone Joint J. 2015;97:1284-1290.
23. Ginsberg JP, Rai S, Carlson C, Meadows A, Hinds P, Spearing E, Zhang L, Callaway L, Neel M, Rao BN, Marchese VG. A comparative analysis of functional outcomes in adolescents and young adults with lower-extremity bone sarcoma. Pediatr Blood Cancer. 2007;49:964-969.
24. Gradl G, Postl L, Lenze U, Stolberg-Stolberg J, Pohlig F, Rechl H, Schmitt-Sody M, von Eisenhart-Rothe R, Kirchhoff C. Long-term functional outcome and quality of life following rotationplasty for treatment of malignant tumors. BMC Musculoskelet Disord. 2015;16.
25. Gurney J, Swensen A, Bulterys M. Malignant bone tumors. In: Ries LAG, Smith M, Gurney J, et al, eds. Cancer Incidence and Survival among Children and Adolescents: United States SEER Program 1974-1995. NIH Pub. No. 99-4649. Bethesda, MD: National Cancer Institute, SEER Program; 1999:99-110.
26. Hoffman M, Mulrooney D, Steinberger J, Lee J, Baker K, Ness K. Deficits in physical function among young childhood cancer survivors. J Clin Oncol. 2013;31:2799-2805.
27. Hopyan S, Tan J, Graham K, Torode I. Function and upright time following limb salvage, amputation and rotationplasty for pediatric sarcoma of bone. J Pediatr Orthop. 2006;26:405-408.
28. Hudson MM, Mertens A, Yasui Y, Hobbie W, Chen H, Gurney J, Yeazel M, Recklitis C, Marina N, Robison LL, Oeffinger K. Health status of adult long-term survivors of childhood cancer: a report from the Childhood Cancer Survivor Study. JAMA. 2003;290:1583-1592.
29. Hudson MM, Tyc VL, Cremer LK, Luo X, Li H, Rao BN, Meyer WH, Crom DB, Pratt CB. Patient satisfaction after limb-sparing surgery and amputation for pediatric malignant bone tumors. J Pediatr Oncol Nurs. 1998;15:60-69; discussion 70-71.
30. Jaglal S, Lakhani Z, Schatzker J. Reliability, validity, and responsiveness of the lower extremity measure for patients with a hip fracture. J Bone Joint Surg Am. 2000;82:955-962.
31. Jobe J. Cognitive psychology and self-reports: models and methods. Qual Life Res. 2003;12:219-227.
32. Keats MR, Courneya KS, Danielsen S, Whitsett SF. Leisure-time physical activity and psychosocial well-being in adolescents after cancer diagnosis. J Pediatr Oncol Nurs. 1999;16:180-188.
33. Klaassen R, Blanchette V, Barnard D, Wakefield C, Curtis C, Bradley C, Neufeld E, Buchanan G, Silva M, Chan A, Young NL. Validity, reliability, and responsiveness of a new measure of health-related quality of life in children with Immune Thrombocytopenic Purpura: the Kids' ITP Tools. J Pediatr. 2007;150:510-515.e511.
34. Lehmann V, Grönqvist H, Engvall G, Ander M, Tuinman M, Hagedoorn M, Sanderman R, Mattsson E, von Essen L. Negative and positive consequences of adolescent cancer 10 years after diagnosis: an interview-based longitudinal study in Sweden. Psychooncology. 2014;23:1229-1235.
35. Lollar DJ, Simeonsson RJ, Nanda U. Measures of outcomes for children and youth. Arch Phys Med Rehabil. 2000;81:S46-52.
36. Long A, Dixon P. Monitoring outcomes in routine practice: defining appropriate measurement criteria. J Eval Clin Pract. 1996;2:71-78.
37. Marchese VG, Ogle S, Womer RB, Dormans J, Ginsberg JP. An examination of outcome measures to assess functional mobility in childhood survivors of osteosarcoma. Pediatr Blood Cancer. 2004;42:41-45.
38. McDowell I, Jenkinson C. Development standards for health measures. J Health Serv Res Policy. 1996;1:238-246.
39. McHorney CA, Tarlov A. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293-307.
40. Mellor D, Moore K. The use of Likert Scales with children. J Pediatr Psychol. 2014;39:369-379.
41. Meyers P, Schwartz C, Krailo M, Kleinerman E, Betcher D, Bernstein M, Conrad E, Ferguson W, Gebhardt MC, Goorin A, Harris M, Healey J, Huvos A, Link M, Montebello J, Nadel H, Nieder M, Sato J, Siegal G, Weiner M, Wells R, Wold L, Womer R, Grier H. Osteosarcoma: a randomized, prospective trial of the addition of ifosfamide and/or muramyl tripeptide to cisplatin, doxorubicin, and high-dose methotrexate. J Clin Oncol. 2005;23:2004-2011.
42. Nagarajan R, Clohisy DR, Neglia JP, Yasui Y, Mitby P, Sklar C, Finklestein J, Greenberg M, Reaman G, Zeltzer L, Robison LL. Function and quality-of-life of survivors of pelvic and lower extremity osteosarcoma and Ewing's sarcoma: the Childhood Cancer Survivor Study. Br J Cancer. 2004;91:1858-1865.
43. Ness K, Mertens A, Hudson MM, Wall M, Leisenring W, Oeffinger K, Sklar C, Robison LL, Gurney J. Limitations on physical performance and daily activities among long-term survivors of childhood cancer. Ann Intern Med. 2005;143:639-647.
44. Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill; 1978.
45. O'Leary T, Diller L, Recklitis C. The effects of response bias on self-reported quality of life among childhood cancer survivors. Qual Life Res. 2007;16:1211-1220.
46. Pakulis PJ, Young NL, Davis AM. Evaluating physical function in an adolescent bone tumor population. Pediatr Blood Cancer. 2005;45:635-643.
47. Piscione PJ, Davis AM, Young NL. An examination of adolescent bone tumor patient responses on the Activities Scale for Kids (ASK). Phys Occup Ther Pediatr. 2014;32:213-228.
48. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. 2nd ed. Upper Saddle River, NJ: Prentice Hall Health; 2000.
49. Quinn G, Huang IC, Murphy D, Zidonik-Eddelton K, Krull K. Missing content from health-related quality of life instruments: interviews with young adult survivors of childhood cancer. Qual Life Res. 2013;22:111-118.
50. Renard AJ, Veth RP, Schreuder HW, van Loon C, Koops HS, van Horn JR. Function and complications after ablative and limb-salvage therapy in lower extremity sarcoma of bone. J Surg Oncol. 2000;73:198-205.
51. Robert R, Ottaviani G, Huh W, Palla S, Jaffe N. Psychosocial and functional outcomes in long-term survivors of osteosarcoma: A comparison of limb-salvage surgery and amputation. Pediatr Blood Cancer. 2010;54:990-999.
52. Saeter G, Elomaa I, Wahlqvist Y, Alvegard TA, Wiebe T, Monge O, Forrestier E, Solheim OP. Prognostic factors in bone sarcomas. Acta Orthop Scand Suppl. 1997;273:156-160.
53. Saraiva D, de Camargo B, Davis AM. Cultural adaptation, translation and validation of a functional outcome questionnaire (TESS) to Portuguese with application to patients with lower extremity osteosarcoma. Pediatr Blood Cancer. 2008;50:1039-1042.
54. Stish B, Ahmed S, Rose P, Arndt C, Laack N. Patient-reported functional and quality of life outcomes in a large cohort of long-term survivors of Ewing sarcoma. Pediatr Blood Cancer. 2015;62:2189-2196.
55. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 2nd ed. Oxford: Oxford University Press; 1995.
56. Sugiura H, Katagiri H, Yonekawa M, Sato K, Yamamura S, Iwata H. Walking ability and activities of daily living after limb salvage operations for malignant bone and soft-tissue tumors of the lower limbs. Arch Orthop Trauma Surg. 2001;121:131-134.
57. Young NL, Wright JG. Measuring pediatric physical function. J Pediatr Orthop. 1995;15:244-253.
58. Young NL, Yoshida KK, Williams JI, Bombardier C, Wright JG. The role of children in reporting their physical disability. Arch Phys Med Rehabil. 1995;76:913-918.


© 2019 by the Association of Bone and Joint Surgeons