Many clinical studies of upper limb rehabilitation after stroke use the Action Research Arm Test (ARAT) as an outcome. It measures upper limb function by scoring a participant's ability to complete a range of functional tasks. The scale consists of 19 items, each rated on a four-point ordinal scale from zero (cannot perform any part of the task) to three (performs the task normally). The total score ranges from 0 to 57, and the items can also be reported as four subscales (grasp, grip, pinch and gross movement).
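As an illustration of the scoring arithmetic described above, the following minimal Python sketch sums item scores into subscale and total scores. The function name `score_arat` and the input layout are hypothetical, and the number of items per subscale is an assumption inferred from the subscale ranges (maximum score divided by three), not taken from the ARAT manual.

```python
# Assumed item counts per subscale, inferred from the ranges
# grasp 0-18, grip 0-12, pinch 0-18, gross movement 0-9 (3 points per item).
SUBSCALE_ITEMS = {"grasp": 6, "grip": 4, "pinch": 6, "gross": 3}


def score_arat(items):
    """Return subscale and total scores.

    items: dict mapping subscale name -> list of item scores (each 0, 1, 2 or 3).
    """
    scores = {}
    for name, n_items in SUBSCALE_ITEMS.items():
        values = items[name]
        if len(values) != n_items:
            raise ValueError(f"{name}: expected {n_items} items, got {len(values)}")
        if any(v not in (0, 1, 2, 3) for v in values):
            raise ValueError(f"{name}: item scores must be 0, 1, 2 or 3")
        scores[name] = sum(values)
    # The total is the sum of the four subscale scores (19 items, range 0-57).
    scores["total"] = sum(scores[name] for name in SUBSCALE_ITEMS)
    return scores


# A participant who performs every task normally reaches the maximum of 57.
full_marks = {name: [3] * n for name, n in SUBSCALE_ITEMS.items()}
print(score_arat(full_marks)["total"])  # 57
```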
The ARAT has generally good psychometric properties, but the extent of floor and ceiling effects is still unclear. A floor effect occurs when many participants obtain the minimum possible score, whereas a ceiling effect occurs when many obtain the maximum. The existence of either raises doubt about whether the scale really covers the full range of the ability being measured. A review [2] found that the percentage of participants with the highest or lowest values of the ARAT total score varied considerably across studies, with many reporting percentages above 15%. At this level, the reliability and responsiveness of the scale are considered to be reduced. The extent of these effects is likely to vary with the characteristics of the stroke participants assessed, because the distribution of scores shifts with the degree of functional limitation. For instance, in the VECTORS study [4], the median ARAT total score was 51.5 (out of 57), whereas in a study by Hsueh and Hsieh [5] it was 0. Not surprisingly, the first study reported a high ceiling effect (41%) and the second a high floor effect (52%).
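The definitions above translate directly into a calculation. The sketch below is a minimal illustration with hypothetical scores: it reports the percentage of participants at the minimum and maximum of a scale and flags each against the 15% level mentioned above. The function name `floor_ceiling` and its parameters are assumptions made for this example.

```python
def floor_ceiling(scores, minimum, maximum, threshold=15.0):
    """Percentage of scores at the scale minimum and maximum.

    A floor (or ceiling) effect is flagged when the percentage
    exceeds `threshold` (15% by default).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical ARAT total scores (range 0-57) for ten participants.
example = [0, 0, 0, 0, 3, 10, 25, 40, 57, 57]
print(floor_ceiling(example, minimum=0, maximum=57))
# floor_pct is 40.0 and ceiling_pct is 20.0, so both effects are flagged.
```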
Although the ARAT can be reported as subscales, not all studies do so, as the psychometric properties of the subscales have not been validated. There is less evidence on whether floor and ceiling effects occur when the subscales are used, but Hsueh and Hsieh [5] reported substantial floor effects on all subscales and some evidence of ceiling effects. The VECTORS study did not report these effects for the subscales, but since the median values for grasp, grip and gross movement were the maximum possible, substantial ceiling effects are likely. Besides the question of whether the scale covers the full range of abilities, a further issue is how to analyse a measurement that may have a substantial proportion of values at its minimum or maximum.
Methods, results and discussion
The ARAT was reported in the RATULS trial [7]. This compared robot-assisted training with enhanced upper limb therapy and usual care for 770 stroke patients with moderate or severe upper limb functional limitation (baseline ARAT total <39). The primary outcome was whether a participant had achieved an improvement of a given size in the ARAT total over time, but secondary outcomes included the ARAT total and subscales. The median ARAT total was 3 at baseline, so this patient group started with predominantly low scores and, therefore, considerable limitation of arm function. The distribution of the ARAT total at baseline and at three months shows substantial floor effects (Fig. 1a). Given this feature, we considered how best to compare scores between randomisation groups, in terms of both descriptive statistics and inferential approaches. Since we wished to adjust any comparison at three months for time since stroke, study centre and baseline ARAT total, some form of multiple regression was necessary. The analysis could have used either linear regression, comparing means, or quantile regression, comparing medians. In our case, the distribution of the ARAT total was clearly positively skewed at both time points, so comparing means might not seem the obvious approach. However, the assumptions of multiple regression concern the distribution of the errors after adjustment for covariates such as baseline values, not the shape of the raw outcome distribution. After adjustment, the errors in the comparison of means at three months were normally distributed, so linear regression was appropriate for the analysis of the total score.
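The point about errors rather than raw scores can be made explicit by writing the adjusted comparison as a model. The notation below is illustrative only and is not taken from the trial's statistical analysis plan; in particular, coding time since stroke as a single covariate and study centre as a fixed effect are assumptions made for this sketch.

```latex
% Y_i : ARAT total at three months for participant i
% G_i : indicator variables for randomisation group (two indicators for three groups)
% B_i : baseline ARAT total;  T_i : time since stroke;  c(i) : study centre of participant i
\[
  Y_i = \beta_0 + \boldsymbol{\beta}_1^{\top} G_i + \beta_2 B_i + \beta_3 T_i
        + \gamma_{c(i)} + \varepsilon_i ,
  \qquad \varepsilon_i \sim N(0, \sigma^2).
\]
```

The normality assumption applies to the errors \(\varepsilon_i\), which are assessed through the residuals of the fitted model, so a skewed marginal distribution of \(Y_i\) does not by itself rule out linear regression.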
Where ARAT subscales have been reported, they have usually been summarised as either a mean [4,8–11] or a median [5,12]. ANOVA and Kruskal–Wallis tests have been used, but the distribution shapes that led to these choices were not mentioned. A statistical analysis plan must consider the shape of the data distribution in order to make appropriate choices. The distributions of the ARAT subscales at three months in RATULS were ‘U-shaped’ rather than positively skewed like the total score (Fig. 1b–e), meaning that participants tended to score either zero or full marks on each subscale, and few scored values in between. This is reflected in the substantial floor effects and, to a lesser extent, ceiling effects (Table 1). Comparing means or medians was therefore not appropriate, as neither measure gives a typical value. After considering the analysis options, we chose a simple approach: dichotomising each subscale to give a binary measure and then using logistic regression to compare groups. The split was made between participants who could complete at least one task of the subscale (scored 2 or 3 on at least one item of that subscale, indicating that they completed the task but possibly took a very long time) and those who could not (scored 0 or 1 on all items of the subscale, indicating no movement or only partial performance of the task). More sophisticated analysis techniques could have been chosen [14,15], but these would have made the results harder for nonstatisticians to interpret.
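The dichotomisation rule described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data and hypothetical function names; it covers only the binary coding and a descriptive summary by group, not the adjusted logistic regression used for the comparison.

```python
from collections import defaultdict


def dichotomise(item_scores):
    """Return 1 if at least one item of the subscale was scored 2 or 3, else 0."""
    return 1 if any(score >= 2 for score in item_scores) else 0


def proportions_by_group(records):
    """Proportion of participants per group who completed at least one task.

    records: iterable of (group label, list of item scores for one subscale).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [completers, participants]
    for group, item_scores in records:
        counts[group][0] += dichotomise(item_scores)
        counts[group][1] += 1
    return {group: done / total for group, (done, total) in counts.items()}


# Hypothetical grip subscale data (four items per participant, assumed layout).
records = [
    ("A", [0, 0, 0, 0]),  # no movement on any item -> 0
    ("A", [1, 2, 0, 0]),  # one task completed -> 1
    ("B", [3, 3, 3, 3]),  # all tasks performed normally -> 1
    ("B", [1, 1, 0, 0]),  # partial performance only -> 0
    ("B", [0, 1, 1, 1]),  # partial performance only -> 0
]
print(proportions_by_group(records))
```

The resulting 0/1 variable is what would then be entered as the outcome in a logistic regression with the same adjustment covariates.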
Table 1 - Floor and ceiling effects for RATULS and BoTULS at baseline and 3 months
|Grasp (0–18), n (%)
| Floor effect
| Ceiling effect
|Grip (0–12), n (%)
| Floor effect
| Ceiling effect
|Pinch (0–18), n (%)
| Floor effect
| Ceiling effect
|Gross (0–9), n (%)
| Floor effect
| Ceiling effect
a n = 668 due to missing data.
Although floor or ceiling effects have been reported, other studies have not reported a ‘U-shaped’ distribution of the ARAT subscales, with floor and ceiling effects present simultaneously. We therefore looked at the distributions of the ARAT subscales in another trial, of 333 patients, which evaluated treatment of upper limb spasticity due to stroke with botulinum toxin type A (BoTULS) [16] (Fig. 2, Table 1). In BoTULS, participants also had considerable limitation of arm function at baseline (median ARAT total score = 3). The distributions of the subscales were not ‘U-shaped’ at 3 months, but they were still problematic, as they were highly positively skewed with a median of zero for three subscales (i.e. a substantial floor effect). Comparisons of either means or medians across subgroups would be problematic, so dichotomising the subscales, as in RATULS, would be more appropriate.
When analysing the ARAT total and subscales, care must be taken to check the shape of the data distributions and to choose the most appropriate descriptive and inferential statistical techniques. If the data have a ‘U-shaped’ distribution, an alternative to the estimation of means or medians is needed. An alternative should also be considered for heavily skewed distributions, which may result from substantial floor or ceiling effects. Inappropriate analyses can lead to misleading conclusions.
The views and opinions expressed here are those of the authors and do not necessarily reflect those of the HTA programme, NIHR, the UK National Health Service (NHS), or UK Department of Health. We would like to thank participants, local investigators and site staff, co-ordinating centre staff and members of the trial oversight committees of RATULS and BoTULS for their contribution to these research projects.
The employing institutions of all authors received funds from the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme in order for the main RATULS and BoTULS trials to be undertaken. The RATULS and BoTULS trial results have been published previously, and part of this manuscript was presented as a poster at the European Stroke Organisation Conference in May 2019.
Conflicts of interest
There are no conflicts of interest.
1. Lyle RC. A performance test for assessment of upper limb function in physical rehabilitation treatment and research. Int J Rehabil Res. 1981; 4:483–492.
2. Pike S, Lannin NA, Wales K, Cusick A. A systematic review of the psychometric properties of the Action Research Arm Test in neurorehabilitation. Aust Occup Ther J. 2018; 65:449–471.
3. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007; 60:34–42.
4. Dromerick AW, Lang CE, Birkenmeier RL, Wagner JM, Miller JP, Videen TO, et al. Very early constraint-induced movement during stroke rehabilitation (VECTORS): a single-center RCT. Neurology. 2009; 73:195–201.
5. Hsueh IP, Hsieh CL. Responsiveness of two upper extremity function instruments for stroke inpatients receiving rehabilitation. Clin Rehabil. 2002; 16:617–624.
6. Koh CL, Hsueh IP, Wang WC, Sheu CF, Yu TY, Wang CH, Hsieh CL. Validation of the action research arm test using item response theory in patients after stroke. J Rehabil Med. 2006; 38:375–380.
7. Rodgers H, Bosomworth H, Krebs HI, van Wijck F, Howel D, Wilson N, et al. Robot assisted training for the upper limb after stroke (RATULS): a multicentre randomised controlled trial. Lancet. 2019; 394:51–62.
8. Khatoon I, Hamdani N, Noohu M. A comparative study on the effect of types of focus of attention on upper limb function training in subjects with stroke. J Physical Med Rehabilitation Sci. 2014; 10:134–139.
9. Arya KN, Verma R, Garg RK, Sharma VP, Agarwal M, Aggarwal GG. Meaningful task-specific training (MTST) for stroke rehabilitation: a randomized controlled trial. Top Stroke Rehabil. 2012; 19:193–211.
10. Nagapattinam S. Effect of task specific mirror therapy with functional electrical stimulation on upper limb function for subacute hemiplegia. International J Physiotherapy. 2015; 2:840–849.
11. Morris JH, van Wijck F, Joice S, Ogston SA, Cole I, MacWalter RS. A comparison of bilateral and unilateral upper-limb task training in early poststroke rehabilitation: a randomized controlled trial. Arch Phys Med Rehabil. 2008; 89:1237–1245.
12. Powell J, Pandyan AD, Granat M, Cameron M, Stott DJ. Electrical stimulation of wrist extensors in poststroke hemiplegia. Stroke. 1999; 30:1384–1389.
13. Rodgers H, Bosomworth H, van Wijck F, Krebs HI, Shaw L. Usual care: the big but unmanaged problem of rehabilitation evidence - Authors’ reply. Lancet. 2020; 395:337–338.
14. Liang Y, He C, Sun D, Schootman M. Modeling bounded outcome scores using the binomial-logit-normal distribution. Chilean J Statistics. 2014; 5:3–14.
15. Molas M, Lesaffre E. A comparison of three random effects approaches to analyze repeated bounded outcome scores with an application in a stroke revalidation study. Stat Med. 2008; 27:6612–6633.
16. Shaw LC, Price CI, van Wijck FM, Shackley P, Steen N, Barnes MP, et al. Botulinum toxin for the upper limb after stroke (BoTULS) trial: effect on impairment, activity limitation, and pain. Stroke. 2011; 42:1371–1379.