Identifying Clinically Meaningful Tools for Measuring Comfort Perception of Footwear : Medicine & Science in Sports & Exercise





Medicine & Science in Sports & Exercise 42(10):p 1966-1971, October 2010. | DOI: 10.1249/MSS.0b013e3181dbacc8


Ensuring that footwear is comfortable is a difficult task. Comfort has been described as an ever-changing individual perception influenced by mechanical, neurophysiological, and psychological factors (4,11,13,15). In footwear, several characteristics have been hypothesized to affect individuals' experience of comfort; these include fit, aesthetics, cushioning, support, foot sensitivity and mobility, leg and foot alignment, and the nature of the activity being performed (11,13,19,23). Previous research has found that footwear comfort affects muscle activity of the lower limb (11-13). It has also been hypothesized that footwear affects shock attenuation, dynamic stability, and performance (15,19).

Several authors have attempted to quantify the comfort of footwear (3,6,11,12). Using a 150-mm visual analog scale (VAS), Mundermann et al. (13) observed considerable intrasession variability in comfort ratings that improved as the number of sessions increased. After four sessions, average comfort ratings differed by less than 5 mm between sessions. Further, comfort measure repeatability seems to be influenced by the activity. Repeatability of comfort measures has been found to be lower in standing than in walking (11), and there are inconsistencies in measurement between walking and jogging depending on the type of scale used (3,11).

Several factors may contribute to the reported variability in comfort measures, such as novel footwear, novel scales, the use of several different types of measurement scales, and possible survey fatigue in repeated trials. Previous studies assessing the reliability of comfort measures have used shoes that are new to the subject. Thus, subjects needed to adapt not only to novel comfort scales but also to a new shoe (3,11,13), which may not fit as well or look as comfortable as their own (8,23,24). There are also differences in the dimensions of comfort scales, with subjects asked for comfort perceptions over several areas of the foot (e.g., heel, medial/lateral arch, and forefoot) or features of the shoe (e.g., length, volume, cushioning, and support). Potentially compounding the task are multiple repeated measurements in which subjects are asked to rate perception. With the increasing number of outcome measures and sessions being studied, the burden on the subject increases. Survey fatigue, an overexposure to the survey process, can also affect the responses and the reliability of the questionnaires (17).

The type of scale used to measure comfort might also influence the result. In comfort research, two types of scales have been identified: rating and ranking scales. Rating scales are the most widely used scales in survey research and are often used to show change within a single item, such as before and after an intervention, or when the value of an attribute or the amount of change is important (7,14,16,20). Ranking scales are used when the intervals separating each level of the scale have a relative sequence and are particularly useful when choosing between a limited number of items and when they are all undesirable or desirable (2,16). The reliability (stability on repeated measures) of the range of comfort rating scales requires further evaluation and is an aim of this study.

Measures of footwear comfort are likely to be useful in the development and innovation of footwear, and as such, an essential step is to determine the minimal clinically important difference (MCID) (21). Pragmatically, the MCID is defined as the smallest difference in a score that patients perceive to be beneficial (10). In comfort, this would be a change in perception from uncomfortable to neutral or from neutral to comfortable. We are not aware of any studies that have quantified the MCID of comfort rating scales/measures.

The purpose of our study was to examine the reliability of a range of footwear comfort scales (i.e., VAS, Likert scale, and ranking scale). To achieve this, we determined the most reliable measuring tool from the three scales as well as calculated the MCID of the most reliable rating scale. We also examined the dimensions of the comfort rating scales that most influenced the subject's decision-making process in ranking the comfort of shoes.


Owing to the nature of the scales being investigated, the study consisted of two experiments: one evaluating the VAS and Likert rating scales and the other assessing the ranking scale (Fig. 1). Both experiments involved five sessions of repeated measures over five consecutive days. In deciding to conduct two experiments, we also considered that subjects might become confused between the two types of scales (Likert and VAS vs ranking) and that survey fatigue might become an increasing factor.

Figure 1. Study protocol.


Twenty subjects were recruited; 10 for each experiment were consecutively allocated (Fig. 1). Subjects were recruited through advertisements in a local fitness center using the following inclusion criteria: aged 18-40 yr, an ability to jog at a consistent comfortable pace for a minimum of 2 min, and availability to attend for five consecutive days of the study. Thirty individuals responded, and then exclusion criteria were applied resulting in 10 being unsuitable for the experiment. The exclusion criteria were as follows: injured at the time of the study, had previous surgery on their lower limb, or had a history of neurological, sensory, or orthopedic conditions. The demographics of the two groups are shown in Tables 1 and 2. All 20 completed the study. The study was approved by the ethics committee at the Australian Institute of Sport and by the University of Queensland medical research ethics committee. Before commencement, subjects were familiarized with the scales they would be using, and written informed consent was obtained.

Table 1. Subjects involved in experiment 1 (rating scales).
Table 2. Subjects involved in experiment 2 (ranking scale).


All subjects, regardless of experiment group, were exposed to the same five conditions: their usual jogging shoe and four different commercially available prefabricated orthoses. The shoe (no insert) served as the standardized condition that allowed us to test the reliability of the measure, that is, any variability in rating of perceived comfort, cushioning, and support was less likely due to the shoe and more likely a function of the stability of the rating scale. The orthoses were randomly inserted in place of the manufacturer's sock-liner. Subjects were blinded to which of these orthoses were inserted into their shoe, and no orthosis was used twice within a single session. For the purposes of this study, the orthoses trials served to distract or break the focus of the subject away from the shoe-only condition.

Experiment 1

Rating scale measures.

Six horizontal 100-mm VAS and seven-point Likert scales (see Figures, Supplemental Digital Contents 1 and 2, which illustrate the VAS and Likert scales completed by subjects in experiment 1) were used to obtain measures of overall comfort; cushioning of the forefoot, arch, and heel; and support of the arch and heel. The VAS was anchored with the terms "not comfortable at all" to "most comfortable imaginable," whereas the Likert ranged from "very uncomfortable" to "very comfortable" to ensure balance within the scale. Subjects were not permitted to view completed scales from previous trials when making new comfort ratings.

Protocol for experiment 1.

Experiment 1 comprised five sessions, each consisting of four trials. A "trial" involved a 2-min walk and a 2-min jog with a shoe first and then an insert second (Fig. 1). Trials were conducted on a treadmill at a self-selected "comfortable" pace established on the first trial of the first session and held constant throughout the remainder of the experiment. At the conclusion of each shoe trial, subjects rated their comfort, cushioning, and support on the VAS and Likert scale. Thus, in each session of experiment 1, comfort ratings were repeated four times for walking and four times for jogging.

At the conclusion of the last trial of the last session, subjects were asked, based on their experiences of comfort, what change in millimeters on the VAS would represent a meaningful change in comfort perception.

Experiment 2

Ranking scale measures.

The ranking scale consisted of a vertically oriented list of the five conditions, which were randomly ordered for each subject. Subjects were asked to rank only the overall comfort of the conditions, from most comfortable (rank 1) to least comfortable (rank 5). Subjects were permitted to make notes as the session progressed but did not view rankings from previous sessions. Only the ranked position of relative comfort of the shoe was used in data analysis.

Protocol for experiment 2.

In each session, subjects walked at a self-selected speed in each condition before ranking them in order of comfort. This process was then repeated for jogging. Subjects walked/jogged in each condition for as long as required for them to be satisfied with the rankings and were also permitted to retrial any condition if needed.

At the conclusion of the last session, subjects were asked for their perception of the major features of the footwear that influenced their decision of overall comfort.

Statistical Analysis

All analyses were completed on the software package SPSS (version 16, Chicago, IL).

Experiment 1: Rating Scales


We used mixed linear models to assess the stability of the comfort rating scales. Main (fixed) effects were session (five daily sessions), trial (four trials per session), and gait (walking or jogging). Subject variability was accounted for in this modeling by inclusion of subjects as a random factor. Significant main effects were followed up with Bonferroni-corrected tests of simple effects. Mean differences and their 95% confidence intervals (CI), as well as the standardized mean difference (i.e., mean difference divided by the pooled SD, also known as effect size (ES)), are reported as the point estimates of effect for pairwise comparisons. CI of mean differences that contained a "0" indicated a null effect, and ES levels are referenced to Hopkins's system (9) as trivial (<0.2), small (0.2-0.6), moderate (0.61-1.2), and large (>1.2).
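The standardized mean difference and its interpretation can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the pooled SD shown is the common two-sample form (the article does not state how the SD was pooled within the mixed model), and the handling of the boundaries between bands is an assumption.

```python
import math
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled SD of two samples, weighted by degrees of freedom (assumed form)."""
    na, nb = len(a), len(b)
    variance = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return math.sqrt(variance)


def effect_size(a, b):
    """Standardized mean difference: mean difference divided by the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def classify_effect_size(es):
    """Label an effect size using the bands quoted in the text."""
    magnitude = abs(es)
    if magnitude < 0.2:
        return "trivial"
    if magnitude <= 0.6:
        return "small"
    if magnitude <= 1.2:
        return "moderate"
    return "large"


# Hypothetical VAS ratings (mm) from two sessions, for illustration only.
session_a = [60, 62, 64, 66, 68]
session_b = [58, 60, 62, 64, 66]
print(classify_effect_size(effect_size(session_a, session_b)))
```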


The MCID was calculated in two ways. The SEM was used as the data-derived approach because it has been found to remain relatively stable across different cohorts (5,25). SEM was calculated as SD × √(1 − ICC), where ICC is the intraclass correlation coefficient, for every significant pairwise comparison within the VAS model. The largest value was used as the SEM, so that for a change to be meaningful, it had to surpass all possible error.
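As a minimal sketch of the data-derived approach, the SEM formula and the "largest value" rule can be written as follows. The function names and the example SD and ICC values are hypothetical and are not taken from the study data.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    if sd < 0:
        raise ValueError("SD must be non-negative")
    if icc > 1:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


def data_derived_mcid(sd_icc_pairs):
    """Take the largest SEM across comparisons as the data-derived threshold."""
    return max(standard_error_of_measurement(sd, icc) for sd, icc in sd_icc_pairs)


# Hypothetical (SD in mm, ICC) pairs for three pairwise comparisons.
comparisons = [(20.0, 0.75), (18.0, 0.84), (16.0, 0.91)]
print(data_derived_mcid(comparisons))
```

Taking the maximum rather than the mean is the conservative choice described in the text: a change counts as meaningful only if it exceeds the measurement error of every comparison.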

A subject-derived approach was also adopted by asking subjects to nominate the change on the VAS (mm) that would represent a meaningful change in comfort for them. This approach has been shown to be the single best measure of change from an individual's perspective (18,22). The mean of these patient-nominated values then became the subject-derived MCID (22).

Experiment 2: Ranking Scale


Nonparametric tests were used to analyze the reliability of the ranking scale. Friedman ANOVA was used to assess the reliability between sessions, and post hoc Wilcoxon signed-rank tests were used to determine differences between specific sessions and gait. Results are presented as the test statistics (χ2 for Friedman ANOVA and t for Wilcoxon signed-rank), the significance (P value), and level of effect (r).
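For readers unfamiliar with the Friedman test, the statistic can be sketched in plain Python. The study itself used SPSS; the code below is an independent illustration with hypothetical function names and made-up data. Each row holds one subject's values across the repeated sessions, values are re-ranked within each subject (ties share the mean rank), and the usual tie correction is applied. With five sessions, the statistic is referred to a χ2 distribution with 4 degrees of freedom.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def friedman_chi_square(blocks):
    """Tie-corrected Friedman chi-square for rows of repeated measures."""
    n = len(blocks)      # number of subjects
    k = len(blocks[0])   # number of repeated conditions (e.g., sessions)
    rank_sums = [0.0] * k
    tie_term = 0
    for row in blocks:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    correction = 1 - tie_term / (n * (k ** 3 - k))
    if correction == 0:
        return 0.0  # every subject gave identical values in all conditions
    statistic = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return statistic / correction


# Hypothetical data: rank given to the shoe by 4 subjects over 5 sessions.
shoe_ranks = [
    [2, 2, 3, 2, 2],
    [1, 1, 1, 2, 1],
    [3, 2, 3, 3, 3],
    [2, 3, 2, 2, 2],
]
print(friedman_chi_square(shoe_ranks))
```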

Evaluation of Comfort Dimensions

The responses given by subjects in experiment 2 were examined for common themes and formed the basis of our evaluation. Guided by these responses, we conducted stepwise regressions and Pearson correlations between overall comfort and the remaining dimensions to determine whether the data supported subjects' responses. To determine the strength of these relationships, we referenced to Hopkins's (9) scale of small (0.1), moderate (0.3), large (0.5), very large (0.7), and almost perfect (0.9).
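The correlation step and its interpretation can be illustrated as below. This is a sketch under stated assumptions: the function names are hypothetical, the listed values are treated as lower bounds of each band, and the label "trivial" for coefficients below 0.1 is an assumption not stated in the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_correlation(r):
    """Label |r| using the thresholds quoted in the text as lower bounds."""
    magnitude = abs(r)
    bands = [
        (0.9, "almost perfect"),
        (0.7, "very large"),
        (0.5, "large"),
        (0.3, "moderate"),
        (0.1, "small"),
    ]
    for threshold, label in bands:
        if magnitude >= threshold:
            return label
    return "trivial"  # assumed label for |r| < 0.1


# Hypothetical VAS ratings (mm): overall comfort vs. heel cushioning.
overall = [55, 62, 70, 48, 66, 73]
heel = [50, 60, 72, 45, 61, 70]
print(classify_correlation(pearson_r(overall, heel)))
```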


Experiment 1: Rating Scales


For the main effect of trial, the VAS was found to be stable across all dimensions. This differed from the Likert scale where a significant difference was noted between trials in overall comfort. The VAS was also found to be stable between all sessions for the dimensions of forefoot cushioning and arch support as well as gait for forefoot, arch, and heel cushioning and heel and arch support. Significant differences were found between sessions in overall comfort, arch cushioning, and heel cushioning and support ratings as well as for ratings of overall comfort between walking and jogging (Table 3). Similarly, the Likert scale was found to be stable between all sessions for forefoot cushioning and overall comfort ratings, with significant differences detected for the remaining dimensions. In addition, heel support was the only dimension to be stable between walking and jogging when using the Likert scale (Table 4). As such, the Likert scale was less stable than the VAS and was removed from further analysis.

Table 3. VAS linear mixed model.
Table 4. Likert scale mixed linear model.

Pairwise comparisons found that forefoot cushioning and arch support VAS measures were stable from the first session, whereas overall comfort and arch cushioning measures were stable from the second session. Measures of both heel dimensions were stable from the third session (see Table, Supplemental Digital Content 3, which illustrates tests of simple effects resulting from significant main effects). Overall comfort measures obtained from session 1 differed only from those obtained from session 4 (2.92 mm, range = 0.22-5.62 mm), whereas session 1 measures of arch cushioning differed from those obtained from session 3 (4.39 mm, range = 1.74-7.55 mm), session 4 (4.9 mm, range = 1.74-8.06 mm), and session 5 (3.92 mm, range = 0.76-7.08 mm). However, in point estimates of effect, all of these differences were small. Heel cushioning and support measures obtained from sessions 1 and 2 both differed from those obtained from sessions 3, 4, and 5 (see Table, Supplemental Digital Content 3, which illustrates significant pairwise comparisons), and although the point estimates of effect are larger than those of all other dimensions, they are <0.6 and therefore still small.


Using the data-derived method involved 16 significant pairwise comparisons with SEM ranging from 7.08 to 9.59 mm (Table 5). On the basis of these data, 9.59 mm represents the change required for a clinically meaningful change in comfort. Asking subjects for meaningful change resulted in three subjects nominating 5 mm, one subject nominating 7 mm, four nominating 10 mm, and the remaining two nominating 15 and 25 mm. The mean of these values, representing the anchor-based MCID, is 10.2 mm.
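The subject-derived value can be checked directly from the nominations listed above. The snippet below is a small verification sketch; the variable names are illustrative, and the data-derived figure is simply the largest SEM reported in the text.

```python
from statistics import mean

# Changes (mm) nominated by the 10 subjects, as reported in the text:
# three at 5 mm, one at 7 mm, four at 10 mm, one at 15 mm, one at 25 mm.
nominated_mm = [5, 5, 5, 7, 10, 10, 10, 10, 15, 25]

subject_derived_mcid = mean(nominated_mm)  # 10.2 mm
data_derived_mcid = 9.59                   # largest SEM reported (mm)

print(f"subject-derived MCID: {subject_derived_mcid:.1f} mm")
print(f"data-derived MCID:    {data_derived_mcid:.2f} mm")
```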

Table 5. SEM and ICC for significant pairwise comparisons.

Experiment 2: Ranking Scales

Friedman ANOVA found no differences between sessions (χ2(4) = 1.138, P = 0.888). Post hoc Wilcoxon signed-rank tests showed no difference between walking and jogging (t = 124.5, P = 0.446, r = 0.01).

Evaluation of Comfort Scale Dimensions

Subjects' responses for the major influences in their overall comfort judgments were categorized by theme (Table 6). In short, all subjects responded that their overall comfort was influenced by arch comfort. One subject also nominated the forefoot as an important factor.

Table 6. Subjects' responses to "what was most important influence in your decision of overall comfort" (n = 10).

Linear stepwise regression found a combination of heel cushioning and support, forefoot cushioning, and arch cushioning to explain 69% of the overall comfort model. Heel cushioning was found to have the highest correlation with overall comfort (r = 0.726) and an adjusted r2 of 0.526. Heel support was also very highly correlated with overall comfort (r = 0.722) and slightly increased the adjusted r2 value of the linear regression model (0.539). Arch support had the lowest correlation with overall comfort and, through the stepwise process, was excluded from the model.


The purpose of this investigation was to determine the most reliable measures of footwear comfort from three commonly used scales. We found the ranking scale to be the most reliable because no differences in relative shoe comfort were found between sessions or gait. Of the rating scales, a 100-mm VAS was more stable than a seven-point Likert scale on measuring the same outcomes. Because of this finding, the Likert scale was excluded from further analysis.

VAS measures of overall comfort and arch cushioning required a minimum of two sessions to produce reliable measures. Heel cushioning and support measures were reliable from the third session. These findings are different from those of Mundermann et al. (12), who recommended four to six consecutive sessions for reliable comfort measures. Their methodology included comfort measures obtained from novel inserts; thus, it cannot be ascertained if the additional time requirement was due to the scale or adjusting to the inserts.

In using a VAS, an MCID is an important indicator of meaningful change. Subject-derived and data-derived methods of establishing MCID produced very similar results of 10.2 and 9.59 mm, respectively. We believe that either of these amounts is a valid indicator of meaningful change. In applying these amounts to pairwise comparisons, not only were the ES obtained from statistically significant differences all found to be small (<0.6, >0.2), they were all less than both nominated MCID. Therefore, the statistically significant differences found in this study are not clinically relevant.
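The comparison between statistically significant differences and the MCID can be sketched as a simple threshold check. Only the four mean differences quoted in the Results text are included here (the heel comparisons appear in the supplemental table and are not reproduced); the function and variable names are hypothetical.

```python
MCID_DATA_DERIVED_MM = 9.59
MCID_SUBJECT_DERIVED_MM = 10.2

# Statistically significant mean differences (mm) quoted in the Results.
reported_differences_mm = {
    "overall comfort, session 1 vs 4": 2.92,
    "arch cushioning, session 1 vs 3": 4.39,
    "arch cushioning, session 1 vs 4": 4.9,
    "arch cushioning, session 1 vs 5": 3.92,
}


def is_clinically_meaningful(difference_mm, mcid_mm):
    """A change is treated as meaningful only if it reaches the MCID."""
    return abs(difference_mm) >= mcid_mm


for label, diff in reported_differences_mm.items():
    meaningful = is_clinically_meaningful(diff, MCID_DATA_DERIVED_MM)
    print(f"{label}: {diff} mm, meaningful = {meaningful}")
```

Because every listed difference falls below the smaller of the two thresholds, the check returns False for each comparison under either MCID.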

The second aim of the investigation was to determine which dimension of footwear comfort influenced subjects' perceptions of overall comfort. When questioned, all subjects in our investigation responded that the most important influence on overall comfort was arch comfort. Forefoot comfort was also indicated by one subject. In a study of 20 Hong Kong Chinese women, Au and Goonetilleke (1) also found that the forefoot and arch contributed to comfort and that the difference between an uncomfortable and comfortable forefoot and arch could be clearly identified. In contrast, the present investigation found that the heel was not identified as an important comfort consideration despite its contribution to the overall comfort model. Au and Goonetilleke (1) found that rearfoot comfort measures obtained from comfortable and uncomfortable shoes were not significantly different. This suggests that individuals prioritize other areas of the foot and, as demonstrated in regression modeling, heel cushioning and support measures are encompassed in measures of overall comfort.

It must be acknowledged that the features identified by this investigation are not the only factors that can affect shoe comfort. Shoe fit (8,11,24) and aesthetics (1,23) have also been identified as important factors, particularly at the point of sale, and require individuals to have different shoes. Arch height, the length of the first toe, breadth of the ball of the foot, and instep circumference differ between men and women (24). Small differences have been found in hallux height and in the angle of the metatarsal-phalangeal joint between Japanese-Korean and North American cohorts (8). Aesthetics is a dominant factor in the choice of shoe for both normal (1) and pathological (23) cohorts. By using the subjects' own shoe, rather than a standard shoe, we were able to build on these findings by suggesting specific components of the shoe that influence the comfort construct. Similar findings occurred between the work of Au and Goonetilleke (1) involving a Hong Kong Chinese female cohort wearing dress shoes and the present study with a cohort of both genders and a variety of ethnic backgrounds wearing sports shoes. Because of this, we are confident that, although different people will require different shoe designs (either for cosmetic or for fit reasons), ensuring arch and forefoot comfort is essential.

This is the first study to compare comfort measure scales, to suggest important dimensions to use when measuring footwear comfort, and to propose an MCID for a comfort VAS. Establishing these findings involved asking subjects up to 480 comfort questions for five consecutive days. We acknowledge that subjects could develop fatigue as the study progressed. However, in such a case, we would expect reliability of responses to decrease in later sessions. We found reliability was constant during the last 3 days for all measures and conclude that our subjects did not experience survey fatigue.
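One reading of the experiment 1 protocol that reproduces the figure of 480 is sketched below. It assumes that only the shoe-only ratings are counted and that every rating on both scales is tallied across all five sessions; these are assumptions about how the total was reached, not a statement from the authors.

```python
SESSIONS = 5     # five consecutive daily sessions
TRIALS = 4       # four trials per session
GAITS = 2        # walking and jogging
DIMENSIONS = 6   # overall comfort, 3 cushioning sites, 2 support sites
SCALES = 2       # VAS and Likert

ratings_per_session = TRIALS * GAITS * DIMENSIONS * SCALES
total_ratings = ratings_per_session * SESSIONS

print(ratings_per_session, total_ratings)
```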


A ranking scale measuring overall comfort produced the most reliable footwear comfort measures. This was followed by a 100-mm VAS measuring overall comfort; forefoot, arch, and heel cushioning; and arch and heel support. A seven-point Likert scale produced the least reliable results. Using a ranking scale provides information regarding the relative comfort of footwear and is useful with two or more items. A VAS is required if information is needed regarding the amount of or change in comfort. If a VAS is used, measures of overall comfort, forefoot and arch cushioning, and arch support obtained during two sessions will provide reliable information. In doing so, 10.2 and 9.59 mm can be used to identify MCID in footwear comfort.

Financial support for this research was received from the Australian Research Council (Australian Research Council Linkage Project grant LP0668233). K.M. is supported by an Australian Research Council Australian Postgraduate Award Industry. Vasyli International provided the inserts used in this study.

The authors report no conflicts of interest.

The results of this study do not constitute endorsement by the American College of Sports Medicine.

The authors thank Prof. Michael Martin and Dr. Steven Stern (School of Finance and Applied Statistics, the Australian National University, Canberra, Australia) for their statistical guidance.


1. Au EY, Goonetilleke RS. A qualitative study on the comfort and fit of ladies' dress shoes. Appl Ergon. 2007;38(6):687-96.
2. Bradburn N. Asking Questions: The Definitive Guide to Questionnaire Design-For Market Research, Political Polls, and Social and Health Questionnaires. San Francisco (CA): Jossey-Bass; 2004. p. 200-30.
3. Chen H, Nigg BM, De Koning J. Relationship between plantar pressure distribution under the foot and insole comfort. Clin Biomech (Bristol, Avon). 1994;9:335-41.
4. Chen H, Nigg BM, Hulliger M, de Koning J. Influence of sensory input on plantar pressure distribution. Clin Biomech (Bristol, Avon). 1995;10(5):271-4.
5. Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003;56(5):395-407.
6. Finestone A, Novack V, Farfel A, Berg A, Amir H, Milgrom C. A prospective study of the effect of foot orthoses composition and fabrication on comfort and the incidence of overuse injuries. Foot Ankle Int. 2004;25(7):462-6.
7. Guyatt GH, Townsend M, Berman LB, Keller JL. A comparison of Likert and visual analogue scales for measuring change in function. J Chron Dis. 1987;40(12):1129-33.
8. Hawes MR, Sovak D, Miyashita M, Kang SJ, Yoshihuku Y, Tanaka S. Ethnic differences in forefoot shape and the determination of shoe comfort. Ergonomics. 1994;37(1):187-96.
9. Hopkins W. A New View of Statistics [Internet]. 2007 [cited 2009 May 20]. Available from
10. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407-15.
11. Miller JE, Nigg BM, Liu W, Stefanyshyn DJ, Nurse MA. Influence of foot, leg and shoe characteristics on subjective comfort. Foot Ankle Int. 2000;21(9):759-67.
12. Mundermann A, Nigg BM, Stefanyshyn DJ, Humble RN. Development of a reliable method to assess footwear comfort during running. Gait Posture. 2002;16(1):38-45.
13. Mundermann A, Stefanyshyn DJ, Nigg BM. Relationship between footwear comfort of shoe inserts and anthropometric and sensory factors. Med Sci Sports Exerc. 2001;33(11):1939-45.
14. Neuman W. Social Research Methods: Qualitative and Quantitative Approaches. 6th ed. Boston (MA): Allyn & Bacon; 2005. p. 400-15.
15. Nigg B, Nurse M, Stefanyshyn D. Shoe inserts and orthotics for sport and physical activities. Med Sci Sports Exerc. 1999;31(7 suppl):S421-8.
16. Oppenheim AN. Questionnaire Design, Interviewing and Attitude Measurement. London (UK): Continuum; 1992. p. 303.
17. Porter SR, Whitcomb ME, Weitzer WH. Multiple surveys of students and survey fatigue. New Dir Inst Res. 2004;121:63-73.
18. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements. An illustration in rheumatology. Arch Intern Med. 1993;153(11):1337-42.
19. Reinschmidt C, Nigg BM. Current issues in the design of running and court shoes. Sportverletz Sportschaden. 2000;14(3):71-81.
20. van Laerhoven H, van der Zaag-Loonen HJ, Derkx BHF. A comparison of Likert scale and visual analogue scales as response options in children's questionnaires. Acta Paediatr. 2004;93:830-5.
21. Wells G, Anderson J, Beaton D, et al. Minimal clinically important difference module: summary, recommendations, and research agenda. J Rheumatol. 2001;28(2):452-4.
22. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol. 1993;20(3):557-60.
23. Williams AE, Nester CJ. Patient perceptions of stock footwear design features. Prosthet Orthot Int. 2006;30(1):61-71.
24. Wunderlich RE, Cavanagh PR. Gender differences in adult foot shape: implications for shoe design. Med Sci Sports Exerc. 2001;33(4):605-11.
25. Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol. 1999;52(9):861-73.



© 2010 The American College of Sports Medicine