There are various methodological challenges in investigating performance effects of treatments in elite athletes. One issue researchers often face is statistical: small samples of elite athletes mean that a study is likely to quantify only large effects with adequate precision (3). Other issues arise when coaches are unwilling to randomize elite athletes to control and treatment groups, to blind athletes to treatments, or to standardize training programs for research purposes. Subelite athletes are therefore usually more accessible and easier to study, yet the results of studies with these groups may not generalize to elite athletes. Another problem is that researchers generally choose to investigate effects of treatments on performance in tests in the laboratory or training setting rather than in competitions, because competitions are considered too important for experiments. Nevertheless, a study of performance enhancement in competition could be definitive: no measure of performance is more valid for the athlete than performance in the actual event.
The purpose of this project was to develop a practical design and an analytical approach for investigation of performance effects in competitions when treatments are applied to a squad of elite athletes. We used US swimming competition data because of the large number of competitive performances and the availability of these data on the Internet. We downloaded data for one season and assessed whether effects of interventions on competition performance could be estimated with adequate precision. The problem of missing values that are inevitable with competition data was addressed by analyzing the data with mixed modeling (10).
US long-course performance times were downloaded from USAswimming.org for the period September 2008 through August 2009 (post–Beijing Olympics through Rome World Championships and US Open Championships). Each performance time came with a Hy-Tek point score (ranging from 1 to 1300). Hy-Tek points for swimmers participating at the 2009 USA Swimming Nationals were 893 ± 80 (mean ± SD; range, 549 to 1112). For analyses focused on elite swim squads, we therefore selected data for swimmers who achieved >900 Hy-Tek points and who were in squads of more than two swimmers at these Nationals. Best point scores at the Nationals were used to select each swimmer’s best event. Only competitions with >14 best-event swims were included. This selection process resulted in a total of 368 best-event swims in seven competitions by 148 athletes in 19 squads.
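The three selection filters can be sketched in pandas. This is only an illustration: the data, athlete and squad names, and column names are invented assumptions, not the study's dataset, and the further filter of >14 best-event swims per competition would follow the same group-and-filter pattern.

```python
import pandas as pd

# Invented Nationals results; names, squads, and scores are assumptions.
nationals = pd.DataFrame({
    "athlete": ["A", "A", "B", "C", "C", "D"],
    "squad":   ["Ford", "Ford", "Ford", "Ford", "Ford", "Other"],
    "event":   ["200IM", "100FR", "100FR", "200IM", "100BK", "100FR"],
    "points":  [980, 950, 910, 930, 905, 880],   # Hy-Tek point scores
})

# 1. Keep swimmers whose best score at the Nationals exceeds 900 points
best = nationals.groupby("athlete")["points"].max()
elite = best[best > 900].index
elite_swims = nationals[nationals["athlete"].isin(elite)]

# 2. Keep each swimmer's best event (highest point score)
best_event = elite_swims.loc[elite_swims.groupby("athlete")["points"].idxmax()]

# 3. Keep squads with more than two selected swimmers
squad_n = best_event.groupby("squad")["athlete"].nunique()
selected = best_event[best_event["squad"].isin(squad_n[squad_n > 2].index)]
```

With these invented data, swimmer D (880 points) is excluded at step 1 and the three remaining Ford swimmers survive the squad-size filter.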
We assumed that an intervention affecting performance in all strokes and distances had been applied to swimmers of the squad Ford for the 2009 Santa Clara International Competition. Ford was chosen solely because it was one of the larger squads; Santa Clara was chosen because many teams were present at this competition. We then estimated the intervention effect in a design equivalent to a parallel-group controlled trial with athletes of other squads being the control group. An overview of the number of best-event swims by competition is shown in Table 1. The number of selected swims by event ranged from 1 (50-m breaststroke and 50-m butterfly) to 48 (200-m individual medley).
Further analyses were performed after changing the amount of data with other selection criteria: excluding competitions after the experimental Santa Clara competition, including the second-best event for each selected swimmer, and/or including performances of all swimmers who had competed at the USA Swimming Nationals (i.e., any Hy-Tek score). Exclusion of competitions after Santa Clara was applied to investigate the possibility of drawing conclusions about the treatment effect immediately after the competition where it was trialed; inclusion of second-best events and/or of performances of all swimmers was applied to investigate whether increasing the amount of data would enhance the precision of estimates. These data selections resulted in totals of 154 to 779 swims in 4 to 16 competitions by 109 to 277 athletes in 15 to 20 squads.
Competition times were analyzed with mixed general linear models using Proc Mixed in the Statistical Analysis System (version 9.2, SAS Institute, Cary, NC). The models were extensions of those used for investigating reliability of competition performance (e.g., Ref. ), in which fixed effects estimate mean performance times at each competition and random effects estimate between-athlete differences in performance and within-athlete variability in performance from one competition to the next. The fixed effects included as part of the usual reliability model estimated mean performances of each sex at each competition in each event (stroke and distance). An additional fixed effect was included for estimating mean performances of squads. The effect of the imaginary intervention was estimated as a fixed effect represented by a dummy variable coded 1 for the performances of athletes in the experimental squad (Ford) and competition (Santa Clara) and coded 0 for all other performances. The random effects included as part of the reliability model were the athlete identity (to account for different abilities of athletes) and the residual (the within-athlete variability). An additional random effect was included to account for the clustering of athletes in squads. Preliminary analyses with date of event included as a numeric fixed effect for a linear or quadratic trend analysis proved inappropriate because of the rapid performance improvement in the last competitions of the season (USA Nationals, World Championships, and US Open).
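The key feature of this model is the dummy variable carrying the intervention effect, which can be sketched outside SAS. In the illustration below the data, column names, and the statsmodels formula in the closing comment are assumptions for the sake of the example; the study itself used Proc Mixed.

```python
import pandas as pd

# Invented performances; column names and values are illustrative assumptions.
swims = pd.DataFrame({
    "athlete":     ["A", "A", "B", "B", "C", "C"],
    "squad":       ["Ford", "Ford", "Ford", "Ford", "Other", "Other"],
    "competition": ["SantaClara", "Nationals", "SantaClara", "Nationals",
                    "SantaClara", "Nationals"],
    "time_s":      [53.2, 52.6, 110.4, 109.8, 54.0, 53.5],
})

# Dummy variable for the hypothetical intervention: coded 1 only for the
# experimental squad (Ford) at the experimental competition (Santa Clara),
# and 0 for all other performances.
swims["treated"] = ((swims["squad"] == "Ford") &
                    (swims["competition"] == "SantaClara")).astype(int)

# A roughly equivalent statsmodels specification (not run here) might be:
#   mixedlm("time ~ C(sex):C(competition):C(event) + C(squad) + treated",
#           data, groups="squad", vc_formula={"athlete": "0 + C(athlete)"})
# i.e., fixed effects for sex-by-competition-by-event means, squad means, and
# the treatment dummy; random intercepts for squad and for athlete within squad.
```

The coefficient on `treated` then estimates the intervention effect against all other squad-competition combinations, exactly as in a parallel-group trial.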
For comparing the resulting precision of estimates with that in other recent research (published after the year 2000), we used Google Scholar to search for studies of interventions in competitive senior swimmers where the precision of estimate was either stated or easily determined via exact P values. Spreadsheets were used to derive 90% confidence limits (CL) (2,4).
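The back-calculation underlying such spreadsheets can be sketched as follows, assuming a normal sampling distribution (a large-sample approximation; the published spreadsheets also handle t-distributed statistics, which this sketch omits). The function name and example numbers are invented for illustration.

```python
from statistics import NormalDist

def ci90_from_p(effect: float, p: float) -> tuple[float, float]:
    """Derive 90% confidence limits from an effect estimate and its exact
    two-tailed P value, assuming a normal sampling distribution."""
    z_p = NormalDist().inv_cdf(1 - p / 2)   # z score implied by the P value
    se = abs(effect) / z_p                  # back-calculated standard error
    z90 = NormalDist().inv_cdf(0.95)        # ~1.645 for a 90% interval
    return effect - z90 * se, effect + z90 * se

# e.g., a 1.2% improvement reported with P = 0.04
lo, hi = ci90_from_p(1.2, 0.04)  # approximately (0.24, 2.16)
```

As a sanity check, an effect with P exactly 0.10 yields a 90% interval whose lower limit sits exactly at zero, as expected for a two-tailed test at that threshold.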
Table 2 shows the performance effects of the hypothetical intervention applied to Ford for the Santa Clara Competition. The outcomes in this table relevant to the present study are the CL (uncertainties) rather than the mean effects. For the first set of data-selection criteria in the table, the estimate of the intervention effect would have had an uncertainty of ±0.8%. Analysis of data including performances in second-best events also resulted in ±0.8% uncertainty. Applying other data-selection criteria resulted in more uncertainty; for example, if researchers had wanted to draw a conclusion about the treatment effect immediately after Santa Clara, the uncertainty would have been approximately ±1.9%. Not shown in the table is the precision when the same hypothetical intervention was applied to an additional squad (Auburn Tigers) at another competition (Charlotte UltraSwim): ±0.5% for best-event swims by the elite athletes at all competitions.
Table 3 shows the estimates of performance effects of interventions in recent studies of senior competitive swimmers. The precision observed with the present competition-based design (±0.8%) is similar to or better than that in these studies.
The model provided estimates of other effects, as follows. The within-athlete SD in competition performance ranged between 1.1% and 1.5% (90% CL expressed as ×/÷ factors, 1.1–1.2), and the between-athlete SD ranged between 2.3% and 2.8% (×/÷1.1–1.2). Ford’s overall mean performance was better than that of all the other squads by 2.4% (90% CL, ±1.3%) for analysis of best-event performances of all swimmers at all competitions; comparisons of Ford with the other squads were unclear for all other data-selection criteria. The performance difference between males and females was largest in 200-m backstroke (mean difference, 12.8%; 90% CL, ±5.1%) and smallest in 400-m freestyle (6.4%, ±2.8%). Mean performance at the FINA World Championships improved by 0.8% (±0.5%) relative to the USA Nationals and by 3.7% (±0.5%) relative to the Santa Clara swim meet. The differences between best event and second-best event were 1.5% (±0.3%) and 1.2% (±0.3%) for analyses including all competitions and competitions up to Santa Clara, respectively.
We have successfully developed a powerful research design for measuring effects of interventions on competition performance in squads of elite swimmers. It is essentially a parallel-group design with athletes in other squads forming the control group for comparison with the squad under evaluation. The design provides outcomes that are more precise than those in most studies using conventional designs. Investigating effects on competition performance itself should also provide outcomes that are more valid than effects on performance in tests.
The outcomes with the new design are still not precise enough to characterize trivial and small effects. Researchers can be confident about such outcomes when the uncertainty in the effect approximates the smallest worthwhile effect (5). In swimming, the smallest substantial change in competition performance for top-level swimmers is approximately 0.3% (5,10). We tried several strategies to enhance the obtained precision (±0.8%) toward this level. Including the second-best event for each swimmer resulted in slightly better precision, because the number of selected swims almost doubled without compromising the reliability of performance. Including performances of all swimmers (not just elite) substantially increased the uncertainty, probably because of the decrease in reliability. We observed better precision (±0.5%) by assuming that the intervention had been applied to two different squads in two different competitions, but this scenario is unlikely and impractical. Clear outcomes for interventions producing trivial or only the smallest important effects will therefore require larger squads and more competitions in which many of the squads are represented.
The variability in performance in top swimmers from competition to competition in this study (within-athlete variation of 1.1%–1.5%) was higher than that in earlier studies (0.8%) (10,15,16). These earlier estimates were based either on major competitions or on competitions close to a major competition. The difference in these estimates is likely related to differences in preparation for different competitions by the different squads. Our adjustment for the importance of the competition therefore did not apply equally to all the squads, resulting in higher within-athlete between-competition variation.
The new design allowed for quantifying and comparing mean swim performance differences between sexes, competitions, squads, and events. Such comparisons, most of which had adequate precision, could help coaches and performance consultants to assess various aspects of performance and to design strategies for improving it. For example, there appeared to be a relative weakness in women’s 200-m backstroke that US Swimming might have been interested in improving. The mean performance improvement of 0.8% from the world-championship trials to the world championships could also be useful information. These effects can be derived from analysis of competition performances whether or not an intervention has been applied.
One option for the modeling of longitudinal performance data is to include linear or higher-order polynomial trends (1,16). We did not model such trends here, because the data showed rapid improvement in performance in the last competitions of the season. Polynomial models would be appropriate in situations where trends are more gradual, and the trends might be clearer when differences between major and minor competitions are accounted for with appropriate use of dummy variables (1,16).
A limitation to the use of this competition-based design is that it requires many elite athletes competing against each other on a regular basis. Even within the big competition structure of the United States, we could estimate effects with relatively good precision only for squads with a large number of athletes competing in most of the selected competitions. Only five competitions really contributed to the estimation of effects; the original data selection included seven long-course competitions (Table 1), but the Charlotte UltraSwim and Southwest competitions did not contribute because the experimental squad or other squads were not represented. A related limitation is that the intervention is assumed to have similar effects on all the athletes in a squad. If, for example, an intervention was expected to affect only the sprinters in a squad, then only the sprinters would be included in the analysis, and the resulting smaller sample size might not provide adequate precision. The design may be suitable for team sports such as football, baseball, and basketball, because the squads compete weekly and therefore have more data for analysis. In team sports with low scores such as soccer, performances would need to be assessed through key performance indices derived from game analyses, provided these indices have sufficient reliability. More research is needed to develop analytical models to account for the pairwise competition structure of team sports and to determine whether they provide adequate precision for effects on performance.
The need for a stable baseline period of usual practice leading up to the intervention is another limitation of this new design: there can be no other intervention having a substantial effect on performance in the competitions selected for the baseline. Furthermore, the outcome represents the overall mean effect of what happened in a squad and will identify specific intervention effects only if the intervention represents a substantial change from usual practice. If multiple interventions or other factors have affected performance within a squad, the outcomes can be attributed only to the combination of interventions and factors but not to any one of them. Any major injuries or even psychological stress affecting a substantial proportion of the squad could thus invalidate the outcome. Changes arising from interventions or injuries are bound to occur in the teams representing the control group, but the effects of these changes should average out. Controlled trials in which subjects are randomized or assigned in a balanced fashion to intervention and control groups still provide the best approach to adjust for factors—other than the intervention itself—that could affect the outcome of a study. However, the price paid for such control in performance studies with athletes is concern about the validity of the performance test. If researchers can obtain a large enough sample of athletes, they should consider a randomized controlled trial and follow it up with validation of the treatment effect on competition performance using the design presented here.
In summary, researchers and coaches wanting to evaluate the effectiveness of new training or lifestyle interventions with their swim squad can use competition results as an alternative to traditional controlled trials. There is no need for placebo treatment or standardizing of training or other variables. The uncertainty of the resulting estimates is typically better than that for controlled trials. Deriving the outcome of the intervention using competitions eliminates doubt about the validity of effects investigated using performance tests. The design is likely to be suitable for any sport in which large squads of athletes compete against other squads in several competitions over a season.
No funding has been received for this work.
The authors have no professional relationship with a for-profit organization that would benefit from this study.
Publication does not constitute endorsement by the American College of Sports Medicine.
1. Bullock N, Hopkins WG. Methods for tracking athletes’ competitive performance in skeleton. J Sports Sci. 2009;27:937–40.
2. Hopkins WG. A spreadsheet for combining outcomes from several subject groups. Sportscience. 2006;10:51–3.
3. Hopkins WG. Estimating sample size for magnitude-based inferences. Sportscience. 2006;10:63–70.
4. Hopkins WG. A spreadsheet for deriving a confidence interval, mechanistic inference and clinical inference from a P value. Sportscience. 2007;11:16–20.
5. Hopkins WG, Marshall SW, Batterham AM, Hanin J. Progressive statistics for studies in sports medicine and exercise science. Med Sci Sports Exerc. 2009;41(1):3–13.
6. Jean-St-Michel E, Manlhiot C, Li J, et al. Remote preconditioning improves maximal performance in highly trained athletes. Med Sci Sports Exerc. 2011;43(7):1280–6.
7. Kilding AE, Brown S, McConnell AK. Inspiratory muscle training improves 100 and 200 m swimming performance. Eur J Appl Physiol. 2010;108:505–11.
8. Lindh A, Peyrebrune M, Ingham S, Bailey D, Folland J. Sodium bicarbonate improves swimming performance. Int J Sports Med. 2008;29:519–23.
9. Parouty J, Al Haddad H, Quod M, Leprêtre PM, Ahmaidi S, Buchheit M. Effect of cold water immersion on 100-m sprint performance in well-trained swimmers. Eur J Appl Physiol. 2010;109:483–90.
10. Pyne DB, Trewin CB, Hopkins WG. Progression and variability of competitive performance of Olympic swimmers. J Sports Sci. 2004;22:613–20.
11. Robertson EY, Aughey RJ, Anson JM, Hopkins WG, Pyne DB. Effects of simulated and real altitude exposure in elite swimmers. J Strength Cond Res. 2010;24:487–93.
12. Siegler JC, Gleadall-Siddall DO. Sodium bicarbonate ingestion and repeated swim sprint performance. J Strength Cond Res. 2010;24:3105–11.
13. Toubekis AG, Adam GV, Douda HT, Antoniou PD, Douroundos II, Tokmakidis SP. Repeated sprint swimming performance after low- or high-intensity active and passive recoveries. J Strength Cond Res. 2011;25:109–16.
14. Toubekis AG, Peyrebrune MC, Lakomy HKA, Nevill ME. Effects of active and passive recovery on performance during repeated-sprint swimming. J Sports Sci. 2008;26:1497–505.
15. Trewin CB, Hopkins WG, Pyne DB. Relationship between world-ranking and Olympic performance of swimmers. J Sports Sci. 2004;22:339–45.
16. Vandenbogaerde TJ, Hopkins WG. Monitoring acute effects on athletic performance with mixed linear modeling. Med Sci Sports Exerc. 2010;42(7):1339–44.