Statistical Considerations for Exercise Protocols Aimed at Measuring Trainability

Voisin, Sarah; Jacques, Macsue; Lucia, Alejandro; Bishop, David J.; Eynon, Nir

Exercise and Sport Sciences Reviews: January 2019 - Volume 47 - Issue 1 - p 37-45
doi: 10.1249/JES.0000000000000176

Key points

  • All exercise training studies have reported high interindividual variability in response to similar exercise training. Whether this large variability is mostly due to different responses between individuals or mostly random noise is a matter of debate.
  • We list the different sources of variability in exercise training studies that can contribute to this variability.
  • We reanalyzed a published study whose protocol mimicked a crossover study with repeated intervention. We found that the magnitude of the within-subject variability was greater than trainability, which highlights the need for scientists to consider within-subject variability in exercise training studies.
  • Several protocols have been proposed to study individual responses to exercise training, but only a repeated intervention or repeated testing during the course of the intervention can partition within-subject variability from trainability.

INTRODUCTION

Exercise training results in many morphological, metabolic, and functional adaptations in the human body. The magnitude of these adaptations depends on training parameters, such as the duration, intensity, volume, and type of exercise training (1), as well as nontraining factors such as sleep (2), diet (3), and genetics (3). However, it is becoming clear that there is large interindividual variability in the observed response to an apparently similar exercise training stimulus (4–9). This variability has been observed for physiological (9), health- (9), and performance-related (6) measurements, and in both short (<1–6 months) and longer (>6 months) (8) exercise training interventions.

At the core of personalized exercise prescription lies the assumption that individuals have a variable response to exercise training, with some individuals showing little to no improvement (i.e., low-responders), whereas others significantly improve after a specific training regime (i.e., high-responders) (10). The idea of responsiveness to an intervention does not only pertain to exercise physiology, as personalized medicine has recently gained momentum in the fields of pharmacology (11,12) and nutrition (13). One example from personalized nutrition is the development of a machine learning algorithm applied to personal and microbiome features from thousands of people, which led to the design of a short-term personalized dietary intervention that successfully lowered postprandial glycemia (13). Importantly, this nutritional intervention used a repeated design to ensure that the response to different foods was truly specific to a given individual (13). Thus, given that interindividual dietary responses were established in this pioneer study (13), and that high interindividual variability in training responses has consistently been observed, it may be assumed that there is a combination of inherent (e.g., genetic) and environmental components that predispose some individuals to be more responsive to exercise training than others (14). However, the presence of random variability in data collected during exercise training studies in humans hinders the identification of such components (7,8,15–18).

In the current review, we propose study design and statistical considerations for studies focused on individual responses to exercise training. We hypothesize that within-subject variability (i.e., the variable response of a given individual to the same exercise training) is an important source of variability that currently prevents the accurate quantification of individual training response. We argue that if we wish to use physiological and molecular data to prescribe personalized exercise programs, we should better design exercise studies and address all sources of variability. In particular, we should ideally implement repeated interventions on the same participants for short interventions, or repeated tests during the exercise training program for long interventions, to evaluate the magnitude of within-subject variability, ensuring the phenotype of interest is valid and useful. Designing exercise studies that quantify most sources of random variability can prevent the wasting of resources and effort on unworkable interventions.

DEFINITIONS OF TRAINABILITY, RESPONDERS/NONRESPONDERS, AND HIGH/LOW-RESPONDERS

There are no established standard definitions for the terms “responder/non-responder,” “extreme/high/modest/low-responder,” and “trainability” that can be found in the literature (7,18–22). However, for the purpose of this review, we have used the following definitions, largely based on Hecksteden et al. (7,18). The terms responders, non-responders, and adverse responders (18,19) to exercise training are relevant when a threshold, whether purely theoretical or empirically determined, is defined to split individuals into those categories (18). Using the concepts of high/extreme, modest, and low-responders implies that no formal threshold has been used to classify individuals into categories, but individuals are instead grouped depending on their relative degree of response and those groups are then compared with each other (18,20). Trainability, or individual training response, is the consistent response of a given individual to a specific intervention and is a continuous measure (7). In lay terms, it can be formulated as “how much better (or worse) an individual responded to a specific exercise training program, compared with the typical/average response that is observed in the group.” It has a straightforward definition in statistics, which is the subject-by-training interaction (random effect) in a linear mixed model (7). The distinction between those three concepts seems trivial but is essential to understand differences in methodology between studies and to clarify why the very existence of responders and nonresponders to exercise is debated.

  1. Responders/non-responders. For some, “responders” are simply those who show a positive response after training (post- minus pretraining (Δ) > 0), as opposed to nonresponders who show a negative response (Δ ≤ 0), regardless of its magnitude (4,23). Others have considered individuals to be responders if they show a response with a magnitude beyond the random error in individual measurement, but there is no uniform formula to calculate this threshold (19,22,24,25). This random error in individual measurement is distinct from within-subject variability and consists of natural biological day-to-day and technical variability (see the “Sources Of Variability” section for further details). Finally, some have used practically meaningful thresholds, such as the “smallest worthwhile difference” (16,26) or the “minimum clinically important difference” (27), above which individuals are considered to be responders (8,17). Studies that are more clinical or performance-oriented require target outcomes, such as a significant improvement in a patient’s survival rate or an athlete’s personal best time to judge the efficacy of an intervention for a particular individual. However, such dichotomization (i.e., responder/nonresponder) is not appropriate in studies aiming to uncover modulators of the variable response to exercise training. Indeed, the transformation of the continuous spectrum of exercise responses into a dichotomous variable leads to unnecessary loss of information and statistical power (28).
  2. High/low-responders. A few studies have not used any formal threshold and have split their data into subgroups to contrast the “high-responders,” “moderate-responders,” and “low-responders” (20,29,30). The quantile method splits the range of responses into tertiles or quartiles, which are then contrasted with each other (29,30). The clustering method (e.g., K-means clustering) allocates each individual to the closest group and contrasts those groups with each other (20). Although these approaches are parsimonious and do not rely on external data for their classification (18), the number of groups is determined arbitrarily even though there may be only subtle or no true differences between the groups, especially if most of the observed variability is random. In addition, these methods tend to produce groups of equal size, which can be problematic if the response is heterogeneous within groups, such as when the response distribution is skewed or outliers exist. Finally, the simplification of a continuous variable into categories leads to an unnecessary loss of power and information (28).
  3. Trainability. Trainability is the consistent response of a given individual to a specific exercise training, devoid of within-subject variability (7). However, the notion of a consistent response can be misleading, as trainability is influenced by inherent factors that have little to no variation across time (e.g., genetic factors, sex) and others that change with time (e.g., age, epigenetic patterns). For instance, a given individual may present high trainability to resistance exercise training when young and healthy but might have a lower trainability if affected by a chronic weakening condition at an advanced age. It is therefore essential to acknowledge that individual training response estimates are susceptible to change, and this has implications for personalized exercise prescription.

RANDOM VARIABILITY

The observed variability in collected data is always a mix of true variability between individuals and random variability due to experimental and environmental factors. The key is to separate the variability of interest from the unwanted variability, but obtaining a high signal-to-noise ratio can be difficult. We first describe hereinafter the different sources of variability that are typically present in exercise training studies, and then discuss some of the methods used to distinguish trainability from noise.

Sources of Variability

From previous work on the topic, by Senn (15), Hopkins (16), Atkinson (8), Williamson (17), and Hecksteden (7,18), we have identified six sources of variability that contribute to the observed variance in a typical exercise training dataset (Fig. 1). Note that we used these sources of variability from a statistical perspective, meaning that they contribute to the variability observed not only in individual training responses but also in the whole dataset.

Figure 1:
Sources of variability in exercise training studies. In this exercise training study, we have represented the hypothetical case where individuals underwent a control period, an exercise intervention, and a second intervention after an adequate washout period. We plotted the fitness levels of two individuals (black dots), as well as their mean (black bars). At the beginning of the control period, we also represented the scenario where their fitness levels were tested twice a few days apart (black crosses) and the black dots are the average of these repeated tests. We have represented each source of variability with blue arrows and a number. 1) Technical variability, due to machine and experimenter errors; 2) day-to-day biological variability, due to fluctuations in life components (sleep, diet, circadian rhythms) between individuals. 1) and 2) are illustrated by the difference between the black crosses; 3) variability due to different baseline values between subjects and measured with a random intercept in a linear mixed model; 4) variability due to the intervention, which is measured with a fixed effect in a linear mixed model (control vs intervention); 5) variability in response between individuals (trainability), which is measured with a random slope for each individual in a linear mixed model; 6) within-subject variability, which is the variability observed when a subject undergoes the same intervention again, and it can be estimated by comparing the slopes in the first and second interventions. The slopes can be made more accurate by doing multiple tests a few days apart for a given time point, or by doing repeated tests at regular time points during the course of the exercise intervention.
  1. Technical variability (6,18,19,22). This variability derives from differences in machine calibration, protocol, and experimenter. Its magnitude depends on the measured parameter (e.g., small magnitude for V˙O2max (31) and large magnitude for mitochondrial respiration (32)); it is theoretically identical for all individuals. It can be illustrated by the question: what would have happened if the outcome had been measured on a different machine, with a different protocol, or by a different person?
  2. Biological day-to-day variability (6,18,19,22). This variability derives from differences in environmental factors, such as sleep quality, diet, weather, circadian time, psychological stress, or menstrual cycles between individuals, influencing the outcome. It is individual-specific. For instance, shift workers may display particularly large variability in performance during a test, as their sleep patterns often are erratic (33). It can be illustrated by the question: what would have happened if the outcome had been measured on a different day, at a different time of the day, or after a different meal?
  3. Variability due to exercise training, regardless of the individual. This source of variability is due to the intervention, as opposed to no intervention at all (i.e., control condition). It corresponds to the mean effect of the exercise intervention on all the individuals but does not contribute to the variability in individual training responses. It can be illustrated by the question: what would have happened to the phenotype if it had been measured after a similar period as the exercise-training program but without any intervention (i.e., control period)?
  4. Variability due to the individual, regardless of exercise training. This source of variability is due to individuals having different mean levels. In a heterogeneous cohort (e.g., large age or fitness range), it can be a major source of variability and needs to be taken into consideration, by adding a random intercept to a statistical linear mixed model, for example. It is a source of variability that is independent from the exercise training (i.e., it is not an interaction between individuals and exercise training and does not correspond to trainability). It can be illustrated by the question: what would have happened if the outcome had been measured on individual A instead of individual B?
  5. Variability in responses to the same exercise training between individuals. This variability corresponds to the interaction between each individual and the training and should not be mistaken for the abovementioned variability. Even if the exercise-training program had an average effect on all individuals, each individual showed a consistently better (or worse) response than the average response. This is the variability of interest (trainability, individual training response (7,18)) and its magnitude is debated (17,18), as it often is impossible to disentangle from within-subject variability. It can be illustrated by the question: what would have happened to the changes in outcome following the exercise-training program, if it was individual A instead of individual B?
  6. Within-subject variability. This source of variability is difficult to capture and is the focus of our review; however, Hecksteden et al. have discussed possible approaches to account for it (7,18). We currently do not know the magnitude of this variability, as it requires implementing repeated interventions or repeated tests during the intervention. It can be illustrated by the question: what would have happened to the changes in outcome in individual A if we applied the same training again?

The ideal exercise training protocol allows separating all these sources of variability, but it can be very resource- and time-consuming (7,15,16). Although we can estimate easily the magnitude of technical and biological day-to-day variability by performing a reliability trial (see section below), we currently have little information on the relative magnitudes of trainability and within-subject variability, which means that all observed between-subject differences in response to exercise training could be just noise. We aimed to address this gap by reviewing the only two studies we are aware of that provide insights into the magnitude of within-subject variability (18,34).
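The way these sources combine can be illustrated with a small simulation. The sketch below is a deliberately simplified, hypothetical model (not an analysis from any cited study): each observed change score is assumed to be the sum of a consistent individual response (trainability) and an independent within-subject deviation, both normally distributed. The function name, parameter names, and numerical values are illustrative assumptions.

```python
import random
import statistics


def simulate_changes(n, mean_effect, sd_trainability, sd_within, seed=0):
    """Simulate observed change scores for n participants.

    Assumed additive model (illustrative only):
    observed change = consistent individual response + within-subject deviation.
    """
    rng = random.Random(seed)
    # Consistent individual responses: mean training effect plus trainability
    true_response = [rng.gauss(mean_effect, sd_trainability) for _ in range(n)]
    # What a single pre/post study records: true response plus occasion-specific noise
    observed = [r + rng.gauss(0.0, sd_within) for r in true_response]
    return true_response, observed


# Hypothetical values: within-subject variability twice as large as trainability
true_response, observed = simulate_changes(
    n=5000, mean_effect=3.0, sd_trainability=1.0, sd_within=2.0
)
print("SD of true responses:    ", round(statistics.stdev(true_response), 2))
print("SD of observed responses:", round(statistics.stdev(observed), 2))
```

Under these assumptions, the spread of observed change scores is expected to be close to the square root of the sum of the two variances, so a single pre/post comparison cannot tell how much of the observed spread reflects trainability.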

Within-Subject Variability Is an Important Yet Overlooked Source of Error

Two recent studies have elegantly provided some insights into the magnitude of within-subject variability (18,34). In the first study, 12 young (28.5 ± 3.8 years old), moderately fit (V˙O2max = 40.3 ± 4.3 mL·min−1·kg−1) men and women underwent two training periods separated by a washout period of 9 months. During the first training period, participants performed three sessions of knee extensions per week for 12 weeks with one leg only, whereas in the second training period (after a 9-month washout period), they trained both legs (34). This design is close to a crossover study with repeated intervention where participants are assigned successively to the control condition (the untrained leg in the first training period), and then to the intervention condition (the trained leg in the first training period) with a repetition of the intervention (the trained leg in the second training period). This approach makes it possible to quantify most sources of variability, including between-subject variability, subject-by-training interaction (trainability), and within-subject variability. This study has an advantage compared with a crossover study, because here, any lifestyle-related event that could affect muscle responses during either the “control” or the “intervention” periods would have had the same effect on both legs, thus significantly reducing random variation (6). Although the focus of this study was on skeletal muscle memory at the transcriptional level (34), it was interesting to observe individual responses to the two repeated training periods. Using the DigitizeIt software (Köln, Germany), we extracted individual values from the Supplementary Data of the study (34) and quantified within-subject variability (Fig. 2, Table 1).
As noted by the authors (34), changes in performance at a 15-min optimal performance test, in 3-hydroxyacyl-CoA dehydrogenase (β-HAD) activity and in citrate synthase (CS) activity, were surprisingly poorly correlated between the two training periods, which was reflected in our linear mixed model (Table 1). The magnitude of residual error (containing within-subject variability) was large compared with the magnitude of trainability (see 90% confidence interval in Table 1), and we did not detect significant trainability for any outcome (all P values >0.05 in likelihood ratio test).

Figure 2:
Correlation between changes in performance, citrate synthase (CS), and 3-hydroxyacyl-CoA dehydrogenase (β-HAD) in the same leg of the same individuals after the first and second training periods in Lindholm et al. (n = 12) (34).
TABLE 1:
Estimates of trainability and within-subject variability in Lindholm et al., 2016 (34)

However, this analysis has limitations because of the study design. Contrary to a 2 × 2 classic crossover design, the “control” condition was only administered once and in a specific order, making it impossible to formally test for potential carry-over effects. Although there was no evidence for a training-induced skeletal muscle global transcriptome memory in the original study (34), there could be a residual effect at the epigenetic level that was not investigated. Furthermore, there could also be cross-education from one leg to the other (35). Finally, we could not separate random variability in individual measurements (i.e., technical and biological day-to-day variability) from trainability, as the reliability of performance measures in this study is unknown. It should however be noted that β-HAD and CS activities have technical variability <2 units; samples were all run in duplicates, and rerun if necessary until a reliable measure was obtained, thereby reducing the possible influence of technical variability (36), but not excluding the possibility of biological variability.

In the second study, 20 men and women underwent a 1-yr exercise-training program that consisted of walking or jogging 3 d·wk−1 for 45 min with a constant heart rate prescription. Participants were tested for V˙O2max at baseline, 3, 6, 9, and 12 months, which allowed building a progress curve to analyze each individual slope (trainability). Trainability was estimated accurately for each individual, but the SDs of the segmental changes (within-subject variabilities) were large compared with the overall progress they made (18).

The insights provided by the aforementioned studies suggest that within-subject variability is an important source of variability. We argue that most exercise studies do not yield trainability estimates that are accurate enough to warrant further investigation. Some protocols have been proposed to estimate individual responses to exercise training, but they do not all address the key issue of within-subject variability.

METHODS TO ESTIMATE TRAINABILITY

The following methods have been proposed recently and discussed by others (7,15,16,18). In this section, we discuss these methods and highlight their strengths and weaknesses. In addition, we have added a method that has not been discussed thoroughly in the past, which involves a control period before the intervention in the same participants.

Separate Control Group

The presence of a separate, independent control group provides an estimate of the variability because of the exercise intervention (15). Involving a control group allows estimating the magnitude of interindividual variability in the absence of exercise training and is essential to know whether the interindividual variability in the presence of exercise training is indeed due to the training (8,16,17). The variability in change scores in the control group is subtracted from the variability in change scores in the exercise group to obtain the true variability in change scores between individuals, according to the following formula:

$$SD_{true} = \sqrt{SD_{exerc}^{2} - SD_{control}^{2}}$$

where SDtrue is the true interindividual variability in response to the intervention, SDexerc is the observed interindividual variability in change scores in the exercise group, and SDcontrol is the observed interindividual variability in change scores in the control group (Fig. 3). This equation is based on the assumption that 1) both SDexerc and SDcontrol contain between-subject variability (i.e., the variation between participants given the same condition) and within-subject variability (i.e., the variation from occasion to occasion when the same individual is given the same condition) and 2) the only extra source of variability contained in SDexerc is the true interindividual variability in response to the intervention.

Figure 3:
Protocols to quantify interindividual variability in response to exercise training. Using maximum oxygen uptake (V̇O2max) as an example phenotype, we have represented the different methods to estimate trainability, namely the control group, control period, reliability trial, repeated testing, and repeated intervention methods. We also have written down the statistical calculations associated with each method to obtain trainability estimates at the group level or the individual level.

It should, however, be noted that SDtrue estimated with this approach may overestimate trainability because of residual within-subject variability in the exercise training group. Indeed, the response to training may vary from individual to individual (trainability), but the response to training also may vary from occasion to occasion for a given individual (within-subject variability), and this would lead to an inflation of variance within the exercise group (15). The control group method is nonetheless useful in medium (3–6 months) to long (>6 months) interventions, where it is possible to run both the exercise and control groups at the same time, but it significantly increases the required sample size. Moreover, the use of a control group in long exercise training studies can pose ethical issues when individuals are required to remain inactive for a long period of time (7).
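As a worked illustration of the control group method, the sketch below computes SDtrue as the square root of the difference between the variance of change scores in the exercise group and that in the control group. The change scores are hypothetical, and the function name is an assumption made for this example. Returning a negative value when the control group is more variable than the exercise group is one possible convention for flagging that case, not a rule taken from the cited work.

```python
import math
import statistics


def sd_true(exercise_changes, control_changes):
    """SD of individual responses from change scores in two independent groups."""
    var_exerc = statistics.variance(exercise_changes)    # SDexerc squared
    var_control = statistics.variance(control_changes)   # SDcontrol squared
    diff = var_exerc - var_control
    # If the control group varies more than the exercise group, the difference
    # is negative; keep the sign so that this case stays visible (assumed convention).
    return math.copysign(math.sqrt(abs(diff)), diff)


# Hypothetical change scores in VO2max (mL/min/kg)
exercise = [1.0, 3.0, 5.0, 7.0, 9.0]
control = [-1.0, 0.0, 0.0, 1.0, 0.0]
print(round(sd_true(exercise, control), 3))
```

As noted above, the resulting value still contains any within-subject variability that is specific to the exercise condition, so it is best read as an upper bound on trainability at the group level.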

Control Period Before the Intervention

One study design that has not been discussed thoroughly is the possibility to ask the participants to undergo a control period before starting the exercise program. This is slightly different from a crossover trial because the “treatments” (control/exercise) are administered in a particular order (first control, then exercise). Indeed, the appropriate washout period for exercise training studies is difficult to estimate, so this would be the only way to avoid the potential carry-over effects of exercise training. It is then possible to fit a linear mixed model to the data, as follows:

Δ ~ Condition + Covariates + random(ID)

Δ is the change score in the measured outcome, Condition is a dichotomous variable corresponding to control/exercise, Covariates is any relevant covariate that can influence Δ (such as age), and random(ID) is a random effect that allows each individual to have his or her own intercept. The residual error of this model would then be a gross estimate of SDtrue. This approach is slightly superior to the use of a control group as each individual is assessed in both the control and the exercise conditions, which decreases the variability because of random sampling. This approach also reduces the required number of both assessment tests and participants because the end of the control period can serve as a baseline for the beginning of the exercise period (Fig. 3). A major downside of this approach is the fact that each participant is required to stay in the study for twice as long, which is arguably only suitable for short-duration (e.g., few months only) exercise interventions. Finally, yet importantly, this estimation of SDtrue cannot disentangle trainability from within-subject variability.

Reliability Trial

When neither a control group nor a control period is available, some resort to reliability trials that consist of repeating the same exercise tests a few times and a few days apart (e.g., exercise test to exhaustion), repeating the same test multiple times in a row on the same machine (e.g., mitochondrial respiration), or running biological samples in technical duplicates or triplicates (e.g., gene expression) (Fig. 3). All these tests provide estimates for technical variability, and the exercise tests also include biological day-to-day variability. First, although repeated tests are not needed at each time point, averaging duplicates or triplicates increases the accuracy of individual measurements. Second, even if this method cannot directly estimate SDtrue, between-test variability can be used to calculate a threshold above which individuals may be classified as “responders.” It should be noted that the accuracy of classification in the responder and nonresponder categories highly depends on the reliability of the test (i.e., a noisy test cannot detect true changes with certainty when they are small). All the calculated cutoffs are based on the so-called typical error of measurement (TEM), calculated with the following formula (39):

$$\mathrm{TEM} = \sqrt{\frac{\sum_{i=1}^{n} (x_{i,1} - x_{i,2})^{2}}{2n}}$$

where n is the sample size and x is the measurement of interest. TEM could also be called “within-subject standard deviation,” as it corresponds to the square root of the sum of the squared differences of replicates divided by twice the number of pairs of replicates. Importantly, TEM includes the variability due to machine calibration and human error, so it is likely to be specific to a given laboratory; for physiological and performance tests, it also includes day-to-day biological variability, so it is likely to be specific to the studied population (e.g., young/old, trained/untrained). Therefore, we suggest that studies including a reliability trial should assess TEM instead of extracting it from the literature.
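The verbal definition above (square root of the sum of squared differences between replicates, divided by twice the number of pairs) can be sketched in a few lines. The duplicate measurements are hypothetical, and the function name is an assumption for this example.

```python
import math


def typical_error(pairs):
    """Typical error of measurement from duplicate tests.

    pairs: list of (first_test, second_test) values, one tuple per participant.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one pair of replicates is required")
    # Sum of squared differences between the two replicates of each pair
    sum_sq_diff = sum((first - second) ** 2 for first, second in pairs)
    return math.sqrt(sum_sq_diff / (2 * n))


# Hypothetical duplicate VO2max tests (mL/min/kg) performed a few days apart
duplicates = [(40.0, 42.0), (35.0, 34.0), (50.0, 48.0), (45.0, 46.0)]
tem = typical_error(duplicates)
print(round(tem, 3))
```

Because TEM is expressed in the units of the measurement, it can be compared directly with the size of the training-induced changes observed in the same laboratory and population.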

Repeated Intervention With a Control Period

Theoretically, the best method to separate trainability from within-subject variability is to repeat the exercise intervention on the same participants, after an adequate washout period (Fig. 3). This is achieved by making the participants undergo 1) a control period, 2) an exercise period, 3) a washout period, 4) another exercise period, and fitting the following linear mixed model to the data:

Δ ~ Condition + Covariates + random(ID) + random(ID*Condition)

Δ is the change score in the measured outcome, Condition is a dichotomous variable corresponding to control/exercise, Covariates is any relevant covariate that can influence Δ (such as age), random(ID) is a random effect that allows each individual to have his or her own intercept, and random(ID*Condition) is a random effect that allows each individual to have his or her own slope (trainability). Each individual slope corresponds to the trainability estimate for each individual, separated from within-subject variability that is contained in the residual variability of the model. The residual variability also will contain technical and biological day-to-day variability, and only a reliability trial can estimate it.
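To make the partition concrete, the sketch below uses a much simpler stand-in for the mixed model: a one-way random-effects analysis of variance on the change scores from two repetitions of the same training. It ignores the control condition and covariates, and it assumes no carry-over or period effect, so it should be read as an illustration of the logic rather than as the model described above. The data and function name are hypothetical.

```python
import math
import statistics


def partition_responses(changes):
    """Split response variability into consistent and occasion-specific parts.

    changes: list of (change_in_period_1, change_in_period_2), one per participant.
    Returns (sd_trainability, sd_within), using method-of-moments estimates.
    """
    n = len(changes)
    k = 2  # number of repetitions of the intervention
    if n < 2:
        raise ValueError("at least two participants are required")
    subject_means = [(first + second) / k for first, second in changes]
    # Mean square within subjects: disagreement between the two repetitions
    ms_within = sum((first - second) ** 2 for first, second in changes) / (2 * n)
    # Mean square between subjects: spread of the mean responses
    ms_between = k * statistics.variance(subject_means)
    # The between-subject component cannot be negative; truncate at zero
    var_trainability = max((ms_between - ms_within) / k, 0.0)
    return math.sqrt(var_trainability), math.sqrt(ms_within)


# Hypothetical change scores after the first and second training periods
changes = [(2.0, 4.0), (6.0, 4.0), (8.0, 10.0), (12.0, 10.0)]
sd_trainability, sd_within = partition_responses(changes)
print(round(sd_trainability, 2), round(sd_within, 2))
```

When the two repetitions disagree as much as participants differ from each other, the trainability estimate shrinks toward zero, which is the situation described for the reanalyzed data set.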

Although this is the most compelling way to obtain individualized trainability estimates, it has many practical limitations. First, it is extremely time- and resource-consuming, because participants are required to remain in the study for two training periods separated by a washout for whose duration there are no guidelines. Second, the high risk of participant dropout would make it difficult to achieve a large sample size, and the sample could end up being biased if low-responders were more likely to quit. Third, a single repetition of the exercise training may not be sufficient to obtain good estimates of trainability if within-subject variability is large, and there may be long-lasting effects of the first intervention (e.g., at the epigenetic level) that are difficult to account for. A repetition of the exercise training program is therefore recommended for short exercise interventions (<2 months), where participant attrition is kept to a reasonable amount and a short washout period is likely to be sufficient.

Repeated Tests During the Exercise Training Program

An elegant, recently proposed way to circumvent the need for a repeated intervention is to perform additional tests on subjects during the exercise-training period, provided that the training period is long enough to allow for repeated assessments (Fig. 3). This permits building a slope of the progress for each individual and examining segmental changes to partition trainability from within-subject variability. The distance between individual points and the slope represents this within-subject variability, and the further the points are from the slope, the greater within-subject variability is (and the less accurate the slope is). A linear mixed model is an appropriate way to analyze these data, as follows:

Outcome ~ Time + Covariates + random(ID) + random(ID*Time)

This random intercept and slope model estimates the time course of outcome changes for each individual. Of note, nonlinear mixed models also are available when the change in phenotype with time is nonlinear (such as when a plateau is reached) (7), and the autocorrelation between measurements can be accounted for (40). A brilliant twist of this protocol is that without repeating the intervention, trainability (the magnitude of individual slopes) and within-subject variability (variability between different segments of individual slopes) can be partitioned. This method is more time- and resource-efficient than any of the abovementioned methods, but it seems only appropriate for medium to long interventions (e.g., >2 months). Repeated tests are hardly feasible during short interventions where they are likely to interfere with the training itself.
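The idea of reading trainability from the slope and within-subject variability from the scatter around it can be sketched for a single participant with an ordinary least-squares line. This is a per-individual simplification, not the random intercept and slope model itself, and it assumes a linear time course. The test schedule mirrors the 0, 3, 6, 9, and 12 month assessments mentioned earlier, but the values and the function name are hypothetical.

```python
import math


def slope_and_scatter(times, values):
    """Least-squares slope and residual SD for one participant's progress curve."""
    n = len(times)
    if n < 3:
        raise ValueError("at least three time points are needed for a residual SD")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    # Scatter of the points around the fitted line (n - 2 degrees of freedom)
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    return slope, math.sqrt(rss / (n - 2))


months = [0, 3, 6, 9, 12]
steady = [40.0, 41.5, 43.0, 44.5, 46.0]   # hypothetical VO2max, regular progress
erratic = [40.0, 44.0, 41.0, 47.0, 46.0]  # hypothetical VO2max, same trend, more scatter

print(slope_and_scatter(months, steady))
print(slope_and_scatter(months, erratic))
```

Both hypothetical participants improve at the same average rate, but the second one has a much larger scatter around the line, so the slope is a less precise estimate of that participant's trainability.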

CONCLUSIONS AND PERSPECTIVES

Random noise is prevalent in exercise training studies, but elegant protocols have been proposed recently to isolate individual responses to exercise training. The use of the terms “responders” and “nonresponders” to exercise is increasingly and rightfully being challenged, as individuals originally identified as nonresponders following a specific training protocol show actual improvements if the type of training is changed (24), if the frequency of training is increased (41), or if the training intensity is increased (42,43). The terms “responders” and “nonresponders” are not fundamentally wrong, but because most studies aim to uncover the genetic, epigenetic, and molecular modulators of trainability, such a dichotomous classification is imprecise and reduces statistical power.

To our knowledge, no study has yet performed a quantitative comparison of the different methods to quantify trainability. However, it is possible to give broad guidelines to researchers interested in personalized exercise response based on the research question, sample size, budget, and time allocation (Table 2). A crossover study with repeated intervention is the criterion standard to partition all sources of variability. However, it is a challenging protocol (16,17) owing to the high risk of participant dropout and the lack of guidelines for appropriate washout periods (training may have unknown and long-lasting effects at the cellular and molecular levels). Long exercise studies may instead conduct repeated testing during the course of the intervention or cleverly resort to alternative protocols where participants train only one leg in the first intervention and both legs in the second intervention (34). Linear mixed modeling is particularly well suited to the analysis of data generated by those protocols.

TABLE 2: Suggested protocols to study trainability

Hecksteden et al. have shown great discrepancy between different approaches to classify individuals into responders and nonresponders to exercise training (18), and we hope that future studies will perform simulations with known trainability and variability parameters to help clarify which protocols are best adapted to estimate trainability. We also hope that future studies will implement some of the steps we have suggested, namely a repeated intervention or repeated tests during the exercise-training program. In the Gene Skeletal Muscle Adaptive Response to Training study, we are currently implementing a control period, a repeated intervention, and repeated testing during the intervention to have a comprehensive view of the methods that have been proposed to measure trainability (44). Uncovering the modulators of trainability would generate relevant and progressive knowledge (45,46), but it is of paramount importance to ensure first that our protocols are accurate enough to measure trainability devoid of within-subject variability.

Acknowledgment

The authors thank Michael J. Joyner for his input.

References

1. Hawley JA, Hargreaves M, Joyner MJ, Zierath JR. Integrative biology of exercise. Cell. 2014; 159(4):738–49.
2. Watson AM. Sleep and athletic performance. Curr. Sports Med. Rep. 2017; 16(6):413–8.
3. Heck AL, Barroso CS, Callie ME, Bray MS. Gene-nutrition interaction in human performance and exercise response. Nutrition. 2004; 20(7–8):598–602.
4. Bouchard C, Rankinen T. Individual differences in response to regular physical activity. Med. Sci. Sports Exerc. 2001; 33(6):S446–51.
5. Timmons JA, Larsson O, Jansson E, et al. Human muscle gene expression responses to endurance training provide a novel perspective on Duchenne muscular dystrophy. FASEB J. 2005; 19(7):750–60.
6. Mann TN, Lamberts RP, Lambert MI. High responders and low responders: factors associated with individual variation in response to standardized training. Sports Med. 2014; 44(8):1113–24.
7. Hecksteden A, Kraushaar J, Scharhag-Rosenberger F, Theisen D, Senn S, Meyer T. Individual response to exercise training—a statistical perspective. J. Appl. Physiol. 2015; 118(12):1450–9.
8. Atkinson G, Batterham AM. True and false interindividual differences in the physiological response to an intervention. Exp. Physiol. 2015; 100(6):577–88.
9. Churchward-Venne TA, Tieland M, Verdijk LB, et al. There are no nonresponders to resistance-type exercise training in older men and women. J. Am. Med. Dir. Assoc. 2015; 16(5):400–11.
10. Bouchard C, Daw EW, Rice T, et al. Familial resemblance for V̇O2max in the sedentary state: the HERITAGE family study. Med. Sci. Sports Exerc. 1998; 30(2):252–8.
11. Vitezić D, Božina N, Mršić-Pelčić J, Turk VE, Francetić I. Personalized medicine in clinical pharmacology. In: Bodiroga-Vukobrat N, Rukavina D, Pavelić K, Sander GG, editors. Personalized Medicine: A New Medical and Social Challenge. Cham: Springer International Publishing; 2016. p. 265–78.
12. Turner RM, Park BK, Pirmohamed M. Parsing interindividual drug variability: an emerging role for systems pharmacology. Wiley Interdiscip. Rev. Syst. Biol. Med. 2015; 7(4):221–41.
13. Zeevi D, Korem T, Zmora N, et al. Personalized nutrition by prediction of glycemic responses. Cell. 2015; 163(5):1079–94.
14. Williams CJ, Williams MG, Eynon N, et al. Genes to predict V̇O2max trainability: a systematic review. BMC Genomics. 2017; 18(Suppl. 8):831.
15. Senn S. Mastering variation: variance components and personalised medicine. Stat. Med. 2016; 35(7):966–77.
16. Hopkins WG. Individual responses made easy. J. Appl. Physiol. 2015; 118(12):1444–6.
17. Williamson PJ, Atkinson G, Batterham AM. Inter-individual responses of maximal oxygen uptake to exercise training: a critical review. Sports Med. 2017; 47(8):1501–13.
18. Hecksteden A, Pitsch W, Rosenberger F, Meyer T. Repeated testing for the assessment of individual response to exercise training. J. Appl. Physiol. 2018; 124(6):1567–79.
19. Bouchard C, Blair SN, Church TS, et al. Adverse metabolic response to regular exercise: is it a rare or common occurrence? PLoS One. 2012; 7(5):e37887.
20. Thalacker-Mercer A, Stec M, Cui X, Cross J, Windham S, Bamman M. Cluster analysis reveals differential transcript profiles associated with resistance training-induced human skeletal muscle hypertrophy. Physiol. Genomics. 2013; 45(12):499–507.
21. Joyner MJ, Lundby C. Concepts about V̇O2max and trainability are context dependent. Exerc. Sport Sci. Rev. 2018; 46(3):138–43.
22. Weatherwax RM, Harris NK, Kilding AE, Dalleck LC. The incidence of training responsiveness to cardiorespiratory fitness and cardiometabolic measurements following individualized and standardized exercise prescription: study protocol for a randomized controlled trial. Trials. 2016; 17(1):601.
23. Pérusse L, Gagnon J, Province MA, et al. Familial aggregation of submaximal aerobic performance in the HERITAGE Family study. Med. Sci. Sports Exerc. 2001; 33(4):597–604.
24. Bonafiglia JT, Rotundo MP, Whittall JP, Scribbans TD, Graham RB, Gurd BJ. Inter-individual variability in the adaptive responses to endurance and sprint interval training: a randomized crossover study. PLoS One. 2016; 11(12):e0167790.
25. Scharhag-Rosenberger F, Walitzek S, Kindermann W, Meyer T. Differences in adaptations to 1 year of aerobic endurance training: individual patterns of nonresponse. Scand. J. Med. Sci. Sports. 2012; 22(1):113–8.
26. Hopkins WG, Hawley JA, Burke LM. Design and analysis of research on sport performance enhancement. Med. Sci. Sports Exerc. 1999; 31(3):472–85.
27. Guyatt GH, Juniper EF, Walter SD, Griffith LE, Goldstein RS. Interpreting treatment effects in randomised trials. BMJ. 1998; 316(7132):690–3.
28. Aiken LS, West SG, Reno RR. Multiple Regression: Testing and Interpreting Interactions. Newbury Park (CA): Sage Publications; 1991. p. 212.
29. Timmons JA, Jansson E, Fischer H, et al. Modulation of extracellular matrix genes reflects the magnitude of physiological adaptation to aerobic exercise training in humans. BMC Biol. 2005; 3:19.
30. Vollaard N, Constantin-Teodosiu D, Fredriksson K, et al. Systematic analysis of adaptations in aerobic capacity and submaximal energy metabolism provides a unique insight into determinants of human aerobic performance. J. Appl. Physiol. 2009; 106(5):1479–86.
31. Katch VL, Sady SS, Freedson P. Biological variability in maximum aerobic power. Med. Sci. Sports Exerc. 1982; 14(1):21–5.
32. Cardinale DA, Gejl KD, Ortenblad N, Ekblom B, Blomstrand E, Larsen FJ. Reliability of maximal mitochondrial oxidative phosphorylation in permeabilized fibers from the vastus lateralis employing high-resolution respirometry. Physiol. Rep. 2018; 6(4).
33. Åkerstedt T, Wright KP. Sleep loss and fatigue in shift work and shift work disorder. Sleep Med. Clin. 2009; 4(2):257–71.
34. Lindholm ME, Giacomello S, Werne Solnestam B, et al. The impact of endurance training on human skeletal muscle memory, global isoform expression and novel transcripts. PLoS Genet. 2016; 12(9):e1006294.
35. Hendy AM, Lamon S. The cross-education phenomenon: brain and beyond. Front. Physiol. 2017; 8:297.
36. Lindholm ME, Marabita F, Gomez-Cabrero D, et al. An integrative analysis reveals coordinated reprogramming of the epigenome and the transcriptome in human skeletal muscle after training. Epigenetics. 2014; 9(12):1557–69.
37. R: A language and environment for statistical computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; [cited 2018 Oct 10]. Available from: http://www.R-project.org/.
38. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. [Internet]. 2017; 82(13). Available from: https://www.jstatsoft.org/v082/i13. doi: 10.18637/jss.v082.i13.
39. Selected body measurements of children 6–11 years. Vital Health Stat. 11. 1973; 123:1–48.
40. Pinheiro J, Bates D, DebRoy S, Sarkar D; R Core Team. nlme: Linear and Nonlinear Mixed Effects Models [Internet]. 2018. R package version 3.1-137. Available from: https://cran.r-project.org/package=nlme.
41. Montero D, Lundby C. Refuting the myth of non-response to exercise training: “non-responders” do respond to higher dose of training. J. Physiol. 2017; 595(11):3377–87.
42. Ross R, de Lannoy L, Stotz PJ. Separate effects of intensity and amount of exercise on interindividual cardiorespiratory fitness response. Mayo Clin. Proc. 2015; 90(11):1506–14.
43. Sisson SB, Katzmarzyk PT, Earnest CP, Bouchard C, Blair SN, Church TS. Volume of exercise and fitness nonresponse in sedentary, postmenopausal women. Med. Sci. Sports Exerc. 2009; 41(3):539–45.
44. Yan X, Eynon N, Papadimitriou ID, et al. The Gene SMART study: method, study design, and preliminary findings. BMC Genomics. 2017; 18(Suppl. 8):821.
45. Wang G, Tanaka M, Eynon N, et al. The future of genomic research in athletic performance and adaptation to training. Med. Sport Sci. 2016; 61:55–67.
46. Voisin S, Eynon N, Yan X, Bishop DJ. Exercise training and DNA methylation in humans. Acta Physiol. 2015; 213(1):39–59.
Keywords:

trainability; within-subject variability; repeated intervention; responder; variable response to exercise training; individual training response

Copyright © 2018 by the American College of Sports Medicine