The Validity and Reliability of Intestinal Temperature during Intermittent Running : Medicine & Science in Sports & Exercise

BASIC SCIENCES: Original Investigations


Medicine & Science in Sports & Exercise 38(11):p 1926-1931, November 2006. | DOI: 10.1249/01.mss.0000233800.69776.ef

There is currently a dearth of knowledge concerning the responses of core temperature (Tcore) during prolonged, intermittent team sports such as soccer, rugby, and field hockey. One reason for this lack of knowledge is the impracticality of measuring Tcore with cables and cumbersome data-logging equipment during unconstrained exercise. Recently, the use of disposable ingestible temperature sensors, which measure intestinal temperature (Tint), has become a popular alternative for research and professional sport. This measurement technique overcomes many of the performance, comfort, and sanitary problems associated with invasive methods of determining Tcore. However, little is known about the validity and reliability of this measure, particularly during dynamic, weight-bearing exercise.

A number of previous researchers have attempted to compare Tint with other techniques at rest and during static forms of exercise. Tint and esophageal temperature (Tes) have both been shown to exceed rectal temperature (Trec) during cycling (13,14,21). A similar bias between Tint and Trec has been reported by Edwards and coworkers (9) while monitoring the routine daily activities of participants. Nevertheless, O'Brien and colleagues (17) demonstrated that Tint was a closer representation of Trec than Tes in cycling exercise. Ingestible sensors have also been shown to record both lower temperatures than Trec (20) and higher values than Tes (12) during treadmill exercise. Little research has addressed the speed of response or time to a threshold of Tint in comparison with other measurement sites. It appears, from the sparse and conflicting findings of previous studies, that trends in bias between methods of measurement may be related to both the position of the body during exercise (seated vs upright) and the mode of the exercise (interval vs continuous). This notion seems plausible, considering the differing amount of muscle mass involved in each form of exercise and the unique circulatory responses to these different activities and postures.

An additional factor that must be considered is intestinal transit time, because this will influence the time interval between ingestion of the capsule and the start of data collection. Previous researchers have used a range of time periods between pill ingestion and data collection. If the sensor resides within the stomach, or an area of the small intestine in close proximity to the stomach, Tint may be influenced by the temperature of ingested material. Intestinal transit time at rest is controlled by numerous neural and hormonal factors. A particularly important modifiable factor is daily energy intake, which appears not to have been accounted for previously. High-intensity exercise alters many variables that control intestinal motility and involves rhythmical mechanical forces that are likely to have a pronounced effect on the position and mobility of ingested sensors.

No researchers to date have examined the in vivo repeatability of Tint measurements made on separate occasions using either a within-subject or between-subject design. Depending on the activity patterns and dietary habits of individuals, intestinal motility may be considerably different between repeated experimental trials. This difference could result in changes in Tint in excess of the biological variability that occurs between trials with other techniques.

Clearly, more investigation is required into the suitability of measuring Tint during exercise, particularly with respect to unconstrained running. Therefore, the purpose of the present study was to examine Tint during high-intensity, intermittent shuttle running exercise by comparing this technique against Trec, and, secondly, by examining the reliability of the technique when this standardized exercise test was completed on separate occasions.


This study comprised two investigations based around a generic exercise protocol, the Loughborough intermittent shuttle running test (LIST) (16). Investigation A involved one main experimental trial in which participants completed four blocks of the LIST activity and rest pattern (60 min of exercise) while Trec was compared with Tint. Investigation B consisted of a repeated-measures design in which Tint was recorded during six blocks of the LIST protocol (90 min of exercise) completed on two occasions, which were separated by 7 d.


Participants were fully informed of the demands and possible risks associated with the investigations and were allowed to withdraw from the study at any time. Each volunteer provided written consent before taking part, and both investigations were approved by the Loughborough University ethical advisory committee. Ten competitive male games players aged 24 (21-26) yr with a body mass of 76 ± 12 kg and height of 1.8 ± 0.1 m completed investigation A. The mean predicted V˙O2max of the sample was 57 ± 4 mL·kg−1·min−1. Participants were familiar with the technique for measuring Trec during exercise and habitually spent 280 ± 111 min·wk−1 engaged in training activities that involved multiple-sprint exercise. Investigation B involved a sample of nine semiprofessional male soccer players aged 21 (19-23) yr with a mean body mass of 74 ± 4 kg and average height of 1.8 ± 0.1 m. The mean predicted V˙O2max of participants was 58 ± 4 mL·kg−1·min−1, and 211 ± 96 min·wk−1 was spent engaged in training activities applicable to high-intensity intermittent exercise.

The above sample sizes are larger than those selected in other relevant studies (9,12-14,17,20,21) but are below those recommended by Altman (1) for optimal statistical precision in method comparison and repeatability studies in a clinical context (≥ 40 subjects). Such a sample size would have been unrealistic for each of our two studies involving a prolonged and difficult exercise protocol as well as invasive biological measurements. In such situations, several commentators have stressed the importance of assessing whether statistical precision impacts strongly on the sample estimates of error (2,4,6). In our study, we have calculated 95% confidence intervals for this assessment (see Statistical Analysis section).

Preliminary measurements.

Before the main experimental trials in both investigations, participants completed the Multistage Fitness Test (18) to estimate V˙O2max. The resultant predicted V˙O2max scores were used to calculate running velocities corresponding to 55 and 95% of V˙O2max for the appropriate phases of the shuttle running protocol.

On a second visit to the laboratory, a familiarization session consisting of three blocks of the LIST protocol was completed under experimental conditions. This session resulted in participants completing 45 min of exercise and was conducted to accustom participants to the demands and activity patterns of the protocol. The instrumentation and data-collection procedures employed in the main trials were also used during this familiarization session. The above preliminary tests were performed 1-2 wk before the main experimental trials.

Main trials.

The main experimental trials in both investigations began at 08:00 h. Disposable temperature-sensor capsules (Cor-100, HQinc, Palmetto, FL) were ingested 10 h before each trial, before the start of an overnight fast. Participants ingested a prescribed volume of water and arrived at the laboratory with a sample of their first morning urine. On arrival, serial Tint measurements were taken to establish that the temperature sensors were working correctly and to confirm that Tcore was in a steady state. In investigation A, participants then moved to a private room and inserted a flexible rectal thermistor (401, YSI, Yellow Springs, OH) to a depth of 10 cm beyond the anal sphincter, as indicated by a bulb attached to the cable. Intestinal temperature sensors and rectal thermistors were calibrated before use with a mercury thermometer (15-45°C, LW Scientific, Tucker, GA) in a stirred water bath (Thermed 5001, GFL, Hannover, Germany).

Rectal temperature was recorded by connecting rectal thermistors to a lightweight (25 g) portable data logger (ML2002, Mini-Mitter Inc, Sunriver, OR). The temperature of intestinal sensors was measured using a portable ambulatory data recorder (CorTemp 2000, HQinc, Palmetto, FL) (218 g). In investigation A, both data loggers were packed into a low-profile neoprene waist pouch, which was tightly secured to the posterior lumbar region of the torso. The rectal thermistor cable was secured to the skin superior to the natal cleft with medical tape (Transpore, 3M, Loughborough, UK) to prevent any movement. At the start of exercise in investigation A, the Tint and Trec data loggers were activated simultaneously, and data were collected with a sampling interval of 5 s and resolutions of 0.01 and 0.05°C, respectively. In investigation B, intestinal temperature was measured before, during, and after each walking phase of the LIST protocol (approximately every 80 s) by briefly positioning the ambulatory data recorder within 30 cm of the posterior lumbar region of the participant's back. Heart rate was measured at 5-s intervals throughout each protocol via short-range telemetry (Polar Electro, Kempele, Finland).

Before the start of exercise, participants entered a gymnasium adjacent to the laboratory maintained at an environmental temperature of approximately 15°C and 60% relative humidity. Participants were given water at 4°C in a volume equivalent to 6.5 mL·kg−1 of body mass (BM). This water was consumed within 180 s. After ingestion of the water, the exercise protocol commenced, consisting of four blocks of LIST activity and rest pattern in investigation A and six blocks in investigation B. During each maximal-sprint phase of the protocol, the time taken to complete the 15-m distance was measured using infrared photoelectric cells (RS Components, Switzerland) interfaced with a computer. At the start of the 3-min rest period that followed each block of exercise, a volume of water equivalent to 3.5 mL·kg−1 BM at 4°C was ingested within 150 s.
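The fluid prescription above is simple arithmetic on body mass, and a short Python sketch may make it concrete. The function name `fluid_volumes_ml` and the parameter `n_rest_drinks` are hypothetical; the number of drinks taken during rest periods is left as an input because it depends on how many rest periods the protocol contains (three is assumed in the example for the four-block protocol).

```python
def fluid_volumes_ml(body_mass_kg, n_rest_drinks):
    """Water volumes (mL) implied by the stated prescription:
    6.5 mL/kg before exercise and 3.5 mL/kg at each rest period."""
    pre_exercise = 6.5 * body_mass_kg
    per_rest = 3.5 * body_mass_kg
    total = pre_exercise + n_rest_drinks * per_rest
    return {
        "pre_exercise_ml": pre_exercise,
        "per_rest_ml": per_rest,
        "total_ml": total,
    }

# Example: a 76-kg participant (the mean body mass in investigation A),
# assuming three rest-period drinks; this count is an assumption.
example = fluid_volumes_ml(body_mass_kg=76, n_rest_drinks=3)
print(example)
```

For this hypothetical participant the sketch gives 494 mL before exercise and 266 mL at each rest period.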

Statistical analysis.

A range of measurement-error statistics were calculated (3). Paired Student's t-tests (two tailed) and the associated 95% confidence intervals (95% CI) were employed to examine systematic bias between the methods of temperature measurement. For description of relative agreement, Pearson's correlation coefficients (r), adjusted to allow for repeated observations on each individual (5), and intraclass correlation coefficients (ICC) were calculated (3). For description of absolute agreement, the within-subjects standard deviation (also known as the standard error of measurement (SEM) in the case of a reliability examination), the coefficient of variation (CV), and 95% limits of agreement (LOA) were calculated (3,7). Ninety-five percent confidence limits were also calculated for the above random error statistics.
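As a minimal sketch of the absolute-agreement statistics named above, the following Python computes the mean bias, 95% limits of agreement, within-subjects SD, and CV from paired readings. The function name `agreement_stats` and the sample values are hypothetical. The formulas are common textbook forms (limits of agreement as bias ± 1.96 SD of the differences; within-subjects SD as the SD of the differences divided by √2) and are an assumption about, not a record of, the calculations actually performed; in particular, the sketch ignores the repeated observations made on each participant.

```python
import math
import statistics

def agreement_stats(method_a, method_b):
    """Bias, 95% limits of agreement, within-subjects SD and CV (%)
    for paired readings from two methods (or two trials)."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    # Random error attributable to a single reading, with bias removed.
    within_sd = sd_diff / math.sqrt(2)
    grand_mean = statistics.mean(list(method_a) + list(method_b))
    cv_percent = 100 * within_sd / grand_mean
    return {"bias": bias, "sd_diff": sd_diff, "loa": loa,
            "within_sd": within_sd, "cv_percent": cv_percent}

# Hypothetical readings (deg C), not data from this study.
t_rec = [37.50, 37.90, 38.20, 38.40]
t_int = [37.60, 38.10, 38.30, 38.60]
stats = agreement_stats(t_rec, t_int)
print(stats)
```

With these made-up readings the bias (rectal minus intestinal) is −0.15°C, which mirrors the sign convention of a negative bias when the intestinal sensor reads higher.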

The choice of appropriate agreement statistic depends on the distribution of errors and whether there is a relation between error and the magnitude of measured value (3). When residuals differ substantially from a Gaussian distribution and/or the random errors are found to be proportional to the size of the measured value, the raw data should be logarithmically transformed and described with ratio-type statistics (e.g., coefficient of variation) rather than error statistics expressed in the actual units of measurement (1). The presence of such proportional error was investigated through inspection of Bland-Altman plots and correlation exploration as outlined by Nevill and Atkinson (15).
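One way to sketch the check for proportional error described above is to correlate the absolute paired differences with the pair means, and to fall back on log-transformed (ratio) limits of agreement when that correlation is clearly positive. The helper names below are hypothetical, the data are arbitrary values constructed to show proportional error, and the exact procedure in reference 15 may differ in detail.

```python
import math
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def proportional_error_r(method_a, method_b):
    """Correlation between absolute paired differences and pair means.
    A clearly positive value suggests error grows with magnitude."""
    abs_diffs = [abs(a - b) for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    return pearson_r(means, abs_diffs)

def ratio_limits_of_agreement(method_a, method_b):
    """95% limits of agreement on the ratio scale: computed on
    natural-log data, then back-transformed with exp()."""
    log_diffs = [math.log(a) - math.log(b) for a, b in zip(method_a, method_b)]
    mean_ld = statistics.mean(log_diffs)
    sd_ld = statistics.stdev(log_diffs)
    return (math.exp(mean_ld - 1.96 * sd_ld), math.exp(mean_ld + 1.96 * sd_ld))

# Arbitrary illustrative values in which method b always reads 10% higher.
a = [10.0, 20.0, 30.0, 40.0]
b = [11.0, 22.0, 33.0, 44.0]
r_prop = proportional_error_r(a, b)
ratio_lower, ratio_upper = ratio_limits_of_agreement(a, b)
print(r_prop, ratio_lower, ratio_upper)
```

In this constructed example the absolute differences rise in step with the means, so the correlation is essentially 1, whereas the ratio between methods is constant.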

Error statistics and the associated 95% CI were compared with values that were deemed to be practically important. In investigation A, we delimited that a systematic bias of more than 0.1°C between methods would be practically significant in affecting decisions made on an individual's thermal status. This bias limit is also stipulated by the British standard for clinical thermometers (8). Given that the limits of agreement describe the largest difference between methods that can be expected for any individual and that biological variation is inherent in our measurements, we deemed that this statistic should not be larger than ± 0.3°C. In investigation B, we deemed that the test-retest measurement error of Tint would be acceptable on the basis of an excellent correlation (> 0.9), a small CV (< 2%), and a within-subjects standard deviation (SD) < 0.2°C. Using calculations based on a statistical power of 80%, it was estimated that this magnitude of random error would enable the detection in a future study of a 0.2°C change in body temperature using a feasible sample size of 16 participants (2).
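The sample-size estimate above can be approximated with a standard normal-approximation formula for a paired (repeated-measures) comparison. This is a sketch under stated assumptions, not necessarily the method of reference 2: a two-sided α of 0.05 is assumed (the text states only the 80% power), the within-subjects SD is taken at the 0.2°C acceptability limit, and the function name `paired_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def paired_sample_size(within_sd, detectable_change, power=0.80, alpha=0.05):
    """Approximate n for detecting a mean change in a paired design.
    The SD of paired differences is taken as sqrt(2) * within-subjects SD."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    sd_diff = math.sqrt(2) * within_sd
    n = ((z_alpha + z_beta) * sd_diff / detectable_change) ** 2
    return math.ceil(n)

# Within-subjects SD of 0.2 deg C and a 0.2 deg C change, as in the text.
n_required = paired_sample_size(within_sd=0.2, detectable_change=0.2)
print(n_required)
```

Under these assumptions the approximation returns 16 participants, in line with the figure quoted above.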


Investigation A.

A mean bias of −0.15°C (95% CI, −0.10 to −0.20) was found between the methods of temperature measurement throughout the exercise protocol (P < 0.001; Table 1). This bias for Tint to consistently provide higher temperatures than Trec is shown in the Bland-Altman plot (Fig. 2). It can be seen in Figure 2 that the bias between methods is uniform (parallel) over the range of measurements. The bias between methods is also apparent in the means plot of temperature versus time (Fig. 1). The mean bias was found to decrease slightly as the exercise protocol progressed (−0.17°C block 1 vs −0.12°C block 4; Table 1).

Figure 1. Mean rectal and intestinal temperature (mean ± SD; °C) during four blocks of the Loughborough intermittent shuttle running test (LIST) protocol.
Table 1. Statistical comparisons between rectal and intestinal temperature (°C) during each 15-min block of the Loughborough intermittent shuttle running test (LIST) protocol.

A mean intraclass correlation coefficient of 0.99 (95% CI, 0.93-1.05) suggested an excellent degree of relative agreement between the Tint and Trec techniques (22). The size of either the ICC or Pearson's correlation did not alter substantially in relation to exercise time.

The overall within-subjects SD between methods was ± 0.08°C (95% CI, 0.06-0.15). This is an acceptable amount of error between methods, because meaningful differences when comparing Tcore during separate exercise trials are likely to be considerably greater than this. The mean coefficient of variation between methods of measurement for all measurements combined was 0.29% (95% CI, 0.19-0.39) (Table 1). This statistic also shows that an excellent degree of random agreement exists between the measurement techniques. Limits of agreement are shown graphically for all measurements in Figure 2. The width of the limits is consistent for the data across the whole measurement range, indicating an absence of proportional error.

Figure 2. Bland-Altman plot representing comparisons every 4 min (N = 190) between rectal and intestinal temperature (°C) during four blocks of the Loughborough intermittent shuttle running test (LIST) protocol. Bias and random error lines (95% limits of agreement) are included.

Participants were asked to eat the diet they would normally consume 48 h before taking part in strenuous activity/match play. The mean consumption of CHO, fat, and protein during this period was 7.4 ± 2 g·kg−1 BM·d−1, 1.1 ± 0.6 g·kg−1 BM·d−1, and 1.8 ± 1 g·kg−1 BM·d−1 for each macronutrient, respectively. No caffeine was detected in the dietary intake of the participants, and the osmolality of the first morning urine was < 900 mOsmol·kg−1 in all participants. Mean energy intake during the 48-h period before exercise was 13.6 ± 2.9 MJ·d−1.

Investigation B.

Mean heart rate was essentially the same between repeated trials. No statistical differences were detected between trials in mean heart rate before exercise or during each exercise block (F1,8 = 0.068; P = 0.802). Sprint performance also was similar between repeated tests; no differences were found in mean 15-m sprint times between trials (F1,8 = 0.555; P = 0.837).

The mean change between test and retest was small (0.01°C), not statistically significant (F1,8 = 0.766; P = 0.407), and did not alter throughout the measurement range studied (Fig. 3). Basal intestinal temperature was 37.18 ± 0.26°C and 37.15 ± 0.28°C in trials 1 and 2, respectively. At the end of exercise, Tint in trials 1 and 2 was 38.43 ± 0.38°C and 38.46 ± 0.37°C, respectively.

Figure 3. Mean intestinal temperature (mean ± SD; °C) during repeated trials of the Loughborough intermittent shuttle running test (LIST) protocol.

Pearson's correlation coefficients show a high correlation between repeated trials in each 15-min period of exercise (Table 2). The relationship between all measurements combined is also strong (r = 0.97, P < 0.01; 95% CI, 0.95-0.99). Intraclass correlation coefficients also indicated excellent reliability; the ICC for all comparisons combined was 0.99 (95% CI, 0.96-1.02).

Table 2. Statistical comparisons of intestinal temperature (°C) between repeated trials, during each 15-min block of the Loughborough intermittent shuttle running test (LIST) protocol.

The overall within-subject SD was ± 0.08°C (95% CI, 0.05-0.15). The coefficient of variation for each 15-min block of exercise is shown in Table 2; the overall CV for all measurements was 0.17% (95% CI, 0.09-0.25). Limits of agreement for repeated trials are represented graphically in Figure 4. It can be seen that random errors are scattered in parallel over the measurement range, and there is no evidence for the presence of proportional error. The limits of agreement indicate that in 95% of cases, intestinal temperature differs between repeated trials by ± 0.23°C (95% CI, 0.14-0.32). All measures of absolute reliability reported in Table 2 exhibit a trend toward increased reliability with increased exercise duration.
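As a rough consistency check, the limits of agreement can be related to the within-subject SD. The relation below assumes that test-retest differences are approximately normally distributed with negligible mean bias; the symbol $s_w$ for the within-subject SD is introduced here for illustration.

```latex
\[
\mathrm{LOA} \approx \pm 1.96 \times \sqrt{2} \times s_w
             = \pm 1.96 \times 1.414 \times 0.08\,^{\circ}\mathrm{C}
             \approx \pm 0.22\,^{\circ}\mathrm{C}
\]
```

This is close to the reported ± 0.23°C; the small gap may simply reflect rounding of the within-subject SD to two decimal places.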

Figure 4. Bland-Altman plot representing intestinal temperature (°C) comparisons made every LIST cycle (every 80 s) between repeated trials (N = 602). Mean difference and random error lines (95% limits of agreement) are included.

Environmental temperature (14.9 ± 0.7°C) and relative humidity (62.4 ± 7.9%) remained constant, and no differences were detected between trials or over time in these parameters. No trial order effects were detected in any of the reported measures.

Mean energy intake during the 48-h period before the first exercise trial was 13.5 ± 3.7 MJ·d−1, and the same diet was consumed before the second trial. The mean mass of CHO, fat, and protein consumed during the recording period was 7.2 ± 2.6 g·kg−1 BM·d−1, 1.0 ± 0.7 g·kg−1 BM·d−1, and 1.7 ± 0.4 g·kg−1 BM·d−1, respectively.


The main finding of this study is that ingestible temperature sensors agree acceptably well with Trec during dynamic free running in male athletes, provided that a consistent systematic bias of approximately 0.15°C is allowed for in the interpretation of measurements. Intestinal temperature measurements were also found to be acceptably reliable during intermittent exercise.

The results of investigation A show that a systematic bias between the two techniques was evident throughout the protocol. This difference is similar in direction and magnitude to that reported by Stephenson and coworkers (21). The consistently higher intestinal temperatures observed in the present study are also in agreement with other researchers who report higher Tint versus Trec at any given time (9,13,14,21). The difference in the relative response time of the two measures was negligible, and the decline in temperature after the three rest periods occurred at essentially the same time. The similar response time of the two instruments suggests that the bias seen during the intermittent running is possibly not attributable to a greater thermal inertia at the rectum but, rather, to an absolute difference of 0.15°C between the advanced regions of the colon and the rectum during exercise. The posture-related distribution of blood flow to metabolically active tissue may explain why response time was not different during this upright form of weight-bearing exercise compared with the previous work using cycling that has reported discrepancies.

The bias between methods of measurement declined slightly over time; that is, a total decline of 0.05°C was seen during the protocol. This reduction in bias is mirrored by an increase in the absolute reliability of comparison. This may suggest that any progression of the sensor along the gastrointestinal tract occurs early on in the exercise protocol and that the position of the sensor may become normalized between participants as the duration of exercise increases. Alternatively, this small decline in bias could be representative of a change in blood flow at these sites with exercise duration. It has been shown that as central venous pressure drops during prolonged exercise, a greater fraction of cardiac output is diverted away from the hepatic-splanchnic organs and the kidneys (19). Mesenteric ischemia may lead to Tint becoming less sensitive to small changes in blood temperature and, hence, more reflective of Trec. Another hypothesis is that changes in the amount of metabolically active tissue and blood flow to the legs could lead to increased venous return via the hemorrhoidal veins, increasing the sensitivity of Trec.

The ingestion period of 10 h seems to have enabled sufficient progress of the sensor through the gastrointestinal tract, with no differences between Tint and Trec apparent after the repeated ingestion of 4°C water. This is in agreement with previous researchers who have used ingestion times ranging from 6 to 12 h before measurement (9,14,17).

Considering that the temperature throughout the body core is generally not uniform, the differences observed in the present study suggest that Tint provides an acceptable level of agreement with Trec during serial measurements. It should be noted that the change in Tcore during the present study is less than would be expected during severe hyperthermia, and the agreement between Trec and Tint could be different through an extended temperature range. With this limitation of the current study understood, Tint seems to provide an accurate index of core temperature for use when measurement at other sites is not feasible.

In investigation B, exercise-induced changes in Tint were very similar between the repeated exercise trials. The magnitude of random error between the trials decreased notably during the first three blocks of exercise. To some degree, this reduction in error may be attributable to the relatively large amount of heat stored during exercise, which might have normalized any small differences in Tcore that had occurred between trials at the start of the protocol. However, considering that basal temperature was similar at the start of the protocol, it is likely that the majority of the error is attributable to movement of the sensor. One suggestion might be that the sensor typically resides within an upper portion of the large intestine before exercise. Keeling and Martin (11) showed that orocecal transit time was accelerated by 20-25% during treadmill walking, possibly because of rhythmic vibration of the abdomen. An increase in motility during the early stage of the exercise protocol might move the sensor until it reaches an area of the descending or sigmoid colon in which more compacted fecal material restricts movement. This suggestion seems consistent with the small reduction in systematic bias seen over time between Tint and Trec in investigation A, and it may be a more reliable explanation than those based around circulatory changes.

It is likely that a proportion of the biological variability between repeated trials occurred as a result of differing within-subject core temperature changes. The Tcore response when repeating a fixed-intensity exercise test may vary on separate occasions regardless of whether basal Tcore is standardized before exercise (10). Jette and colleagues (10) examined the reproducibility of moderate increases in Tcore during treadmill exercise in varying environmental temperatures. They report that whereas the change in Trec was reproducible when exercising at 20°C (increase in Trec of approximately 0.3°C), it was dissimilar when comparing trials at 40°C (increase in Trec of approximately 0.5°C; significant difference of 0.04°C between trials). Regardless of the reliability of the Tint technique, there will also be a certain degree of underlying error between trials that cannot be accounted for when comparing Tcore responses. However, this discrepancy between trials is likely to be small compared with the magnitude of differences that would be deemed meaningful when quantifying an intervention effect.


It is evident from this study that ingestible temperature sensors reproduce individual Tcore measurements in males during free running to an acceptable accuracy for examination of internal heat storage through repeated trials. The precision of this site of measurement seems to increase once a relatively steady-state core temperature has been achieved during exercise.


1. Altman, D. G. Practical Statistics for Medical Research. London: Chapman and Hall, pp. 60-62, 1991.
2. Atkinson, G., R. C. Davison, and A. M. Nevill. Performance characteristics of gas analysis systems: what we know and what we need to know. Int. J. Sports Med. 26(Suppl 1):S2-S10, 2005.
3. Atkinson, G., and A. M. Nevill. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 26:217-238, 1998.
4. Bland, J. M., and D. G. Altman. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet. Gynecol. 22:85-93, 2003.
5. Bland, J. M., and D. G. Altman. Calculating correlation coefficients with repeated observations: part 1- correlation within subjects. Brit. Med. J. 310:446, 1995.
6. Bland, J. M., and D. G. Altman. Measuring agreement in method comparison studies. Stat. Meth. Med. Res. 8:135-160, 1999.
7. Bland, J. M., and D. G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1:307-310, 1986.
8. British Standards Institute. Clinical thermometers - part 3: performance of compact electrical thermometers (non-predictive and predictive) with maximum device. BS EN 12470-3:2000, 2000.
9. Edwards, B., J. Waterhouse, T. Reilly, and G. Atkinson. A comparison of the suitabilities of rectal, gut, and insulated axilla temperatures for measurement of the circadian rhythm of core temperature in field studies. Chronobiol. Int. 19:579-597, 2002.
10. Jette, M., J. Quenneville, J. Thoden, and S. D. Livingstone. Reproducibility of body temperature response to standardized test conditions when assessing clothing. Ergonomics 38:1057-1066, 1995.
11. Keeling, W. F., and B. J. Martin. Gastrointestinal transit during mild exercise. J. Appl. Physiol. 63:978-981, 1987.
12. Kolka, M. A., L. Levine, and L. A. Stephenson. Use of an ingestible telemetry sensor to measure core temperature under chemical protective clothing. J. Therm. Biol. 22:343-349, 1997.
13. Kolka, M. A., M. D. Quigley, L. A. Blanchard, D. A. Toyota, and L. A. Stephenson. Validation of a temperature telemetry system during moderate and strenuous exercise. J. Therm. Biol. 18:203-210, 1993.
14. Lee, J. S. M. C., W. J. Williams, and S. M. Schneider. Core Temperature Measurement during Submaximal Exercise: Esophageal, Rectal, and Intestinal Temperatures. NASA center for AeroSpace Information, Report NASA/TP 2000-210133, 2000.
15. Nevill, A. M., and G. Atkinson. Assessing agreement between measurements recorded on a ratio scale in sports medicine and sports science. Br. J. Sports Med. 31:314-318, 1997.
16. Nicholas, C. W., F. E. Nuttall, and C. Williams. The Loughborough intermittent shuttle test: A field test that simulates the activity pattern of soccer. J. Sports Sci. 18:97-104, 2000.
17. O'Brien, C., R. W. Hoyt, M. J. Buller, J. W. Castellani, and A. J. Young. Telemetry pill measurement of core temperature in humans during active heating and cooling. Med. Sci. Sports Exerc. 30:468-472, 1998.
18. Ramsbottom, R., J. Brewer, and C. Williams. A progressive shuttle run test to estimate maximal oxygen uptake. Br. J. Sports Med. 22:141-144, 1988.
19. Rowell, L. B., G. L. Brengelmann, J. R. Blackmon, R. D. Twiss, and F. Kusumi. Splanchnic blood flow and metabolism in heat-stressed man. J. Appl. Physiol. 24:475-484, 1968.
20. Sparling, P. B., T. K. Snow, and M. L. Millard-Stafford. Monitoring core temperature during exercise: ingestible sensor vs. rectal thermistor. Aviat. Space Environ. Med. 64:760-763, 1993.
21. Stephenson, A., M. D. Quigley, L. A. Blanchard, D. A. Toyota, and M. A. Kolka. Validation of Two Temperature Pill Telemetry Systems in Humans during Moderate and Strenuous Exercise. U.S. Army Research Institute of Environmental Medicine, Technical Report T10/92, 1992.
22. Vincent, J. Statistics in Kinesiology. Champaign, IL: Human Kinetics, pp. 103-106, 1994.


© 2006 The American College of Sports Medicine