

Validity, Reliability, and Inertia of Four Different Temperature Capsule Systems


Medicine & Science in Sports & Exercise: January 2018 - Volume 50 - Issue 1 - p 169-175
doi: 10.1249/MSS.0000000000001403


Major sport events are increasingly organized in extreme environmental conditions, making it more important for athletes to perform well in hot and cold ambient conditions and to monitor their core body temperature (Tc) from a safety perspective. Exercise-induced increases in metabolic heat production (1,17) are known to pose a major physiological challenge to the thermoregulatory system (15,17). An imbalance between heat production and heat loss causes the Tc to rise, which may lead to the development of exertional hyperthermia (Tc > 40°C), heat-related illnesses (i.e., heat exhaustion/heat stroke), and/or a reduction of athletic performance (1,14,20). Alternatively, exercise in cold environments could lead to rapid heat loss due to conduction (water), convection (wind), and radiation, which may contribute to the development of hypothermia (8). Hence, accurate assessment of an athlete’s Tc is important to assess the presence and magnitude of thermoregulatory strain and to select and apply appropriate cooling or heating techniques for preservation of health and exercise performance (5,6,10).

The gastrointestinal temperature, measured with ingestible temperature capsules, has been established as a valid surrogate marker for Tc (9,11,13). Temperature capsule systems are wireless, relatively noninvasive, and easily applicable in field-based conditions. Although the validity of these temperature capsule systems has been examined (7,11,23), different study designs were applied and a substantial variation in accuracy was found (i.e., −0.001°C to 0.27°C). Hence, it is essential to determine which capsule system is superior for assessment of Tc in field conditions.

The aim of this study was to examine the validity, reliability, and inertia characteristics of four commercially available ingestible telemetric temperature capsule systems (i.e., CorTemp, e-Celsius, myTemp, and VitalSense) in well-controlled ex vivo circumstances using a water bath. Data from this study provide insight into which telemetric capsule system has the most favorable characteristics for Tc assessment, which could enable researchers and trainers to select the best temperature sensor for their scientific study and/or daily practice.


Experimental design

Four different ingestible telemetric temperature capsule systems (CorTemp, e-Celsius, myTemp, and VitalSense) were tested in a custom-made accurately controlled water bath. The primary outcomes were the validity, test–retest reliability, and inertia characteristics of the capsule systems. A total of 10 temperature capsules from a single production batch of each telemetry system were tested during three separate trials. The first and second trials consisted of a similar study protocol and were used to assess the validity and test–retest reliability. The third trial adopted a different protocol and was used to examine the inertia characteristics of the temperature capsules. To reduce any bias caused by environmental factors and to ensure that the capsule systems were evaluated in comparable circumstances, a single temperature capsule for each capsule system was used simultaneously in each trial.

Experimental setup

An overview of the experimental setup is presented in Supplemental Figure 1 (see Figure, Supplemental Digital Content 1, Overview of the experimental setup). A thermostat-controlled and distilled water–filled bath (3.5 L) was used in which four highly sensitive and calibrated wired temperature probes (1529 Chube E-4 Thermometer Readout Thermistor; Fluke Hart Scientific, Everett, WA) measured temperature to within 0.00035°C. The average value of these wired temperature sensors represented the temperature of the water bath. In addition, a heater (Fluke Hart Scientific 2100 Temperature Controller) and stirrer (Heidolph Instruments D91126, type RZR1, Schwabach, Germany) system ensured thermal homogeneity of the water bath. A custom-made holder prevented each sensor from reaching the bottom of the water bath or coming into contact with another sensor. The external monitors of each of the telemetric capsule systems were placed around the water bath within a distance of 0.2 m.

Study protocol

Before each experiment, the sensors and external monitors were synchronized to ensure that the measurements occurred simultaneously. In the validity and reliability measurements, the water bath temperature gradually increased from 33°C to 44°C, exceeding the physiological range between hypothermia (<35°C) and exertional hyperthermia (>40°C). An automated protocol was programmed to induce a stepwise increase in water bath temperature, resulting in 12 temperature plateaus (33°C, 34°C, 35°C, 36°C, 37°C, 38°C, 39°C, 40°C, 41°C, 42°C, 43°C, and 44°C). For each temperature plateau, three conditions had to be achieved before the protocol could proceed: 1) water bath temperature did not vary (>0.02°C) during 50 consecutive measurements (5 min), 2) the average value of the four independent probes did not vary (>0.01°C) during two consecutive measurements, and 3) the change in heater power did not exceed 8% during two consecutive measurements. These conditions ensured stability of the water bath temperature and thereby reliable temperature measurements at each point of measurement. The study protocol was performed twice for each temperature capsule (trial 1/trial 2), which allowed us to calculate the validity and test–retest reliability. The water bath temperature was measured every 6 s.
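
The three plateau conditions above can be sketched as a single check. This is a minimal illustration, not the authors' control software: the function name and arguments are hypothetical, "did not vary" is assumed to mean the range (maximum minus minimum) of the readings, and heater power is assumed to be logged in percent so that the 8% limit is a change in percentage points.

```python
def plateau_is_stable(bath_temps, heater_power,
                      window=50, bath_tol=0.02, probe_tol=0.01, power_tol=8.0):
    """Return True when all three stability criteria hold.

    bath_temps   : mean of the four wired probes (deg C), one reading
                   every 6 s, newest last
    heater_power : heater power per reading (percent, assumed unit)
    """
    if len(bath_temps) < window or len(heater_power) < 2:
        return False  # not enough history to judge stability

    recent = bath_temps[-window:]
    # 1) bath temperature varied by no more than 0.02 deg C over
    #    50 consecutive readings (50 x 6 s = 5 min)
    bath_ok = max(recent) - min(recent) <= bath_tol
    # 2) probe average changed by no more than 0.01 deg C between
    #    the two most recent readings
    probe_ok = abs(bath_temps[-1] - bath_temps[-2]) <= probe_tol
    # 3) heater power changed by no more than 8 points between
    #    the two most recent readings
    power_ok = abs(heater_power[-1] - heater_power[-2]) <= power_tol
    return bath_ok and probe_ok and power_ok
```

Under these assumptions, the protocol would advance to the next plateau only once the function returns True.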

In the inertia experiment, the water bath temperature gradually increased from 36°C to 42°C. At every temperature threshold (36°C, 37°C, 38°C, 39°C, 40°C, 41°C, and 42°C), the water bath temperature was stabilized for 5 min. Then the water bath temperature increased by 1°C in a time frame of 5 min. This time frame was constructed to mimic the increase in Tc during high-intensity exercise in hot ambient conditions, if no heat can be removed from the body (1). This study protocol allowed us to calculate the time delay of the temperature measured by the temperature capsule compared with the actual temperature of the water bath during the stepwise heating phase. This time delay is defined as the inertia of the temperature capsule.

Telemetric temperature capsule systems

Characteristics of the ingestible telemetric temperature capsule systems are shown in Table 1. All capsule systems used an external wireless recorder to receive the signal from the temperature capsule via a specific radio frequency. The temperature capsules of CorTemp (HQ Inc., Palmetto, FL), e-Celsius (BodyCap, Caen, France), and VitalSense (Philips Respironics, Bend, OR) were delivered in standby mode and had to be activated before use. The myTemp (Nijmegen, the Netherlands) capsule is automatically activated by the external recorder, which is also the power supply for the temperature capsule. All temperature capsules were activated directly before trial 1. Furthermore, all measurements were performed in accordance with the manual of the individual capsule systems, and the highest sample frequency was used throughout the protocol. The external recorders of all capsule systems stored the data, which were exported to a computer for further analysis using the latest version of available software.

Table 1. Physical and technical characteristics of the telemetric capsule systems.

Data processing and statistical analysis

The average capsule temperature during the final 150 s of each temperature plateau was calculated per telemetric system. Because of differences in sample rate, capsule temperature reflected the average of n = 25 consecutive measurements for myTemp, n = 15 for CorTemp, n = 6 for e-Celsius, and n = 10 for VitalSense. Average capsule temperature and water bath temperature were compared for each temperature plateau (33°C–44°C). Outliers were defined as observations with a difference of >1°C between consecutive measurements and were excluded from further analysis. Furthermore, we addressed the number of measurements with a difference between consecutive data points between 0.2°C and 1.0°C to get more insight into the consistency of the data.
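
The outlier rule can be sketched as follows. This is an illustrative reading of the definitions above, not the authors' processing script: the function names are hypothetical, and each sample is assumed to be compared with the last non-outlier sample, so that a single spike does not also flag the valid reading that follows it.

```python
def classify_samples(temps, error_min=0.2, outlier_min=1.0):
    """Label each sample 'ok', 'error' (jump of 0.2-1.0 deg C),
    or 'outlier' (jump of more than 1.0 deg C)."""
    labels = []
    reference = None  # last accepted (non-outlier) sample
    for t in temps:
        if reference is None:
            labels.append("ok")  # first sample has no predecessor
            reference = t
            continue
        jump = abs(t - reference)
        if jump > outlier_min:
            labels.append("outlier")  # excluded; reference unchanged
            continue
        labels.append("error" if jump > error_min else "ok")
        reference = t
    return labels


def plateau_mean(temps, labels):
    """Average capsule temperature over a plateau, outliers excluded."""
    kept = [t for t, lab in zip(temps, labels) if lab != "outlier"]
    return sum(kept) / len(kept)


readings = [37.0, 37.0, 99.1, 37.0, 37.5, 37.5]
labels = classify_samples(readings)
print(labels, plateau_mean(readings, labels))
```

Error measurements are counted but kept in the average, matching the text, which excludes only outliers from further analysis.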

To establish the validity, the Bland–Altman method for assessing the agreement between two methods was used (3). In short, the mean difference (=systematic bias) between the temperature capsule and water bath was assessed using a one-sample t-test. The systematic bias and accompanying 95% limits of agreement (LOA) were derived from the Bland–Altman plot (3). Furthermore, the intraclass correlation coefficient (ICC) was calculated for the average of all 10 capsules, to determine the intermeasure agreement (19). The SEM was calculated on the basis of the SD of the difference between temperature capsules and water bath temperature (16). Furthermore, we conducted a repeated-measures ANOVA to determine whether the accuracy of the capsule systems was different across temperature plateaus (i.e., 33°C–44°C). Differences in accuracy across capsule systems were examined using one-way ANOVA. A similar approach was used to determine the test–retest reliability.
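
The agreement statistics can be sketched with the standard library alone. The limits of agreement follow the usual Bland–Altman form (bias ± 1.96 × SD of the differences). The SEM formula used here, SD of the differences divided by √2, is an assumption; it appears consistent with the reported values (for example, an SD of 0.040°C gives 0.028°C), but the text does not state the formula. The function name and return keys are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def agreement_stats(capsule, bath):
    """Bland-Altman style agreement between paired capsule and
    water bath temperatures (deg C)."""
    diffs = [c - b for c, b in zip(capsule, bath)]
    bias = mean(diffs)          # systematic bias (mean difference)
    sd = stdev(diffs)           # sample SD of the differences
    half_width = 1.96 * sd      # 95% limits of agreement half-width
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - half_width,
        "loa_upper": bias + half_width,
        "sem": sd / sqrt(2),    # assumed formula, see lead-in
        # one-sample t statistic for H0: bias == 0
        "t": bias / (sd / sqrt(len(diffs))),
    }


stats = agreement_stats([37.1, 38.0, 39.1, 40.0], [37.0, 38.0, 39.0, 40.0])
print(stats)
```

Turning the t statistic into a P value requires the t distribution with n − 1 degrees of freedom, which a statistics package would supply.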

Inertia was assessed as the time delay of the telemetric capsule to reach the same temperature as that of the water bath after a sudden temperature increase. Inertia was determined at 50% (P50) and 90% (P90) of the increase to each temperature plateau, and the time at which the first observation of the capsule and the water bath exceeded the P50 or P90 temperature was taken. Subsequently, the time to reach P50 and P90 of the capsule system was compared with the time of the water bath to reach P50 and P90 and was defined as the time delay (inertia). Because the time delay may be influenced by the accuracy and sample frequency of the capsule, we applied two different correction methods: 1) the systematic bias of the telemetric capsule (i.e., sensitivity data) was subtracted from the recorded values, and 2) temperature data were interpolated between subsequent samples to determine the exact time at which P50 and P90 were exceeded. Inertia characteristics were presented as follows: I) raw data, II) corrected for differences in accuracy, and III) corrected for differences in accuracy and sample frequency. To examine whether there was an inertia difference per temperature plateau across telemetric capsule systems, a two-way repeated-measures ANOVA was performed. One-way ANOVA was used to assess the differences in inertia characteristics at P50 and P90 between the four telemetric capsule systems. Furthermore, time constants of the systems' response were determined by exposing a single capsule three times to a 7°C step change in temperature between two water baths (30°C–37°C). Differences in the systems' sampling rates did not allow for a very precise determination; however, the time constants could be estimated by interpolating the data.
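
The P50/P90 time delay and the two corrections can be sketched as below. This is a hedged illustration rather than the authors' code: the function names are hypothetical, linear interpolation between the two samples that bracket the threshold is assumed, and the same interpolation is assumed to be applied to both the capsule and the water bath series.

```python
def crossing_time(times, temps, threshold, interpolate=True):
    """Time (s) at which temps first exceeds threshold, or None."""
    for i, (t, temp) in enumerate(zip(times, temps)):
        if temp > threshold:
            if not interpolate or i == 0:
                return t  # raw: time of first sample above threshold
            t0, y0 = times[i - 1], temps[i - 1]
            # linear interpolation between the last sample at or below
            # the threshold and the first sample above it
            return t0 + (threshold - y0) * (t - t0) / (temp - y0)
    return None


def inertia(bath, capsule, start, end, fraction, bias=0.0, interpolate=True):
    """Capsule delay (s) relative to the bath at a given fraction
    (0.5 for P50, 0.9 for P90) of the step from start to end.

    bath, capsule : (times, temps) tuples
    bias          : systematic bias subtracted from capsule readings
    """
    threshold = start + fraction * (end - start)
    bath_t = crossing_time(bath[0], bath[1], threshold, interpolate)
    corrected = [y - bias for y in capsule[1]]
    caps_t = crossing_time(capsule[0], corrected, threshold, interpolate)
    if bath_t is None or caps_t is None:
        return None
    return caps_t - bath_t


bath = ([0, 6, 12, 18, 24, 30], [36.0, 36.2, 36.4, 36.6, 36.8, 37.0])
capsule = ([0, 10, 20, 30, 40], [36.0, 36.0, 36.4, 36.8, 37.2])
raw_p50 = inertia(bath, capsule, 36.0, 37.0, 0.5, interpolate=False)
interp_p50 = inertia(bath, capsule, 36.0, 37.0, 0.5)
print(raw_p50, interp_p50)
```

In this made-up example the raw delay is larger than the interpolated one, because without interpolation a slower sampling rate pushes the detected crossing to the next available sample.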

All statistical analyses were performed using SPSS Statistics (Version 20), in which the level of significance was set at P < 0.05. The systematic bias was reported as mean difference ± SD, unless indicated otherwise.


Missing data and outliers

A total of 40 temperature capsules were investigated: 10 sensors per telemetric capsule system. We experienced difficulties with the activation of n = 4 VitalSense telemetric capsules, although the provided instructions were carefully followed. Moreover, n = 1 of these VitalSense temperature capsules could not be activated at all and one temperature capsule stopped measuring after 43°C during trial 2, meaning that data of the 44°C temperature plateau are not reported for that temperature capsule. As a result, data from 39 temperature capsules were used for our analyses.

In 6 of the 9 VitalSense temperature capsules, data were randomly missing throughout the protocol (trial 1 + 2), representing 1.0% of the total data. Two CorTemp capsules and one e-Celsius capsule randomly missed 0.1% of the data, whereas no missing data were reported for the myTemp system (see Table, Supplemental Digital Content 2, Missing data and outliers). The CorTemp system was the only system with outliers (ΔTcapsule > 1°C), which were randomly present in 4.0% of the total data, with differences ranging from 1°C to 62.1°C. CorTemp also showed error measurements (0.2°C < ΔTcapsule < 1°C) in 4.4% of the total data, whereas these error measurements were not present in the other systems. Outliers and error measurements were both found in all CorTemp capsules.


After exclusion of outliers, mean differences between capsule and water bath temperature for trial 1 were 0.077°C ± 0.040°C (CorTemp), −0.081°C ± 0.055°C (e-Celsius), −0.003°C ± 0.006°C (myTemp), and −0.017°C ± 0.023°C (VitalSense; Fig. 1), which were significantly different from zero (all P values ≤ 0.01). In addition, the myTemp system demonstrated the smallest mean difference, followed by VitalSense, CorTemp, and e-Celsius (Pcapsule system < 0.001). The 95% LOA values were ±0.079°C (CorTemp), ±0.108°C (e-Celsius), ±0.013°C (myTemp), and ±0.046°C (VitalSense). The SEM was 0.028°C for CorTemp, 0.039°C for e-Celsius, 0.005°C for myTemp, and 0.017°C for the VitalSense system. All capsule systems demonstrated an excellent agreement between capsule and water bath temperature on the basis of the significant ICC of 1.00 (all P values < 0.05). The data of trial 2 revealed similar outcomes with respect to the mean differences, LOA, SEM, and ICC (Table 2). A repeated-measures ANOVA indicated that the mean difference between the e-Celsius, myTemp, and VitalSense systems and water bath temperature did not drift across temperature plateaus (P > 0.05). In contrast, a significant decrease in mean difference was found across increasing water bath temperatures for the CorTemp system (P = 0.002; Fig. 2).

Figure 1. Raw data (A) and data after outlier removal (B): mean difference between temperature capsule and water bath temperature for the capsule systems. Data are presented as mean difference ± LOA. *Indicates a significant systematic bias.
Table 2. Validity of the four temperature capsule systems.
Figure 2. An overview of the mean difference between capsule and water bath temperature for the 12 discrete temperature plateaus. A separate line is plotted for each temperature capsule system. Data are presented as mean difference ± SD. *Represents a drifted response over the temperature plateaus.

Test–retest reliability

The mean difference between trial 1 and trial 2 was significantly different from zero for CorTemp (0.017°C ± 0.083°C; LOA, ±0.162°C; P = 0.030) and e-Celsius (−0.007°C ± 0.033°C; LOA, ±0.064°C; P = 0.019; Fig. 3). For myTemp (0.0001°C ± 0.008°C; LOA, ±0.016°C) and VitalSense (0.002°C ± 0.014°C; LOA, ±0.028°C), the mean difference did not differ significantly from zero (both P values > 0.05). Furthermore, the CorTemp system demonstrated the highest mean difference between trial 1 and trial 2 (P = 0.001), whereas the other systems had a comparable mean difference between both trials (P > 0.05). The SEM was 0.058°C for CorTemp, 0.023°C for e-Celsius, 0.006°C for myTemp, and 0.010°C for the VitalSense system. An excellent agreement between trial 1 and trial 2 was found for all four capsule systems (ICC, 1.00; P < 0.05).

Figure 3. Raw data (A) and data after outlier removal (B): mean difference between temperatures measured during trial 1 and trial 2 for the capsule systems. Data are presented as mean difference ± LOA. *Indicates a significant systematic bias.


Inertia characteristics are summarized in Table 3. The raw data revealed that the CorTemp system had a significantly lower time delay to reach P50 (9 ± 5 s) and P90 (10 ± 5 s) compared with the other capsule systems, whereas the VitalSense system demonstrated the slowest response (P50, 54 ± 12 s; P90, 35 ± 3 s; P < 0.001). After correction for the systematic bias of each capsule system, the myTemp system demonstrated the lowest P50 and P90, followed by the CorTemp and e-Celsius systems. The P50 and P90 remained the highest for the VitalSense system (P < 0.001). Additional correction for sample frequency did not alter inertia characteristics (Table 3). Time constants of the systems' response were 22 s for myTemp, 28 s for e-Celsius, 47 s for CorTemp, and 48 s for VitalSense.

Table 3. Inertia characteristics of the four temperature capsule systems.


This is the first study to compare the validity, reliability, and inertia characteristics of all commercially available ingestible telemetric temperature capsule systems. Our well-controlled ex vivo water bath study demonstrates that all temperature capsule systems are valid and reliable to measure (water) temperature, evidenced by their small systematic biases and a low LOA and SEM after removal of outliers (CorTemp). Furthermore, we found that the CorTemp, e-Celsius, and myTemp capsule systems demonstrated comparable inertia characteristics, whereas the VitalSense system demonstrated a lower responsiveness to changes in water bath temperature. These findings enable researchers and clinicians to select the telemetric capsule system that best suits their goal, which can improve the safety of exercise in hot and cold environments.

An excellent validity and reliability of a temperature measurement technique is characterized by 1) a low systematic bias (<0.1°C), 2) a narrow 95% LOA (maximal ±0.4°C), 3) a high ICC (>0.80) with the reference temperature, and 4) a low SEM (2,9,23). We found a significant systematic bias for all four capsule systems, but the validity and reliability of every capsule system complied with reference criteria for an excellent level of agreement. Nevertheless, we observed a substantial prevalence of outliers in our raw CorTemp data (4.0%), leading to a high LOA (2.3°C) and violation of accuracy criteria (<0.1°C). Data verification and cleaning are, therefore, needed before CorTemp data can be used appropriately. Furthermore, the decreasing systematic bias with increasing temperatures suggests that the CorTemp system is mainly accurate in normothermic and hyperthermic conditions (36°C–44°C), but less accurate for hypothermic conditions (33°C–35°C). Although the CorTemp system did not meet the criteria for an excellent validity for hypothermic conditions, the systematic bias (0.1°C–0.2°C) is still physiologically acceptable. e-Celsius, myTemp, and VitalSense were more constant and performed well across the whole temperature range. Furthermore, the ICC and the SEM were used to assess the reliability (2,16). An ICC of 1.00 was found for all capsule systems, whereas an ICC of >0.80 is typically considered as acceptable, with higher values representing a better reliability (2). The high ICC of the four capsule systems suggests that the error variance between water bath and capsule temperature and between trial 1 and trial 2 is negligible compared with the normal variance of the measurement (12). In addition, the low SEM for all capsule systems is another indication that there is an excellent agreement between water bath and capsule temperature and between trial 1 and trial 2. Therefore, all capsule systems are valid and reliable methods to measure temperature after outliers have been removed.

The responsiveness of the temperature capsules was quantified by the inertia characteristics at P50 and P90. We found that the VitalSense system had the slowest response (38–39 s) to acute changes in temperature compared with the other systems (range, 18–26 s). Nevertheless, all systems demonstrated an acceptable responsiveness to changes in temperature. A previous study reported a maximal Tc increase of 1°C per 5 min if no heat can be removed from the body (1). An inertia of 18 to 39 s is, therefore, physiologically irrelevant. Moreover, the underestimation of Tc measured with a temperature capsule in dynamic and/or rapidly changing situations is marginal and hardly influences final Tc. Furthermore, the order of the time constants matches the order of the P50 and P90 times corrected for sample frequency. The observed time constants are considered appropriate for the physiological signals measured.
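
As a rough check on the magnitudes involved, the delay can be converted into a temperature lag. This is a simplified sketch, not part of the authors' analysis: it assumes the capsule reading trails a steady linear rise in Tc by a fixed delay, and the function name is hypothetical.

```python
def lag_underestimate(delay_s, rate_c_per_min=0.2):
    """Temperature lag (deg C) of a reading that trails a linear
    rise by delay_s seconds.

    rate_c_per_min defaults to 1 deg C per 5 min = 0.2 deg C/min,
    the maximal rate of Tc increase cited in the text.
    """
    return delay_s * rate_c_per_min / 60.0


for delay in (18, 39):
    print(delay, "s ->", round(lag_underestimate(delay), 2), "deg C")
```

Under this assumption, delays of 18 to 39 s correspond to a lag of about 0.06°C to 0.13°C at the maximal rate of increase, and proportionally less at slower rates.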

Although the results of our study may be promising, practical considerations must be taken into account. First, the activation of the VitalSense temperature capsules was difficult and one of the capsules (10%) could not be activated at all. Anecdotal evidence from our research groups and our collaborators confirms the infrequent nonactivation problem of VitalSense capsules in other studies, whereas similar problems were occasionally experienced for CorTemp capsules. The sample frequency is also an important distinction between the capsule systems, because the sample frequency can be adjusted for CorTemp and myTemp, whereas it is fixed and relatively low for e-Celsius and VitalSense. Furthermore, 4% of the raw CorTemp data consisted of outliers (>1°C) and another 4.4% of error measurements (0.2°C–1.0°C). The CorTemp system is therefore less consistent, and the use of the raw data with large intervals between measurements might result in inaccurate values. Finally, the present study used capsules from a single production batch from each capsule system, which prevented us from assessing batch differences within capsule systems.

For human use, aspects other than the investigated accuracy, test–retest reliability, and inertia also play a role. Tc is the result of the local thermal balance affected by tissue properties and local blood flow (21). Studies comparing different measurement locations in the digestive system showed that absolute temperatures and inertia differ between locations (18,22). Moreover, the esophageal temperature is ~0.2°C lower during moderate-intensity exercise compared with both the gastrointestinal and rectal temperatures (18). In addition, the response time of the esophageal temperature is faster than that of the gastrointestinal temperature, which in turn is faster than that of the rectal temperature (18). Ideally, the capsule should be located in the gastrointestinal tract and not in the stomach, which can be achieved by swallowing the capsule in a timely manner (4,13).

In conclusion, significant but small differences were observed across telemetric temperature capsule systems. CorTemp demonstrated outliers in 4.0% and error measurements in 4.4% of the recorded data, whereas these were virtually absent in all other systems. Nevertheless, an excellent validity and test–retest reliability was found for all systems after removal of outliers. The best test–retest reliability was found for the myTemp and VitalSense systems, whereas CorTemp and e-Celsius demonstrated a small, but negligible, systematic difference between trial 1 and trial 2. Furthermore, the VitalSense system showed the slowest response to increases in water bath temperature, whereas the other systems had a comparable time delay.

The authors want to thank Jasmijn Faber for her excellent help during the study.

This study was supported by a Sportinnovator grant (ZonMw, 2015).

The work of T. M. H. E. is supported by a European Commission Horizon 2020 grant (Marie-Sklodowska-Curie Fellowship 655502). For the remaining authors, no conflicts of interest were declared. The results of the present study do not constitute endorsement by the American College of Sports Medicine. Furthermore, the results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.


1. American College of Sports Medicine; Armstrong LE, Casa DJ, Millard-Stafford M, Moran DS, Pyne SW, Roberts WO. American College of Sports Medicine Position Stand. Exertional heat illness during training and competition. Med Sci Sports Exerc. 2007;39(3):556–72.
2. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26(4):217–38.
3. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.
4. Bongers CC, Hopman MT, Eijsvogels TM. Using an ingestible telemetric temperature pill to assess gastrointestinal temperature during exercise. J Vis Exp. 2015;104.
5. Bongers CC, Hopman MT, Eijsvogels TM. Cooling interventions for athletes: an overview of effectiveness, physiological mechanisms, and practical considerations. Temperature (Austin). 2017;4(1):60–78.
6. Bongers CC, Thijssen DH, Veltmeijer MT, Hopman MT, Eijsvogels TM. Precooling and percooling (cooling during exercise) both improve performance in the heat: a meta-analytical review. Br J Sports Med. 2015;49(6):377–84.
7. Bongers CCWG, Hopman MTE, Eijsvogels TMH. Validity and reliability of the myTemp ingestible temperature capsule. J Sci Med Sport. 2017. pii: S1440-2440(17)30453-X.
8. Burtscher M, Kofler P, Gatterer H, et al. Effects of lightweight outdoor clothing on the prevention of hypothermia during low-intensity exercise in the cold. Clin J Sport Med. 2012;22(6):505–7.
9. Byrne C, Lim CL. The ingestible telemetric body core temperature sensor: a review of validity and exercise applications. Br J Sports Med. 2007;41(3):126–33.
10. Casa DJ, DeMartini JK, Bergeron MF, et al. National Athletic Trainers’ Association Position Statement: exertional heat illnesses. J Athl Train. 2015;50(9):986–1000.
11. Challis GG, Kolb JC. Agreement between an ingestible telemetric sensor system and a mercury thermometer before and after linear regression correction. Clin J Sport Med. 2010;20(1):53–7.
12. de Vet HC, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine. 3rd ed. New York: Cambridge University Press; 2011.
13. Gant N, Atkinson G, Williams C. The validity and reliability of intestinal temperature during intermittent running. Med Sci Sports Exerc. 2006;38(11):1926–31.
14. Gonzalez-Alonso J, Teller C, Andersen SL, Jensen FB, Hyldig T, Nielsen B. Influence of body temperature on the development of fatigue during prolonged exercise in the heat. J Appl Physiol (1985). 1999;86(3):1032–9.
15. Hargreaves M. Physiological limits to exercise performance in the heat. J Sci Med Sport. 2008;11(1):66–71.
16. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med. 2000;30(1):1–15.
17. Kenefick RW, Cheuvront SN, Sawka MN. Thermoregulatory function during the marathon. Sports Med. 2007;37(4–5):312–5.
18. Mundel T, Carter JM, Wilkinson DM, Jones DA. A comparison of rectal, oesophageal and gastro-intestinal tract temperatures during moderate-intensity cycling in temperate and hot conditions. Clin Physiol Funct Imaging. 2016;36(1):11–6.
19. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.
20. Tatterson AJ, Hahn AG, Martin DT, Febbraio MA. Effects of heat stress on physiological responses and exercise performance in elite cyclists. J Sci Med Sport. 2000;3(2):186–93.
21. Taylor NA, Tipton MJ, Kenny GP. Considerations for the measurement of core, skin and mean body temperatures. J Therm Biol. 2014;46:72–101.
22. Teunissen LP, de Haan A, de Koning JJ, Daanen HA. Telemetry pill versus rectal and esophageal temperature during extreme rates of exercise-induced core temperature change. Physiol Meas. 2012;33(6):915–24.
23. Travers GJ, Nichols DS, Farooq A, Racinais S, Periard JD. Validation of an ingestible temperature data logging and telemetry system during exercise in the heat. Temperature (Austin). 2016;3(2):208–19.



© 2018 American College of Sports Medicine