Over the last decade, there has been a proliferation of commercially available HR monitors and wearable fitness devices. Targeting a larger audience than the elite athletes who use HR monitoring to inform their training and assess aerobic fitness, companies have entered the market of population health, offering a variety of wearable HR and activity monitoring systems to the public. Annual worldwide sales of such devices are projected to reach 100,000,000 units and $50 billion by 2019 (5,13,14).
Although many consumers purchase these wearable fitness trackers to catalog their HR response to exercise, others use them with the hope that they will improve health via weight loss and/or increased aerobic fitness (3,4,10,12,13). However, surveys document substantial attrition in the use of fitness wearables, with up to one-third of individuals discontinuing their use within 6 months of purchase (12).
In clinical practice, physicians and trainers frequently see patients who report physiologic and behavioral data obtained from their wearable devices; such data often include energy expenditure, steps taken, sleep/wake times, and HR. Controlled studies demonstrate variable accuracy of activity trackers, with error margins approaching 25% for some devices (3,7,10,12,17). A previous study suggests a somewhat lower error margin with selected HR monitors (19).
Current questions regarding the accuracy of wearable HR monitors are particularly relevant, given the recent consumer shift from HR monitors that rely on chest straps with electrodes that measure cardiac electrical activity toward more convenient wrist-worn monitors that use optical sensing technology similar to that used for pulse oximetry. Although the accuracy of chest strap monitors has been confirmed in various previous reports (6,18), there is a paucity of data validating the accuracy of wrist-worn, optically based HR monitors (12). A recent study from our group suggests that wrist-worn monitors fail to provide accurate readings during treadmill exercise; however, that study examined only one form of exercise (treadmill walking/running) and included first-generation wrist-worn monitors, one of which is no longer commercially available (19). Broad assessment of the monitors' accuracy is important both for the individuals who rely on these monitors to guide their athletic, physical, and rehabilitative activity and for the physicians to whom these individuals report their HR readings for the purpose of potentially guiding therapy.
The objective of this study was to assess the accuracy of five commonly used, currently commercially available, optically based wearable HR monitors in an appropriately powered study under various forms of aerobic exercise conditions.
METHODS
Participants
This prospective study recruited 50 healthy adults 18 yr or older through fliers and Internet notices at Cleveland Clinic from June 2016 through August 2016. Participants were screened to ensure that they were able to safely perform an 18-min exercise protocol, including a treadmill, an elliptical trainer, and a stationary bicycle. The screening tool was adapted from the National Academy of Sports Medicine's screening questionnaire (11). Study exclusion criteria included known cardiovascular or lung disease, presence of a cardiac pacemaker, treatment with beta-blockers or heart rhythm medications, and self-reported chest pain, dizziness, or loss of balance. The protocol was approved by the Institutional Review Board of the Cleveland Clinic, and all subjects provided written informed consent. The study was registered at clinicaltrials.gov (NCT02818244).
HR Monitors
All participants wore standard ECG leads (Mason-Likar electrode placement of torso-mounted limb leads), a Polar H7 chest strap monitor, and a Scosche Rhythm+ on the forearm. In addition, each participant was randomly assigned by a computer program to wear two different wrist-worn HR monitors, one on each wrist; this enabled the assessment of each type of wrist-worn monitor in 25 subjects. The wrist-worn monitors assessed included Fitbit Blaze (Fitbit), Apple Watch (Apple), Garmin Forerunner 235 (Garmin), and TomTom Spark Cardio (TomTom). Four units of each type of monitor were purchased from retail outlets and studied in random order. Each of these optically based wearable monitors measures HR via an optically obtained plethysmogram that is processed according to proprietary algorithms.
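The assignment program itself is not described in the text. As a minimal sketch of one way to obtain the stated balance (two different wrist-worn monitors per participant, each monitor type worn by 25 of 50 participants), the hypothetical function `assign_wrist_monitors` below uses complementary device pairs; the function name, the pairing scheme, and the seed are assumptions for illustration, not the authors' method.

```python
import random
from collections import Counter

DEVICES = ["Fitbit Blaze", "Apple Watch", "Garmin Forerunner 235", "TomTom Spark Cardio"]


def assign_wrist_monitors(n_subjects=50, seed=None):
    """Return one (left_wrist, right_wrist) device pair per subject.

    Hypothetical scheme: the four devices can be split into two disjoint
    pairs in three ways. Using every pair exactly as often as its
    complement guarantees each device is worn by n_subjects / 2 subjects.
    """
    if n_subjects % 2:
        raise ValueError("n_subjects must be even for a balanced design")
    rng = random.Random(seed)
    a, b, c, d = DEVICES
    partitions = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    half = n_subjects // 2
    # Spread the "pair plus complement" blocks as evenly as possible
    # over the three partitions (9, 8, 8 blocks when n_subjects is 50).
    counts = [half // 3 + (1 if i < half % 3 else 0) for i in range(3)]
    pairs = []
    for (first, second), k in zip(partitions, counts):
        pairs += [first] * k + [second] * k
    rng.shuffle(pairs)  # random order of subjects
    # Randomize which wrist each of the two devices is worn on.
    return [tuple(rng.sample(pair, 2)) for pair in pairs]


assignments = assign_wrist_monitors(50, seed=2016)
wear_counts = Counter(device for pair in assignments for device in pair)
```

With 50 participants, `wear_counts` holds 25 for every device, and no participant receives the same device on both wrists.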
Exercise Protocol
In each subject, HR was assessed using five different monitoring modalities (ECG, Polar H7 chest strap, Scosche Rhythm+, and two different wrist-worn monitors). The Mason–Likar electrode placement allowed the assessment of modified leads I, II, and III on ECG. Aggressive electrode preparation was performed at each site, including cleansing with alcohol and light abrasion, to reduce resistance and optimize signal quality. ECG was monitored on a Quinton Q-tel RMS telemetry system, and hard copy rhythm strips were obtained to measure HR. ECG-based HR was determined by visual assessment under direct supervision by a cardiologist. HR was measured during four different types of exercise at varying intensities: treadmill, stationary bicycle, elliptical trainer with arm levers, and elliptical trainer without arm levers. The order of exercises was assigned randomly.
Exercise protocols for each piece of equipment were as follows:
- Treadmill
  - 2 mph for 1.5 min
  - 3.5 mph for 1.5 min
  - 6 mph for 1.5 min
- Stationary bicycle
  - 25 W for 1.5 min
  - 55 W for 1.5 min
  - 125 W for 1.5 min
- Elliptical (without arm levers)
  - Light for 1.5 min: crossramp = 1, resistance = 1, cadence = 60–70 min⁻¹
  - Moderate for 1.5 min: crossramp = 1, resistance = 5, cadence = 90–100 min⁻¹
  - Vigorous for 1.5 min: crossramp = 10, resistance = 10, cadence = 90–100 min⁻¹
- Elliptical (with arm levers)
  - Light for 1.5 min: crossramp = 1, resistance = 1, cadence = 60–70 min⁻¹
  - Moderate for 1.5 min: crossramp = 1, resistance = 5, cadence = 90–100 min⁻¹
  - Vigorous for 1.5 min: crossramp = 10, resistance = 10, cadence = 90–100 min⁻¹
The treadmill settings of 2, 3.5, and 6 mph correspond to workloads of 2.5, 3.7, and 10.2 METs, respectively. For a 70-kg individual, the bicycle settings of 25, 55, and 125 W correspond to 2.4, 3.7, and 8.8 METs, respectively. Because there are no standard workload settings for elliptical trainers, we identified three settings that were judged to represent light, moderate, and vigorous activity.
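The text does not state how the treadmill MET values were derived. As a hedged illustration, the sketch below applies the ACSM walking and running metabolic equations and reproduces the three treadmill figures. It assumes a level (0%) grade and treats 6 mph as running; both are assumptions, as is the function name `treadmill_mets`. It does not attempt to reproduce the bicycle values.

```python
MPH_TO_M_PER_MIN = 26.8  # 1 mph is approximately 26.8 m/min
RESTING_VO2 = 3.5        # mL/kg/min, the oxygen uptake defined as 1 MET


def treadmill_mets(speed_mph, gait, grade=0.0):
    """Estimate METs from treadmill speed with the ACSM metabolic equations.

    gait  -- "walk" or "run" (chosen by the caller; an assumption here)
    grade -- fractional grade, e.g. 0.05 for 5%; assumed 0 for this protocol
    """
    speed = speed_mph * MPH_TO_M_PER_MIN  # m/min
    if gait == "walk":
        vo2 = 0.1 * speed + 1.8 * speed * grade + RESTING_VO2
    elif gait == "run":
        vo2 = 0.2 * speed + 0.9 * speed * grade + RESTING_VO2
    else:
        raise ValueError("gait must be 'walk' or 'run'")
    return vo2 / RESTING_VO2


stages = [(2.0, "walk"), (3.5, "walk"), (6.0, "run")]
mets = [round(treadmill_mets(speed, gait), 1) for speed, gait in stages]
print(mets)  # [2.5, 3.7, 10.2]
```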
Each subject spent 4.5 min at each of the four exercise stations and then rested for 2 min between different exercise stations; therefore, total exercise time was 18 min, and total time of each trial was 24 min. HR signals for all devices were checked at the beginning of each exercise/rest segment to ensure device function. HR was recorded from HR monitors at the completion of each 1.5 min exercise segment and at the end of each 2 min rest period; preliminary studies in three subjects confirmed that HR had reached a steady state at these time points. At each time point, HR was recorded by two trained research personnel (SMG and ME), one situated on each side of the subject. HR recordings from all devices and ECG were obtained for a period of approximately 5 s. Values were entered into an IRB-approved database.
Statistical Methods
Sample size
Sample size was based on the use of Lin's concordance correlation coefficient (rc) to compare HR measurements with wearable, optically based HR monitors to those obtained with the ECG, which is considered the standard (10). On the basis of previous work, we deemed an rc > 0.8 to represent acceptable accuracy in HR measurement (20). Generation of 25 pairs of data for each device (i.e., device and ECG) was necessary to provide 90% power to determine a difference from rc of 0.82 to rc of 0.93.
Analysis plan
Paired differences
Using the ECG-determined HR as the standard, each HR monitoring system was assessed for accuracy by calculating the difference between its reading and the ECG value. The paired differences were calculated as (HRecg – HRdevice) for each device under the various conditions, both as signed (relative) differences and as absolute values. The absolute percent differences were calculated as (|HRecg – HRdevice|/HRecg × 100).
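The three error measures can be sketched as follows. This is only an illustration in Python (the study's analyses were run in SAS and R); the function name `paired_differences` and the sample readings are invented for the example.

```python
from statistics import mean


def paired_differences(hr_ecg, hr_device):
    """Summarize device error against ECG for paired HR readings (bpm)."""
    signed = [e - d for e, d in zip(hr_ecg, hr_device)]
    absolute = [abs(diff) for diff in signed]
    abs_percent = [abs(e - d) / e * 100 for e, d in zip(hr_ecg, hr_device)]
    return {
        "mean_signed": mean(signed),            # positive = device reads low
        "mean_absolute": mean(absolute),        # size of error, any direction
        "mean_abs_percent": mean(abs_percent),  # error relative to ECG HR
    }


# Invented readings: the device reads 2 bpm low, 5 bpm high, then exact.
summary = paired_differences([100.0, 120.0, 150.0], [98.0, 125.0, 150.0])
```

In this made-up example the signed mean is -1 bpm while the absolute mean is about 2.3 bpm, which shows why both are reported: over- and underestimates cancel in the signed average but not in the absolute one.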
Agreement
The Bland–Altman analysis was performed to assess agreement for each device with ECG (2). In addition, Lin's concordance correlation coefficients (rc) and associated 95% confidence intervals were calculated to provide a measure of agreement for each device with ECG. The concordance correlation coefficient (rc) measures the degree to which the paired observations fall on the identity line (9).
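For readers unfamiliar with these two agreement measures, a minimal sketch follows. It assumes the conventional 95% limits of agreement (mean difference ± 1.96 SD) and Lin's original formula with 1/n variances; the study's own calculations used R (reference 15) and may differ in detail, for example in how repeated measurements per subject were handled. Function names and sample values are invented.

```python
from statistics import mean, pvariance, stdev


def bland_altman(reference, device):
    """Return (bias, lower limit, upper limit) of agreement in bpm."""
    diffs = [r - d for r, d in zip(reference, device)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired readings."""
    mx, my = mean(x), mean(y)
    covariance = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # Unlike Pearson's r, the denominator penalizes any shift in means,
    # so points must lie on the identity line to score 1.
    return 2 * covariance / (pvariance(x, mx) + pvariance(y, my) + (mx - my) ** 2)


ecg = [60.0, 90.0, 120.0, 150.0]
device = [hr + 10.0 for hr in ecg]  # invented device reading 10 bpm high
bias, lower, upper = bland_altman(ecg, device)
rc = lin_ccc(ecg, device)
```

Here the device tracks ECG perfectly apart from a constant 10-bpm offset, so Pearson's correlation would be 1, yet `rc` is about 0.96 and the Bland–Altman bias is -10 bpm. Both measures register the systematic error that correlation alone would miss.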
Multivariable testing
Repeated-measures mixed model ANOVA was used to test the overall effect of the fitness devices while adjusting for covariates and taking into account multiple measurements for each subject. In addition to HR device and exercise condition (activity type and intensity), factors in the final adjusted model included age, gender, body mass index, wrist size, and days of typical aerobic exercise per week.
Data were analyzed using SAS version 9.4 (SAS Institute Inc., Cary, NC) and R software version 3.2.3 (15).
Presentation
Continuous variables are reported as mean ± SD, with median and percentile values. Categorical variables are reported as percent and frequency.
RESULTS
Subjects
The study randomized 50 subjects (mean ± SD age = 38 ± 12 yr, 27 [54%] females, 6 [12%] non-Whites) (Table 1, Fig. 1). Subjects were examined for the presence of tattoos on the wrist; none had tattoos in this location. All subjects engaged in regular aerobic exercise (including walking), and 82% reported that they exercised regularly to the point of perspiration. Subjects' mean ± SD resting HR on ECG was 86 ± 18 bpm.
FIGURE 1: Flow of participants through the study.
Aggregate results
Of the 4000 possible HR measurements, 3985 were recorded (99.6%). Across all ECG tracings, there was minimal artifact, and in no situation did ECG artifact interfere with visual HR determination. Missing data were attributable to failure of the device to record HR (eight for Apple Watch, four for Fitbit, two for Scosche Rhythm+, and one for Garmin Forerunner 235).
Measured HR ranged from 51 to 184 bpm. Average differences from the ECG standard were less than 1 bpm for the Polar H7 under all exercise conditions but extended to nearly 20 bpm for other monitors (Table 2). The average differences from the ECG standard were calculated as both relative error (which averages positive and negative differences from the ECG standard) and the absolute value of error, regardless of direction. HR values on the wrist-worn monitors varied from the ECG standard by approximately 2% to nearly 20%, depending on the monitor and the activity (Table 2).
Bland–Altman analysis revealed that all monitors had some measurements that did not reflect HR accurately (Fig. 2); however, this variation was not linked to specific HR values, meaning that variability was not influenced by the HR magnitude. The Apple Watch had 95% of differences fall within −17 and 20 bpm of the ECG, whereas TomTom Spark Cardio and Garmin Forerunner 235 had 95% of values fall within −24 and 31 bpm and −27 and 33 bpm, respectively. The corresponding values for Scosche Rhythm+ and Fitbit Blaze were −31 and 38 bpm and −30 and 45 bpm, respectively.
FIGURE 2: Bland–Altman plots and 95% limits of agreement with electrocardiographically measured HR. A, Polar H7; B, Apple Watch; C, TomTom Spark; D, Garmin Forerunner 235; E, Scosche Rhythm+; F, Fitbit Blaze.
Under all conditions combined, the Polar H7 chest strap had the highest agreement with ECG (rc = 0.99). Among the optically based monitors, the Apple Watch performed best with rc = 0.92, followed by the TomTom Spark Cardio (rc = 0.83) and Garmin Forerunner 235 (rc = 0.81). The Scosche Rhythm+ and Fitbit Blaze had rc = 0.75 and rc = 0.67, respectively (Fig. 3).
FIGURE 3: Concordance correlation coefficients depicting agreement of device-measured HR with ECG. A, Polar H7; B, Apple Watch; C, TomTom Spark; D, Garmin Forerunner 235; E, Scosche Rhythm+; F, Fitbit Blaze.
TABLE 1: Baseline characteristics of participants.
TABLE 2: HR monitor differences from ECG according to activity.
The results of the mixed model confirmed that among the optically based HR monitors, the Apple Watch was the most accurate, with no statistical difference from ECG (P = 0.22), even after adjustment for all other factors. The other optically based HR monitors often underestimated the true HR (P < 0.0001). Subject factors (age, gender, body mass index, wrist circumference, and days of typical aerobic exercise per week) were not associated with HR monitor accuracy.
Agreement with ECG during various types of exercise
The Polar H7 chest strap performed well during all aerobic exercise modalities (rc = 0.99), but the other HR monitors' agreement with ECG varied with the type of exercise (Table 2). At rest, all monitors had rc > 0.88. With the treadmill, all devices provided acceptable agreement (rc = 0.88–0.93) except the Fitbit Blaze (rc = 0.76). While biking, Garmin Forerunner 235, Apple Watch, and Scosche Rhythm+ had the highest agreement with ECG (rc > 0.80). On the elliptical trainer without using the arm levers, only the Apple Watch provided readings that agreed with the ECG (rc = 0.94). None of the optically based HR monitors provided good agreement with ECG during elliptical trainer use with the arm levers engaged (rc < 0.80).
Although HR monitor agreement with ECG varied with the type of exercise, it did not vary with exercise intensity from light to moderate on each piece of equipment. During vigorous exercise, however, only the Apple Watch maintained agreement with ECG similar to that obtained during less intense exercise; all other monitors showed lower agreement during vigorous activity (P < 0.003).
DISCUSSION
The results of this study demonstrate that optically based wearable HR monitors are less accurate than electrode-containing chest strap monitors. In addition, the accuracy of these monitors varies with the type of aerobic activity. These findings raise questions concerning the role of such monitors in individuals' management of their health, assessment of their fitness, and guidance of their fitness regimens.
Introduced in the 1980s, chest strap–based HR monitors function much like an ECG, sensing cardiac electrical activity. Several studies confirm the accuracy of most of these HR monitors under conditions of both rest and moderate exercise (6,8,18). Although chest strap–based HR monitors have been favored by elite athletes because of their proven accuracy, they are relatively inconvenient and have not been widely adopted by the public. By contrast, the recent introduction of convenient, wrist-worn HR monitors that include the capability for wireless transmission has stirred widespread public interest in HR monitoring. However, as reported by major media outlets, individuals' experiences with the newer class of HR monitors suggest that their accuracy may be poor, particularly during exercise (16). This controversy has reached the courtroom in the form of a class action lawsuit alleging that the Fitbit device is inaccurate and potentially harmful (14).
The new wrist-worn HR monitors do not measure cardiac electrical activity; rather, they rely on photoplethysmography. The monitor illuminates the skin with an LED and then measures the amount of light reflected back to a photodiode sensor; this enables detection of variations in blood volume associated with the pulse of blood caused by each cardiac contraction. Potential sources of error with optically based monitors include motion artifact from physical movement, misalignment between the skin and the optical sensor, variations in skin color/tone, ambient light, and poor tissue perfusion (1). The accuracy of such monitors during exercise is controversial, with some studies suggesting that wrist-worn HR monitors perform best at rest or during slow walking and others asserting good accuracy even during vigorous exercise (1,5). In a recent study examining subjects on a treadmill, we found variable accuracy between different optically based HR monitors; however, when compared with an ECG, the tested monitors all had a concordance correlation coefficient exceeding 0.80 (19).
Extending that work, the current study assessed the performance of wearable HR monitors using varying aerobic exercise modalities (treadmill, stationary bicycle, and elliptical trainer with and without arms) and at different levels of intensity. Recognizing that people engage in a variety of types of exercise beyond walking on a treadmill, the primary purpose of the current study was to assess the monitors' agreement with ECG during different forms of aerobic activity. Distinct from the previous study, the current study enrolled a new cadre of subjects and assessed several monitors that had not previously been tested. Although all monitors performed well in subjects at rest, their accuracy varied with different exercise modalities. Certain monitors were better suited for the stationary bicycle and the elliptical trainer (without arm motion), and this may be a result of variable tolerances for motion artifact associated with different exercises. In particular, none of the optical monitors performed well when assessing HR in subjects using the elliptical trainer with arm motion, likely a result of motion artifact related to arm movement (1). By contrast, a chest strap containing an electrically based monitor provided accurate measurements, regardless of exercise intensity or modality.
Although this study is the largest of its kind and included nearly 4000 HR measurements, it has limitations. The current study methodology (e.g., visual recording of HR on ECG) may have contributed to some error as compared with a more rigorous approach wherein time-stamped raw device data were extracted. The results apply only to the HR monitors tested. These monitors were chosen because of their apparent popularity with the public, and each monitor was the manufacturer's most recent offering at the time of the study; however, they represent an opportunistic sample of the wide range of available HR monitors. Continuous HR assessment, which is currently not feasible with all devices, would enable more detailed comparisons. The devices were assessed in young, healthy volunteers exercising in a laboratory setting. Results may vary for different subsets of individuals, including cardiac patients. Although we accounted for participant factors including age and BMI, the relatively narrow distribution of age and BMI in this study of young, healthy volunteers does not enable us to rule out a potential effect of these factors on the accuracy of HR measurement. In addition, these results may not be representative of those obtained during more vigorous exercise or during different activities (e.g., running on pavement, swimming, or other sports participation).
CONCLUSION
This study demonstrates that optically based wrist-worn HR monitors vary in their accuracy and that their accuracy is activity dependent. Individuals who use such monitors should be aware of the possibility of inaccurate measurements and that some monitors (i.e., the Apple Watch) provide greater agreement with ECG than do other monitors. Apparently spurious HR measurements should be confirmed by simple palpation to measure HR or, if readily available, by ECG. When accurate HR monitoring is essential, an electrically based chest strap monitor should be used.
This study was supported by the Mary Elizabeth Holdsworth Fund at the Cleveland Clinic.
The Mary Elizabeth Holdsworth Fund had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
There are no relevant conflicts of interest to disclose.
The results of the present study do not constitute endorsement by the American College of Sports Medicine. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.
REFERENCES
1. Alzahrani A, Hu S, Azorin-Peris V, et al. A multi-channel opto-electronic sensor to accurately monitor heart rate against motion artefact during exercise. Sensors (Basel). 2015;15(10):25681–702.
2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.
3. Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of smartphone applications and wearable devices for tracking physical activity data. JAMA. 2015;313(6):625–6.
4. Diaz KM, Krupka DJ, Chang MJ, et al. Fitbit®: an accurate and reliable device for wireless physical activity tracking. Int J Cardiol. 2015;185:138–40.
5. El-Amrawy F, Nounou MI. Are currently available wearable devices for activity tracking and heart rate monitoring accurate, precise, and medically beneficial? Healthc Inform Res. 2015;21(4):315–20.
6. Laukkanen RM, Virtanen PK. Heart rate monitors: state of the art. J Sports Sci. 1998;16(Suppl):S3–7.
7. Lee JM, Kim Y, Welk GJ. Validity of consumer-based physical activity monitors. Med Sci Sports Exerc. 2014;46(9):1840–8.
8. Léger L, Thivierge M. Heart rate monitors: validity, stability, and functionality. Phys Sportsmed. 1988;16(5):143–51.
9. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45(1):255–68.
10. Murakami H, Kawakami R, Nakae S, et al. Accuracy of wearable devices for estimating total energy expenditure: comparison with metabolic chamber and doubly labeled water method. JAMA Intern Med;176(5):702–3.
11. National Academy of Sports Medicine Data Collection Sheet [cited 2016 April 5]. Available from: http://www.nasm.org/docs/pdf/nasm:par-q-(pdf-21k).pdf.
12. Patel MS, Asch DA, Volpp KG. Wearable devices as facilitators, not drivers, of health behavior change. JAMA. 2015;313(5):459–60.
13. Piwek L, Ellis DA, Andrews S, Joinson A. The rise of consumer health wearables: promises and barriers. PLoS Med;13(2):e1001953.
14. Profils S. Do wristband heart trackers actually work? A checkup [cited 2016 April 4]. Available from: http://www.cnet.com/news/how-accurate-are-wristband-heart-rate-monitors/.
15. R package epiR for calculating concordance correlation coefficients [cited 2016 April 4]. Available from: http://cran.r-project.org/web/packages/epiR.
16. Stern J. Fitness bands with heart-rate tracking are missing a beat. Wall Street Journal. December 16, 2014.
17. Swan M. Emerging patient-driven health care models: an examination of health social networks, consumer personalized medicine and quantified self-tracking. Int J Environ Res Public Health. 2009;6:492–525.
18. Terbizan DJ, Dolezal BA, Albano C. Validity of seven commercially available heart rate monitors. Measurement in Physical Education and Exercise Science. 2002;6(4):243–7.
19. Wang R, Blackburn G, Desai M, et al. Accuracy of wrist-worn heart rate monitors. JAMA Cardio. 2016;2(1):104–6.