Long work hours, altered schedules, and on-call periods are common practice for most physicians in residency and throughout their careers. These factors result in sleep loss, disruption of circadian rhythm, and fatigue for medical personnel who provide around-the-clock health care. There is an ongoing debate in the medical literature, in medical schools, in residency training programs, and in clinical practice about the effects of sleepiness on a physician's performance. Thus far, research findings and published literature reviews of fatigue and sleepiness in medical personnel have not established a consensus on the effects or levels of fatigue or sleepiness,1,2,3,4,5 but these studies conflict with findings from other operational settings. For example, accident investigations in other demanding, complex, high-risk environments have identified fatigue as a probable cause of or a contributing factor to many well-known accidents.6,7 Findings related to increased errors, incidents, and accidents have consistently shown that sleepiness can reduce operational safety, as shown in extensive literature establishing the risk of injury or death related to drowsy driving.8,9 This literature includes data from physicians who put themselves and others at increased risk when they drive after working extended hours.10,11
One factor that may contribute to this risk is the discrepancy between subjective perceptions of sleepiness and an individual's physiologic state. Individuals may report adequate alertness and subsequently engage in an activity (e.g., working, driving) when they are physiologically at levels of sleepiness associated with clinical sleep disorders.12
With these issues in mind, we defined four hypotheses at the beginning of our study:
- Physiologic sleepiness of physicians is increased due to acute and chronic sleep loss resulting from their work schedules.
- A 24-hour period of being in-hospital and on-call would further increase physiologic sleepiness.
- Additional sleep would reverse these physiologic effects and increase alertness.
- Individuals' subjective abilities to determine their levels of physiologic sleepiness are poor.
We tested these hypotheses by quantifying the levels of physiologic and subjective sleepiness in a group of healthy young residents in different conditions of prior work and sleep.
In 1996, we recruited 16 healthy resident anesthesiologists from the Stanford University School of Medicine. Eleven of the 16 (age 30.3 ± 2.3 years, age range 27-35, four women, seven men) completed all three experimental conditions. We chose resident anesthesiologists for this study because they do not work the day following an on-call night, providing an opportunity for laboratory evaluation; and anesthesiology is the clinical discipline of the principal investigator, which provided an opportunity to cover clinical responsibilities while the residents participated in the sleep-extension condition.
After giving informed consent to the protocol approved by the combined Stanford University, VA Palo Alto Health Care System Institutional Review Board, the residents enrolled in our study. We used the sleep laboratory to evaluate physiologic sleepiness in three conditions for each subject using a within-subjects, repeated-measures design. In the baseline condition (BL), the residents were on a general operating room rotation involving approximately five on-call periods per month and with no on-call period in the preceding 48 hours. In the post-call condition (PC), we studied the residents starting the morning after an in-hospital, 24-hour duty shift on a clinically demanding rotation (e.g., obstetric anesthesia). During this on-call period, the residents were allowed to sleep if they had the opportunity. In the sleep-extended condition (EXT), we instructed the residents to maximize their nocturnal sleep for four consecutive days and to report to work at 10:00 AM each morning. Although we attempted to balance the order of these conditions, the order was primarily determined by the residents' clinical schedules. We paid the residents in our study for their participation ($1,400 for completing all three conditions).
Upon enrollment in the study, the residents completed the Sleep Disorders Questionnaire (SDQ) to identify their risks for significant clinical sleep disorders.13 They completed a sleep-and-activity log, which included sleep onset and offset, awakenings, sleep quality, and time spent at work, and they also wore a wrist activity monitor (an “actigraph,” AMA-32™) on the nondominant wrist for the week prior to each study day.14 We downloaded the actigraph data to a computer and edited them using the daily sleep log to account for times when the actigraph was removed (e.g., while the resident was taking a shower, which would otherwise have been scored as “sleep”). We used Action3™ software to score actigraph sleep and wake periods.
All the residents in our study were precluded from using caffeine after 10:00 P.M. the night before each study day, were nonsmokers, and were not taking any chronic medications. In the BL and EXT conditions only, they underwent overnight polysomnography (PSG) in a sleep laboratory that included standard measurements of four channels of the electroencephalogram (EEG) (C3, C4, O1, O2), left and right electrooculogram (EOG), and submental electromyogram (EMG).15 They maintained their normal bedtimes in the sleep laboratory and were allowed to awaken spontaneously if sleep time was less than eight hours. In the BL condition, an investigator awakened the resident, if necessary, at eight hours. In the EXT condition, residents were not awakened at eight hours but were allowed to extend their sleep, consistent with the goal of this condition.
Each study day took place between approximately 9:30 AM and 7:00 PM, with a 30-minute lunch break at 1:00 PM. During the study day, the residents were not allowed to accrue sleep and were constantly observed by an investigator.
Multiple Sleep Latency Test (MSLT)
The MSLT is the criterion standard sleep laboratory test of daytime sleepiness. It consists of five opportunities (sleep-latency test sessions) to fall asleep, at two-hour intervals beginning at 10:00 AM.15 During each sleep-latency test session, the resident was instructed to lie quietly with eyes closed and to attempt to fall asleep; the lights were then turned out. Sleep latency is determined from the time of lights out until the EEG shows evidence of sleep (standard criteria are 90 seconds of stage 1 sleep or 30 seconds of any other sleep stage). If sleep occurred, the resident was awakened so that he or she would not accrue any sleep. If no sleep had occurred within 20 minutes, the test was terminated and a latency of 20 minutes was assigned. The “MSLT score” is the average of all sleep-latency values over the course of the day. Three residents in the PC condition were still on clinical duty for the 10:00 AM test session and were studied from 12:00 noon until 6:00 PM, for a total of four sessions.
Long sleep-latency values are associated with increased alertness, while short latency values indicate increased sleepiness. Normal MSLT values are longer than ten minutes, while pathologic levels of daytime sleepiness are defined as less than five minutes, a level of daytime sleepiness seen in patients with obstructive sleep apnea, those with narcolepsy, and healthy individuals who have been awake for 24 consecutive hours.12,15
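The scoring rule and cut-offs described above can be sketched in a few lines of Python. This is only an illustration, not the software used in the study: the function names, the use of `None` to mark a session with no sleep, the "intermediate" label for scores between five and ten minutes, and the example latencies are all assumptions introduced here.

```python
from statistics import mean

MAX_LATENCY_MIN = 20.0  # a session with no sleep within 20 minutes is scored as 20


def mslt_score(latencies_min):
    """Average sleep latency (minutes) across the day's test sessions.

    Each entry is the latency for one session; None means no sleep occurred
    before the session was terminated (assumed convention for this sketch).
    """
    capped = [
        MAX_LATENCY_MIN if latency is None else min(latency, MAX_LATENCY_MIN)
        for latency in latencies_min
    ]
    return mean(capped)


def classify(score_min):
    """Map an MSLT score to the ranges named in the text."""
    if score_min < 5:
        return "pathologic"
    if score_min > 10:
        return "normal"
    # The text names only the two extremes; this label is an assumption.
    return "intermediate"


# Hypothetical five-session day; the third session ended without sleep.
example_day = [4.0, 6.5, None, 3.0, 8.5]
score = mslt_score(example_day)
print(f"{score:.1f} min -> {classify(score)}")
```

Because the score is a plain average, the same function handles a four-session day, such as those of the three post-call residents who missed the 10:00 AM session.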
Subjective Sleepiness and Perception of Sleep Onset
Prior to each MSLT, the residents provided a rating on the Stanford Sleepiness Scale (SSS)16 to determine their subjective sleepiness. The SSS uses a seven-point descriptive scale (1 = “feeling active and vital; wide awake” to 7 = “almost in reverie; sleep onset soon; lost struggle to remain awake”).
To examine the perception of sleep onset, we asked the residents whether they had stayed awake or had fallen asleep during each particular MSLT and gave them no time cues in the sleep laboratory to aid their responses.
Where appropriate, we performed nested repeated-measures analyses of variance (RM-ANOVA) using the within factors of sleep condition and time of day. In the absence of any missing cells, RM-ANOVA was performed using a multivariate approach and significance levels were corrected for correlation of the observations by the Greenhouse-Geisser estimate of epsilon.17 Three of the 11 residents in the PC condition arrived after 10:00 A.M. and missed the 10:00 A.M. MSLT and SSS administration. For these data (MSLT, SSS), we performed a univariate RM-ANOVA instead of the multivariate RM-ANOVA, because in the presence of any missing data the multivariate, but not the univariate, RM-ANOVA procedure eliminates an individual's entire set of observations.17
We made post-hoc comparisons between groups using contrasts17 when the overall ANOVA was significant. Bonferroni's correction was used to partition the total type I error for the three contrasts at each time point (corrected p threshold = .016). We performed Fisher's exact test on the 2 × 2 table of the residents' perception of sleep onset versus sleep onset as defined by EEG.
Data values in the following section of this report are presented as mean ± standard deviation. Statistical significance was considered to be p < .05. We used Statview 4.1™ and Super-ANOVA software to perform the statistical analysis.
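The Bonferroni partition described above is simple arithmetic, sketched below in Python. The function names are hypothetical, the reading of the three contrasts as the pairwise comparisons among BL, PC, and EXT is an assumption, and the p-values in the usage lines are illustrative only. Note that .05 divided by 3 is approximately .0167; the threshold of .016 quoted above appears to be that value truncated.

```python
ALPHA = 0.05      # overall type I error rate stated in the text
N_CONTRASTS = 3   # assumed: BL vs. PC, BL vs. EXT, PC vs. EXT at each time point


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error at alpha."""
    return alpha / n_comparisons


def is_significant(p_value, alpha=ALPHA, n_comparisons=N_CONTRASTS):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


threshold = bonferroni_threshold(ALPHA, N_CONTRASTS)
print(f"corrected threshold = {threshold:.4f}")

# Illustrative p-values: the middle one passes .05 but not the corrected level.
for p in (0.006, 0.03, 0.25):
    print(p, is_significant(p))
```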
Due to scheduling conflicts, only 11 of the 16 residents we recruited completed all three sleep conditions (of the five remaining residents, two completed one condition and three completed two conditions). To allow the full repeated-measures treatment, we performed the analyses in this study on data from these 11 residents only. The results, however, were not appreciably different when data from the other five residents were included (using nonrepeated-measures ANOVA).
The average time needed to complete all three conditions was 145 ± 64 days (range = 62–255 days). Although our intent was to balance the order of the conditions, we were not able to achieve this balance. For the 11 residents who completed all three conditions, the first condition was balanced (4 BL, 4 PC, 3 EXT), but the second (3 BL, 6 PC, 2 EXT) and third conditions (4 BL, 1 PC, 6 EXT) were not.
Sleep Disorders Questionnaire and Overnight PSG
No resident showed evidence of a clinical sleep disorder as indicated by the SDQ. Overnight PSG, sleep log, and actigraphic data for all conditions are summarized in Table 1. We found no significant difference between the BL and EXT conditions in sleep pattern (PSG sleep stage) or efficiency (total sleep time/time in bed). The total sleep time (TST) during the PSG for the BL condition (7.7 ± 0.6 hr) was similar to what the residents reported sleeping at home (sleep-log data = 7.8 ± 1.1 hr) and to the TST recorded by the actigraph (actigraph data = 7.4 ± 1.5 hr), indicating that the residents' sleep quantity in the laboratory was comparable to their sleep quantity at home. As we hypothesized, the PSG total sleep time was greater in the EXT condition (8.4 ± 0.8 hr, p = .05) than in the BL condition (7.7 ± 0.6 hr).
Actigraph vs. Self-reported Sleep-log Data
We found no significant difference between sleep-log estimates and actigraph measurements of prior-night TST overall or interacting with conditions (see Table 1, F = .44, p = .52 for main effect, F = 2.1, p = .16 for interaction). Sleep logs were good predictors of previous four-night TST as measured by the actigraph (R² = .987, p = .0001). We used the sleep-log data for the TST in the analyses presented here.
Rapid-eye-movement sleep did not occur during any of the MSLT sleep periods. Sleep latency versus time of day and condition is shown in Figure 1. We found significant main effects for both condition (F = 28.7, p = .0001) and time of day (F = 5.59, p = .0003). Sleep latencies were significantly longer in the EXT condition compared with both the BL and the PC conditions, except at 2:00 P.M., when EXT sleep latency was significantly longer than that in the PC condition (p = .006), but not the BL condition (p = .25). We found no statistically significant difference in MSLT scores between the BL and PC conditions at any time point.
The aggregate MSLT scores were: BL 6.7 ± 5.3 min; PC 4.9 ± 4.7 min; EXT 12.0 ± 6.4 min. Post-hoc contrasts between conditions revealed significant differences between the EXT and BL conditions (p = .0002) and EXT and PC conditions (p = .0002), but no difference between the BL and PC conditions (p = .14).
MSLT vs. TST
A regression plot comparing the average of the residents' previous four nights' TST with the MSLT score is shown in Figure 2. Prior sleep was a significant predictor of MSLT score, although this factor explained only 30% of the MSLT score variance (R = .54, R² = .30, p = .0001). Figure 2 also shows the sleep latencies of residents who had short sleep latencies in both the BL and the EXT conditions. In fact, one resident had short sleep latencies in all conditions (BL = 2.7 min, PC = 4 min, EXT = 3.9 min), with average four-nights' TST of BL = 6.6 hr, PC = 7 hr, EXT = 9.6 hr.
SSS and Perception of Sleep Onset
The SSS data in Figure 3 show a significant effect of condition (F = 41.8, p = .0001) but no overall effect of time of day (F = 1.1, p = .36). SSS values were significantly greater in the PC condition than in the other two conditions.
Our further evaluation of subjective data, in Figure 4, revealed a modest correlation between the SSS scores and the MSLT (F = 48.2, p = .0001), which accounted for 23% of the variance (R = 0.48, R² = .23). SSS was a significant predictor of MSLT score in the BL (F = 8.9, p = .005) and the PC conditions (F = 24.3, p = .0001) but not in the EXT condition (F = 2.517, p = .12).
The residents demonstrated poor ability to discriminate the onset of sleep, with subjective reports discrepant from EEG determinants. Overall, the residents did not report sleep in 49% of the episodes identified as physiologic sleep according to standard EEG measures (68 of 140 episodes). When they reported they had stayed awake, they were wrong 76% of the time (68 of 90 episodes).
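The two percentages above follow directly from the reported counts, as the short Python check below shows. The variable names are introduced here for illustration. Two further cells of the 2 × 2 table can be derived by subtraction, but the count of episodes in which residents reported sleep without EEG-defined sleep is not given in the text, so it is left undefined rather than guessed.

```python
# Counts reported in the text
eeg_sleep_episodes = 140          # episodes scored as sleep by EEG criteria
reported_awake_episodes = 90      # episodes in which the resident reported staying awake
reported_awake_but_slept = 68     # reported awake, yet EEG showed sleep

# Share of EEG-defined sleep episodes that went unreported
missed_sleep_rate = reported_awake_but_slept / eeg_sleep_episodes

# Share of "stayed awake" reports that were wrong
wrong_awake_rate = reported_awake_but_slept / reported_awake_episodes

# Cells derivable by subtraction
reported_asleep_and_slept = eeg_sleep_episodes - reported_awake_but_slept
reported_awake_and_awake = reported_awake_episodes - reported_awake_but_slept

print(f"sleep not reported: {missed_sleep_rate:.0%}")
print(f"'awake' reports that were wrong: {wrong_awake_rate:.0%}")
print(reported_asleep_and_slept, reported_awake_and_awake)
```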
Our study demonstrated that anesthesiology residents working their normal schedules had a significant degree of daytime sleepiness approaching the level seen in patients with narcolepsy or sleep apnea (MSLT scores of less than five minutes). In the post-call condition, the residents' daytime sleepiness exceeded this level.12 Our study also demonstrated that sleep—the obvious intervention—was an effective countermeasure, since it returned the residents' MSLT scores to the normal range.
The residents' baseline level of sleepiness did not differ significantly from that seen in the post-call condition. These findings suggest that the residents' normal schedules resulted in chronic sleep loss and measurable sleepiness during the day. The fact that residents are sleepy during their “routine” work periods indicates that they are not truly rested. This finding has critical implications for the interpretation of all other studies that compare measures obtained during non-call “rested” work periods with measures during on-call or post-call conditions.1,2,3,4,5
Although in the post-call condition the residents were on rotations reputed to have “difficult” call nights, they managed to accrue nearly as much sleep while on call as they did at home, which was explained by several factors described in the residents' sleep logs and debriefings: (1) they slept whenever possible during the on-call period, including times of day that they would not usually sleep; (2) some on-call rotations (e.g., ICU and OB anesthesia) required frequent awakenings, but each awake period was relatively short; (3) on-call nights varied markedly in their difficulty; and (4) post-call residents without clinical duty the following day did not need to awaken early to prepare for work. Thus, it is likely that clinicians whose on-call periods are busy continuously and allow little or no sleep will have levels of daytime sleepiness even greater than those documented in our study.
Although the standard deviations for the MSLT data in our experiment (4.7–6.4 min) were similar to those found in other experiments,18 the individual differences were striking. One resident had low MSLT scores in all conditions. Harrison describes such unusual individuals as having “high sleepability and no sleepiness,” in that they can fall asleep easily during the daytime even though there is no physiologic need.19 This resident had no sleep complaints and tested normally on the SDQ.
Consistent with previous studies, the residents in our study were generally inaccurate at determining their level of sleepiness. Subjective reports of sleepiness (SSS score) were poor predictors of the levels of physiologic sleepiness (MSLT scores). Furthermore, the residents did not perceive reliably when they had, in fact, fallen asleep according to EEG criteria.20 The inability of individuals to subjectively determine the true level of sleepiness could be important operationally in many complex work environments, including hospitals, not to mention the potential risks to providers themselves and to others if the providers drive home after call.
Additionally, interns and residents are supposed to be learning while they work. Previous studies have demonstrated the negative impact of fatigue on mood as well as on decision making, memory, and speech.1,2,3,4,5 While more research is needed on the effect of sleep deprivation on learning, it is unlikely that physiologic sleepiness makes learning easier. Nor is fatigue likely to result in the most sensitive, compassionate, and effective interactions between doctors and their patients.
One limitation of our study is its small sample size, although we did use a within-subjects design. Our small sample size was the result of difficulty recruiting already busy professionals. Another important caveat is that the residents in our study were young and healthy. Similar work schedules of older clinicians or of those with chronic diseases may have greater effects. Future studies should address this issue by including clinicians of different ages and health status.
Sleepiness is a powerful physiologic drive that can be overwhelming, regardless of the setting or the individual's motivation and skill. The findings from our study raise concerns about the physiologic state of residents, even during normal work schedules. We have two provocative questions: (1) How often do physicians perceive themselves as alert and proceed to work when, in fact, they are extremely sleepy physiologically? and (2) How often do they have spontaneous uncontrolled sleep episodes during clinical work without perceiving it, as has been documented in other settings?21,22
We do not know the answers to these questions. We do know that throughout the industrialized world, in virtually all transportation industries where falling asleep on the job can translate into death or injuries, work hours are regulated to mitigate the effects of sleep loss, fatigue, and disruption of circadian rhythm.7 In these industries, accidents caused by fatigue are not considered isolated occurrences, but are, rather, the expected result of human physiologic limitations. For example, the National Transportation Safety Board (NTSB) examines the sleep and wake history of any transportation crew member involved in an accident, and fatigue has been identified formally as a contributing factor and probable cause of transportation accidents.23 In fact, reducing crew fatigue has been on the NTSB's list of “most wanted transportation safety improvements” for nearly a decade.24
In contrast, within health care in the United States, there has been little formal control of work hours, schedules, and rest requirements. Lacking such regulation, the primary oversight of these issues, at least for residents, rests with the Accreditation Council for Graduate Medical Education (ACGME), which sets standards for residency training.25 Until recently, ACGME's standards on work and duty hours have varied between medical specialties, and for most specialties, the standards have been relatively weak. In June 2002, ACGME adopted a new set of common program requirements for work, duty, and rest applicable to all specialties, effective July 2003.26 Although the new standards are more stringent than their predecessors for nearly all specialties, they are much more lenient than those in other hazardous industries with similar safety concerns.7 Other countries have adopted stringent limits on work hours and fatigue for medical trainees.27 Ironically, the professional sleep societies, sleep disorders clinicians, and sleep researchers have been emphatic about addressing operational risks related to sleepiness in many parts of our society (e.g., drowsy driving), but not within health care itself.
Generally, the de facto burden of proof has been placed on clinicians and investigators to prove that working long duty shifts (e.g., 24–36 hours), irregular on-call schedules, or more than 80–100 hours per week is not safe. There are case examples in which sleep loss and physician fatigue have been identified as an important contributing factor to medical error (e.g., the “Libby Zion case”28 and the “Verbrugge case”22). However, these cases have been treated as “isolated individual lapses,”29 rather than as signs of a widespread and serious risk. Our study showed that even moderate clinical work schedules caused physiologic changes in physicians equivalent to conditions in patients with serious clinical pathology. Given the significant potential risk to the safety of patients and providers, meaningful reform is justified. The new ACGME requirements are a positive step but they may not be sufficient to fully mitigate the risk.
1. Asken MJ, Raham DC. Resident performance and sleep deprivation: a review. J Med Educ. 1983;58:382–8.
2. Samkoff JS, Jacques CH. A review of studies concerning effects of sleep deprivation and fatigue on residents' performance. Acad Med. 1991;66:687–93.
3. Leung L, Becker CE. Sleep deprivation and house staff performance. Update 1984–1991. J Occup Med. 1992;34:1153–60.
4. Owens JA. Sleep loss and fatigue in medical training. Curr Opin Pulm Med. 2001;7:411–8.
5. Weinger MB, Ancoli-Israel S. Sleep deprivation and clinical performance. JAMA. 2002;287:955–7.
6. Mitler MM, Carskadon MA, Czeisler CA, Dement WC, Dinges DF, Graeber RC. Catastrophes, sleep, and public policy: consensus report. Sleep. 1988;11:100–9.
7. Mitler MM, Dement WC, Dinges DF. Sleep medicine, public policy, and public health. In: Kryger MH, Roth T, Dement WC (eds). Principles and Practice of Sleep Medicine. 3rd ed. Philadelphia, PA: W. B. Saunders, 2000:580–8.
8. Horne J, Reyner L. Sleep related vehicle accidents. BMJ. 1995;310:565–7.
9. NCSDR/NHTSA Expert Panel on Driver Fatigue and Sleepiness: Drowsy Driving and Automobile Crashes. Washington, DC: National Center for Sleep Disorders Research and National Highway Traffic Safety Administration, 1997.
10. Marcus CL, Loughlin GM. Effect of sleep deprivation on driving safety in house staff. Sleep. 1996;19:763–6.
11. Steele MT, Ma OJ, Watson WA, Thomas HA Jr, Muelleman RL. The occupational risk of motor vehicle collisions for emergency medicine residents. Acad Emerg Med. 1999;6:1050–3.
12. Roehrs TA, Carskadon MA, Dement WC, Roth T. Daytime sleepiness and alertness. In: Kryger MH, Roth T, Dement WC (eds). Principles and Practice of Sleep Medicine. 3rd ed. Philadelphia, PA: W. B. Saunders, 2000:43–52.
13. Douglass AB, Bornstein R, Nino-Murcia G, et al. The Sleep Disorders Questionnaire. I: Creation and multivariate structure of SDQ. Sleep. 1994;17:160–7.
14. Sadeh A, Hauri P, Kripke D, Lavie P. Role of actigraphy in the evaluation of sleep disorders. Sleep. 1995;18:288–302.
15. Carskadon MA, Dement WC, Mitler MM, Roth T, Westbrook PR, Keenan S. Guidelines for the Multiple Sleep Latency Test (MSLT): a standard measure of sleepiness. Sleep. 1986;9:519–24.
16. Hoddes E, Dement W, Zarcone V. The development and use of the Stanford Sleepiness Scale (SSS). Psychophysiology. 1972;9:150.
17. Gagnon J, Roth JM, Carroll M, et al. SuperANOVA: Accessible General Linear Modeling. Berkeley, CA: Abacus Concepts, 1989.
18. Levine B, Roehrs T, Zorick F, Roth T. Daytime sleepiness in young adults. Sleep. 1988;11:39–46.
19. Harrison Y, Horne JA. “High sleepability without sleepiness.” The ability to fall asleep rapidly without other signs of sleepiness. Neurophysiol Clin. 1996;26:15–20.
20. Carskadon MA, Rechtschaffen A. Monitoring and staging human sleep. In: Kryger MH, Roth T, Dement WC (eds). Principles and Practice of Sleep Medicine. 3rd ed. Philadelphia, PA: W. B. Saunders, 2000:1197–217.
21. Akerstedt T, Torsvall L, Gillberg M. Sleepiness and shift work: field studies. Sleep. 1982;5 suppl 2:S95–106.
22. Pankratz H. Witness: doctor dozed. Denver Post. 1995 Sept 15,1A.
23. Rosekind MR, Gregory KB, Miller DL, Co EL, Lebacqz JV. Aircraft Accident Report: Uncontrolled Collision with Terrain, American International Airways Flight 808, Douglas DC-8, N814CK, U.S. Naval Air Station, Guantanamo Bay, Cuba, August 18, 1993. (Report #NTSB/AAR-94/04). Washington, DC: National Transportation Safety Board, 1994.
24. National Transportation Safety Board. “Most wanted” transportation safety improvements. 〈http://www.ntsb.gov/recs/mostwanted/current_list.htm#Current〉. Accessed 6/27/02. Washington, DC: NTSB, 2002.
25. Accreditation Council for Graduate Medical Education. Residency review committee program requirements. 〈http://www.acgme.org/Review_archive/index.asp〉. Accessed 6/27/02. Washington, DC: ACGME, 2002.
26. Accreditation Council for Graduate Medical Education. ACGME approves new proposed common requirements for resident duty hours. 〈http://www.acgme.org/New/residentHours602.asp〉. Accessed 6/26/02. Washington, DC: ACGME, 2002.
27. Bulstrode CJ, Muir Gray AJ, Anderson M, Hawke CI. New deal for junior doctors' hours: how to achieve it. BMJ. 1992;305:1203–5.
28. Asch DA, Parker RM. The Libby Zion case. One step forward or two steps backward? N Engl J Med. 1988;318:771–5.
29. Glickman RM. House-staff training—the need for careful reform. N Engl J Med. 1988;318:780–2.