During surgical procedures, both undersedation and oversedation carry inherent risks: the former increases the likelihood of recall and agitation-induced sympathetic activation, and the latter, excessive depression of vital physiologic functions. Regional anesthesia is unique in that it allows the degree of recall and awareness to be titrated independently of the degree of anesthesia. This can potentially reduce the amount of sedative needed without compromising patient comfort, decreasing the risk of oversedation and shortening recovery time, with a significant effect on patient satisfaction and safety. Because of individual pharmacodynamic variability, a dose protocol cannot be used to set the level of sedation; instead, a patient’s resultant state after the administration of sedatives must be measured. To investigate the effects of differing degrees of intraoperative sedation during regional anesthesia on intra- and postoperative outcomes, a reliable and valid system for measuring the level of sedation is required.
There are three major categories of methods currently in use for assessing levels of sedation in adults: patient-based, observer-based, and machine-based methods. Machine-based methods are generally perceived to be the most objective. Current options include the bispectral index (BIS), power spectral measures, and auditory evoked potentials (AEPs). AEPs predict the response to verbal stimuli during general anesthesia (1,2), but it is unclear whether AEP latency and amplitude show graded changes as anesthesia lightens. Power spectral measures correlate well with drug concentrations and with increases in blood pressure or movements during general anesthesia (3), but at light anesthetic levels ambiguity increases because median and spectral edge frequencies can be the same whether the patient is awake or asleep. A number of studies have demonstrated the ability of the BIS to predict loss of consciousness and response to verbal commands (4,5). However, the correlation of BIS with more subtle gradations of sedation has not yet been determined, and BIS readings are affected by the use of a regional anesthetic even when no other medications have been administered (6,7). Measurements of heart rate variability and respiratory sinus arrhythmia have also been proposed as the basis of a sedation score (8). Although machine-based assessments, in particular the BIS, offer a quantifiable means by which to measure sedation, their use is limited by their inability to discriminate among lesser degrees of sedation.
Patient-based assessment of sedation is frequently accomplished by using one or several 100-mm visual analog scales whose end points represent two extremes of sedation (e.g., “wide awake” to “extremely sleepy” or “as alert as I have ever been” to “I cannot keep awake”) (9,10). The patient is asked to mark the point representing his or her own perception of the degree of sedation. These scales are quick and easy to administer, but their reliability and applicability across patients are limited, as is their feasibility at deeper levels of sedation.
A review of the literature on observer-based sedation scales for adult patients reveals a wide variety of scales currently in use. Many of the identified scales with documented validity are designed for use in intensive care units (ICUs), predominantly with mechanically ventilated patients (11–18). It is important to distinguish scales intended specifically for use in ICUs from those used to assess sedation during surgical procedures or in response to drug administration, because the primary aim of ICU scales is to assess calmness rather than level of consciousness (15). Furthermore, certain scales used in the ICU, such as the Glasgow Coma Scale, are distorted by the use of sedatives (19), further limiting their use in surgical settings.
The most frequently cited observer-based scales for assessing sedation are the Ramsay sedation scale (20) and the Observer’s Assessment of Alertness/Sedation (OAA/S) (21). The six-category Ramsay scale, developed in the early 1970s, provides a simple, quick assessment of sedation; however, despite its widespread use, its psychometric properties, including reliability and validity, have not been formally assessed. A comparison of five sedation scoring systems by means of AEPs identified the Ramsay scale as having the best correlation with AEP (22); however, in 1994, Hansen-Flaschen et al. (23) identified numerous shortcomings of the Ramsay scale, including unclear definition of the sedation levels, lack of mutual exclusivity among sedation levels, and its focus on assessing consciousness rather than sedation.
The OAA/S is one of the few sedation scales whose reliability has been documented. However, to provide repeated measures of sedation, the OAA/S requires frequent stimulation, which limits its usefulness during surgery because the stimulation could prove disruptive to both patient and surgeon (24). Thus, although the OAA/S is a reliable means of assessing level of alertness, it is not ideal for rapid, repeated assessments of a patient’s degree of sedation.
In their 1990 study comparing the sedative effects of propofol and midazolam during spinal anesthesia for orthopedic surgeries, Wilson et al. (25) used a categorical scale in which an observer rated the degree of sedation (Table 1). A variation on the Ramsay scale, the Wilson scale presents a simple means for assessing intraoperative sedation; however, there are no published data regarding the reliability and validity of this scale. The purpose of this study was to test the interobserver reliability of the scale proposed by Wilson et al. and to modify the scale as necessary to maximize reliability and feasibility in assessing sedation with regional anesthesia. The goal was to develop a valid and reliable method for assessing maintenance of a specified level of sedation for use in a randomized clinical trial comparing two different levels of intraoperative sedation with regard to a variety of clinical outcomes.
To assess existing sedation scales, a literature search covering January 1966 to May 2001 was performed by using the MEDLINE, EMBASE, pre-MEDLINE, HealthSTAR, CINAHL, and CancerLIT databases. The following key terms were used: anesthesia and analgesia, hypnotics and sedatives, sedation, score, and sedation scale. Search returns were limited to human, adult, and English-language citations. Relevant articles were identified and retrieved, and the sedation scales described were reviewed and classified by type. A total of 153 articles were reviewed: 42 used a visual analog scale; 32 used the original or modified form of the OAA/S, either alone as a measure of sedation or in comparison with BIS; 25 used a form of the Ramsay scale, alone or in comparison with other sedation scales; 4 used the Wilson sedation scale (25–28); and 40 used other categorical scales varying from three to seven categories that did not have specific scale names assigned. There was considerable overlap between items on scales. In addition, 20 of these articles used scales designed for use in the ICU, which were therefore not considered for this study, and 16 used neuropsychological tests [e.g., digit symbol substitution test (29–31), critical flicker fusion (32–34), and reaction time (31,32,34,35)]. Relatively few of the scales had any documentation of their psychometric properties, and almost all of those with documentation were designed for use in the ICU setting (11–13,17,18,21,22,36–39); a complete list of literature search results is available from the authors upon request.
Cases eligible for inclusion in this study were those performed with a regional anesthetic in which two anesthesia providers were present in the operating room at the same time. More than 85% of the orthopedic surgical procedures performed at the Hospital for Special Surgery are performed with a regional anesthetic. A consecutive convenience sample was drawn from the cases in which an attending anesthesiologist had been paired with a resident, fellow, or certified registered nurse anesthetist (CRNA) or in which a second attending was available. Orthopedic surgeries using a broad range of regional anesthetics and sedatives were included. Approval was obtained from the hospital’s institutional review board (IRB). The original Wilson sedation scale was assessed during August 1998; on the basis of the results, a revised scale was tested from January to March 2001.
The sedation level of each patient was assessed once during the case, a minimum of 10 min after the administration of a regional block. The two anesthesia providers were asked to rate the patient’s sedation level simultaneously but independently while the study research assistant administered a standardized oral stimulus followed by a standardized physical stimulus. For the oral stimulus, the research assistant addressed the patient in a normal speaking voice: “[Name], please open your eyes.” The same research assistant gave this command throughout the study. For the physical stimulus, given only if the patient did not respond to the spoken command, the research assistant applied a quick, firm tug to the right earlobe.
After the stimuli, the two anesthesia providers each independently rated their assessment of the patient’s sedation on the basis of the five-point Wilson scale (and subsequently the modified four-point scale). The raters were blinded to each other’s ratings.
The data were analyzed with EpiInfo, version 6.04 (http://www.cdc.gov/epiinfo/ei6.htm; Centers for Disease Control and Prevention, Atlanta, GA). Interrater reliability was assessed by using the unweighted κ statistic, which measures the concordance beyond chance between measurements of nominal data (40,41). When measuring observer agreement, the κ statistic is preferred because it accounts for agreement occurring by chance, is a measure of concordance rather than trend, and can account for systematic observer bias (42). The value of κ can range from −1 (complete disagreement) to 0 (chance agreement) to +1 (perfect agreement). In interpreting the strength of agreement indicated by a κ value, the following guidelines have been suggested: <0 to 0.40, poor to fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 1.00, almost perfect (41).
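As a minimal sketch of the unweighted κ described above, assuming each rater's scores are stored as a list of integer category codes (one per patient, in the same patient order), observed agreement p_o and chance agreement p_e can be computed from the two raters' marginal frequencies and combined as κ = (p_o − p_e)/(1 − p_e). The function names and the ten-patient example are hypothetical; this is not the EpiInfo routine used in the study, and the scores are illustrative, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of patients given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, multiply the two raters' marginal
    # proportions, then sum over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value, using the cut points quoted in the text."""
    if kappa <= 0.40:
        return "poor to fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores for 10 patients on a five-point scale (not study data).
first_rater = [1, 2, 2, 3, 3, 4, 1, 2, 3, 5]
second_rater = [1, 2, 3, 3, 3, 4, 1, 3, 3, 5]
k = cohen_kappa(first_rater, second_rater)
print(f"kappa = {k:.2f} ({landis_koch_label(k)})")
```

With these made-up scores, both disagreements are 2 versus 3, so p_o = 0.80; the marginal frequencies give p_e = 0.24, and κ = 0.56/0.76, or about 0.74, which falls in the "substantial" band.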
Pairs of anesthesia care providers rated 100 different patients undergoing a variety of orthopedic procedures, including arthroplasties, arthrotomies, and arthroscopies. Analgesia was provided with epidural, spinal, combined spinal/epidural, interscalene, axillary, and other regional blocks. Midazolam was administered as a sedative in all but one case. In 29% of the cases, midazolam was the sole sedative, whereas in 71% of cases, midazolam was given in conjunction with one or more of the following: propofol, fentanyl, thiopental, diazepam, or droperidol. The patients ranged in age from 16 to 86 yr, with a mean age of 54 yr. Fifty-two percent were women. Among the 31 anesthesia care providers, 18 were attendings, 5 were fellows, 5 were residents, and 3 were CRNAs. Thirteen percent of the pairs were attending/attending, 29% were attending/fellow, 38% were attending/resident, and 20% were attending/CRNA.
Interrater percentage agreement on sedation scores with the original Wilson sedation scale was 79%, with a κ coefficient of 0.72 (P < 0.00001), signifying substantial agreement. The major source of disagreement was between scores of 2 (drowsy) versus 3 (eyes closed but rousable to command) (Table 2). When Categories 2 and 3 were merged to form a modified four-point Wilson sedation scale, the κ coefficient increased to 0.90, signifying excellent agreement. Analysis of interrater agreement as a function of the training of the second anesthesia care provider indicated no meaningful difference among the four possible combinations (Table 3).
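The category merge described above amounts to recoding both raters' scores before recomputing κ. The sketch below assumes that the merged category keeps the number 2 and that original Categories 4 and 5 become 3 and 4; that renumbering, the helper names, and the scores are illustrative assumptions, not taken from Table 2 or Table 4.

```python
from collections import Counter

# Assumed recoding of original five-point Wilson scores onto a four-point scale
# in which original Categories 2 (drowsy) and 3 (eyes closed but rousable to
# command) share one category; the renumbering of 4 and 5 is an assumption.
MERGE_2_AND_3 = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def collapse(scores, mapping=MERGE_2_AND_3):
    """Recode a list of original scores with the given category mapping."""
    return [mapping[s] for s in scores]


# Hypothetical paired scores (not study data); two of the four disagreements
# are 2 versus 3.
rater_a = [1, 2, 2, 3, 3, 4, 1, 2, 3, 5]
rater_b = [1, 2, 3, 3, 3, 4, 2, 3, 3, 4]

kappa_five_point = cohen_kappa(rater_a, rater_b)
kappa_four_point = cohen_kappa(collapse(rater_a), collapse(rater_b))
print(f"five-point kappa: {kappa_five_point:.2f}")
print(f"four-point kappa: {kappa_four_point:.2f}")
```

With these made-up scores, the merge raises observed agreement from 0.60 to 0.80 and κ from about 0.47 to about 0.63, but chance agreement also rises (from 0.25 to 0.46), because fewer categories make coincidental matches more likely.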
During the follow-up data collection with this modified Wilson scale (Table 4), 50 different patients undergoing a similar variety of orthopedic procedures were rated by pairs of anesthesia care providers. The analgesic blocks and sedatives used were similar. Midazolam was used in all cases: alone in 18% and in conjunction with varying combinations of the sedatives listed previously in 82%. The time to observation after the administration of the regional block ranged from 10 to 120 min, with a mean of 65 min. The patients ranged in age from 18 to 85 yr, with a mean age of 56 yr. Forty-eight percent were women. Among the 28 anesthesia care providers, 16 were attendings, 1 was a fellow, 7 were residents, and 4 were CRNAs. Thirty-six percent of the pairs were attending/attending, 4% were attending/fellow, 28% were attending/resident, and 32% were attending/CRNA.
Interrater percentage agreement on sedation scores with this modified scale was 84%, with a κ coefficient of 0.75 (P < 0.0001), signifying substantial agreement (Table 5). Analysis of interrater agreement as a function of the training of the second anesthesia care provider suggested lower concordance in attending/resident and attending/CRNA pairs than in attending/attending pairs (Table 3).
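Agreement by provider pairing, as summarized in Table 3, can be sketched by grouping paired ratings by the type of pair and computing κ within each group. The record layout (pair type, first score, second score), the function names, and the data are hypothetical; with only a handful of cases per group, such subgroup estimates are unstable, and κ is undefined when both raters in a group used one and the same category throughout.

```python
from collections import Counter, defaultdict


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa; returns None when it is undefined."""
    n = len(rater_a)
    if n == 0:
        return None
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        return None  # both raters used one and the same category throughout
    return (p_o - p_e) / (1 - p_e)


def kappa_by_pairing(records):
    """records: iterable of (pair_type, score_rater_1, score_rater_2)."""
    grouped = defaultdict(lambda: ([], []))
    for pair_type, score_1, score_2 in records:
        grouped[pair_type][0].append(score_1)
        grouped[pair_type][1].append(score_2)
    return {pair: cohen_kappa(a, b) for pair, (a, b) in grouped.items()}


# Hypothetical ratings on a four-point scale (not study data).
records = [
    ("attending/attending", 1, 1),
    ("attending/attending", 2, 2),
    ("attending/attending", 3, 3),
    ("attending/attending", 2, 2),
    ("attending/resident", 1, 1),
    ("attending/resident", 2, 1),
    ("attending/resident", 2, 2),
    ("attending/resident", 3, 2),
]

for pair, kappa in kappa_by_pairing(records).items():
    print(pair, "undefined" if kappa is None else f"{kappa:.2f}")
```

In this toy example the attending/attending group agrees perfectly (κ = 1.00) while the attending/resident group reaches only κ = 0.20; with four cases per group, a single changed rating would move either estimate substantially.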
A valid and reliable means of assessing and monitoring sedation during surgery with regional anesthesia would be valuable for both research and practice. Such an instrument must serve different purposes and operate under different constraints than instruments used to assess sedation in the ICU. To compare outcomes of different levels of sedation during and after anesthesia, it is important to be able to measure and maintain a specific sedation level. Because individual reactions to sedatives vary, a dose protocol cannot be used to set the level of sedation; instead, a tool for measuring a patient’s resultant state after the administration of sedatives is needed. Observer-based sedation scales offer a quick and easily administered means of assessing the level of sedation, provided that they demonstrate high interrater reliability and fulfill other criteria for construct validity.
This study of regional anesthesia patients shows that the interrater reliability of Wilson’s original sedation scale is fairly good for assessing light sedation, except for poor discrimination between Categories 2 (drowsy) and 3 (eyes closed but rousable to command). Because the descriptions for Categories 2 and 3 do not describe mutually exclusive states (i.e., a patient can be drowsy with or without closed eyes), they do not fulfill the criterion of mutual exclusivity required for the construct validity of a scale (43,44).
To decrease the uncertainty associated with distinguishing between Categories 2 and 3, we modified the Wilson sedation scale by combining these two categories and by operationalizing the descriptions of each category with more specific criteria, as shown in Table 4. A preliminary statistical analysis using the original data suggested that the degree of agreement on the modified scale should be excellent. Data subsequently obtained with the modified Wilson scale, however, did not indicate an improvement in interrater agreement as measured by the κ coefficient. Several factors may explain this observation.
First, the modified scale has one fewer category, which raises the agreement expected by chance alone; because κ discounts chance agreement, a higher observed agreement can still yield a similar κ, as was observed. Second, the modified-scale study had a smaller sample; thus, a smaller absolute number of disagreements had a greater effect on the ratio of observed to expected concordance. Third, with both versions of the scale, there was one category that was never assigned simultaneously by both raters. When the original Wilson scale was used, no patient received a score of 2 from both raters. The 14 pairings of Scores 2 and 3 suggest that agreement was affected by raters’ difficulty in distinguishing these two categories. In contrast, raters never both assigned a score of 3 when using the modified Wilson scale. However, the presence of only two pairings of Scores 3 and 4 suggests not a lack of clear distinction between categories but rather that moderate sedation was uncommon among the types of surgical procedures observed.
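The first explanation can be checked roughly against the reported figures. Rearranging κ = (p_o − p_e)/(1 − p_e) gives p_e = (p_o − κ)/(1 − κ). Applying this to the rounded percentage agreement and κ values reported in the Results is only approximate, and the function name below is hypothetical.

```python
def implied_chance_agreement(p_observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    if kappa >= 1:
        raise ValueError("p_e cannot be recovered when kappa equals 1")
    return (p_observed - kappa) / (1 - kappa)


# Rounded values reported for the two data collections.
p_e_original = implied_chance_agreement(0.79, 0.72)  # original five-point scale
p_e_modified = implied_chance_agreement(0.84, 0.75)  # modified four-point scale
print(f"original scale: p_e is about {p_e_original:.2f}")
print(f"modified scale: p_e is about {p_e_modified:.2f}")
```

On these rounded figures, implied chance agreement rises from about 0.25 to about 0.36, so a 5-percentage-point gain in observed agreement (79% to 84%) translates into only a small gain in κ (0.72 to 0.75).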
It is interesting to examine the differences in interrater agreement on the basis of observers’ levels of training or experience. When the original Wilson scale was used, no meaningful difference was seen among the three categories of anesthesia care provider pairings. (The pairing of an attending with a CRNA was excluded from this comparison because of the great variability in duration of experience among the participating CRNAs.) It must be noted, however, that more than a third of the pairings (38%) were attending/resident. In contrast, when the modified Wilson scale was used, the distribution of observer pairings (with the exception of attending/fellow) was more balanced, and the expected decrease in concordance with decreasing experience was seen. Although this would support the notion that more experience leads to better interrater reliability, the applicability of this analysis is limited by the relatively small number of cases involved.
Light sedation remains operationally the most difficult grade of sedation to evaluate, as indicated by the results of this evaluation of the interrater reliability of the Wilson sedation scale as well as by published findings on the limitations of BIS with lighter grades of sedation. Current machine-based methodologies remain unable to maintain consistently clear distinctions between levels of lighter sedation, as would be needed, for example, when comparing outcomes in patients maintained at a Wilson Level 2 versus Level 4 during regional anesthesia. The modified Wilson scale seems to offer the best current means by which to monitor intraoperative sedation.
Tools for assessing variables of physiologic function, be they observer, patient, or machine based, must in all cases be developed, checked, and rechecked systematically. Machine-based measurements are meaningless unless the mechanical tool has been calibrated according to a known standard, to ensure both validity (accuracy) and consistency (reliability). The same is true for observer-based measurements: reliability, accuracy, and precision must continually be evaluated to maintain a consistent standard of measurement. When observer-based methods are developed for use in research and clinical settings, they must include operationalized definitions, assessor training and retraining, and periodic checks on inter- and intrarater reliability (45,46) to achieve the consistency necessary for scientific measurement (47). Researchers intending to use observer-based assessment scales should include quality control training and repeated calibration within their research protocol.
The modified Wilson sedation scale has good interrater reliability, and its clarification of each scoring category allows it to provide improved discrimination between levels of light sedation. It is quick and easy to use; by defining clear sedation end points, it can be used for determining sedation during regional anesthesia as well as a reference with which to correlate measures obtained from diverse monitoring devices. When accompanied by continuing checks on its reliability and construct validity, it will be a valuable tool for establishing and evaluating the effect of specific degrees of sedation, or differing sedative regimens, on patient outcomes after regional anesthesia.
1. Newton DEF, Thornton C, Konieczko KM, et al. Auditory evoked response and awareness: a study in volunteers at sub-MAC concentrations of isoflurane. Br J Anaesth 1992; 69: 122–9.
2. Thornton C, Barrowcliffe MP, Konieczko KM, et al. The auditory evoked response as an indicator of awareness. Br J Anaesth 1989; 63: 113–5.
3. Rampil IJ, Matteo RS. Changes in EEG spectral response edge frequency correlate with the hemodynamic response to laryngoscopy and intubation. Anesthesiology 1987; 67: 139–42.
4. Glass PS, Bloom M, Kearse L, et al. Bispectral analysis measures sedation and memory effects of propofol, midazolam, isoflurane, and alfentanil in healthy volunteers. Anesthesiology 1997; 86: 836–47.
5. Kearse LA, Rosow C, Zaslavsky A, et al. Bispectral analysis of the electroencephalogram predicts conscious processing of information during propofol sedation and hypnosis. Anesthesiology 1998; 88: 25–34.
6. Pollock JE, Neal JM, Liu SS, et al. Sedation during spinal anesthesia. Anesthesiology 2000; 93: 728–34.
7. Morley AP, Chung DC, Wong ASY, Short TG. The sedative and electroencephalographic effects of regional anaesthesia. Anaesthesia 2000; 55: 864–9.
8. Wang DY, Pomfrett CJD, Healy TEJ. Respiratory sinus arrhythmia: a new, objective sedation score. Br J Anaesth 1993; 71: 354–8.
9. Campagni MA, Howie MB, White PF, McSweeney TD. Comparative effects of oral clonidine and intravenous esmolol in attenuating the hemodynamic response to epinephrine injection. J Clin Anesth 1999; 11: 208–15.
10. Kulshrestha VK, Gupta PP, Turner P, Wadsworth J. Some clinical pharmacological studies with terfenadine, a new antihistamine drug. Br J Clin Pharmacol 1978; 6: 25–9.
11. Riker RR, Picard JT, Fraser GL. Prospective evaluation of the Sedation-Agitation Scale for adult critically ill patients. Crit Care Med 1999; 27: 1325–9.
12. Devlin JW, Boleski G, Mlynarek M, et al. Motor Activity Assessment Scale: a valid and reliable sedation scale for use with mechanically ventilated patients in an adult surgical intensive care unit. Crit Care Med 1999; 27: 1271–5.
13. Hogg LH, Bobek MB, Mion LC, et al. Interrater reliability of 2 sedation scales in a medical intensive care unit: a preliminary report. Am J Crit Care 2001; 10: 79–83.
14. Harris CE, O’Donnell C, Macmillan RR, et al. Use of propofol by infusion for sedation of patients undergoing haemofiltration: assessment of the effect of haemofiltration on the level of sedation and on blood propofol concentration. J Drug Dev 1991; 4: 37–9.
15. de Lemos J, Tweeddale M, Chittock D. Measuring quality of sedation in adult mechanically ventilated critically ill patients: the Vancouver Interaction and Calmness Scale. J Clin Epidemiol 2000; 53: 908–19.
16. Detriche O, Berré J, Massaut J, Vincent J-L. The Brussels sedation scale: use of a simple clinical sedation scale can avoid excessive sedation in patients undergoing mechanical ventilation in the intensive care unit. Br J Anaesth 1999; 83: 698–701.
17. Swart EL, van Schijndel RJ, van Loenen AC, Thijs LG. Continuous infusion of lorazepam versus midazolam in patients in the intensive care unit: sedation with lorazepam is easier to manage and is more cost-effective. Crit Care Med 1999; 27: 1461–5.
18. Avripas MB, Smythe MA, Carr A, et al. Development of an intensive care unit bedside sedation scale. Ann Pharmacother 2001; 35: 262–3.
19. Livingston BM, Mackenzie SJ, MacKirdy FN, Howie JC. Should the pre-sedation Glasgow Coma Scale value be used when calculating Acute Physiology and Chronic Health Evaluation scores for sedated patients? Crit Care Med 2000; 28: 389–94.
20. Ramsay MAE, Savege TM, Simpson BRJ, Goodwin R. Controlled sedation with alphaxalone-alphadolone. BMJ 1974; 2: 656–9.
21. Chernik DA, Gillings D, Laine H, et al. Validity and reliability of the Observer’s Assessment of Alertness/Sedation Scale: study with intravenous midazolam. J Clin Psychopharmacol 1990; 10: 244–51.
22. Schulte-Tamburen AM, Scheier J, Briegel J, et al. Comparison of five sedation scoring systems by means of auditory evoked potentials. Intensive Care Med 1999; 25: 377–82.
23. Hansen-Flaschen J, Cowen J, Polomano RC. Beyond the Ramsay scale: need for a validated measure of sedating drug efficacy in the intensive care unit. Crit Care Med 1994; 22: 732–3.
24. Liu J, Singh H, White PF. Electroencephalographic bispectral index correlates with intraoperative recall and depth of propofol-induced sedation. Anesth Analg 1997; 84: 185–9.
25. Wilson E, David A, Mackenzie N, Grant IS. Sedation during spinal anesthesia: comparison of propofol and midazolam. Br J Anaesth 1990; 64: 48–52.
26. Hammas B, Hvarfner A, Thorn S-E, Wattwil M. Propofol sedation and gastric emptying in volunteers. Acta Anaesthesiol Scand 1998; 42: 102–5.
27. Hvarfner A, Hammas B, Thorn S-E, Wattwil M. The influence of propofol on vomiting induced by apomorphine. Anesth Analg 1995; 80: 967–9.
28. Gan TJ, Ginsberg B, Grant AP, Glass PSA. Double-blind, randomized comparison of ondansetron and intraoperative propofol to prevent postoperative nausea and vomiting. Anesthesiology 1996; 85: 1036–42.
29. Hall JE, Uhrich TD, Barney JA, et al. Sedative, amnestic, and analgesic properties of small-dose dexmedetomidine infusions. Anesth Analg 2000; 90: 699–705.
30. Weingartner HJ, Joyce EM, Sirocco KY, et al. Specific memory and sedative effects of the benzodiazepine triazolam. J Psychopharmacol 1993; 7: 305–15.
31. Barzaghi N, Gatti G, Manni R, et al. Comparative pharmacokinetics and pharmacodynamics of eterobarbital and phenobarbital in normal volunteers. Eur J Drug Metab Pharmacokinet 1991; 16: 81–7.
32. Williamson BH, Nolan PJ, Tribe AE, Thompson PJ. A placebo controlled study of flumazenil in bronchoscopic patients. Br J Clin Pharmacol 1997; 43: 77–83.
33. Short TG, Young KK, Tan P, et al. Midazolam and flumazenil pharmacokinetics and pharmacodynamics following simultaneous administration to human volunteers. Acta Anaesthesiol Scand 1994; 38: 350–6.
34. Patat A, Klein MJ, Surjus A, et al. Study of effects of clobazam and lorazepam on memory and cognitive functions in healthy subjects. Hum Psychopharmacol 1991; 6: 229–41.
35. Levander S, Hagermark O, Stahle M. Peripheral antihistamine and central sedative effects of three H1-receptor antagonists. Eur J Clin Pharmacol 1985; 28: 523–9.
36. Blin O, Mestre D, Paut O, et al. GABA-ergic control of visual perception in healthy volunteers: effects of midazolam, a benzodiazepine, on spatio-temporal contrast sensitivity. Br J Clin Pharmacol 1993; 36: 117–24.
37. De Jonghe B, Cook D, Appere-de-Vecchi C, et al. Using and understanding sedation scoring systems: a systematic review. Intensive Care Med 2000; 26: 275–85.
38. Smith RB, Kroboth PD, Vanderlugt JT, et al. Pharmacokinetics and pharmacodynamics of alprazolam after oral and IV administration. Psychopharmacology (Berl) 1984; 84: 452–6.
39. Murdoch JA, Grant SA, Kenny GN. Safety of patient-maintained propofol sedation using a target-controlled system in healthy volunteers. Br J Anaesth 2000; 85: 299–301.
40. Fleiss JL. The measurement of interrater agreement. In: Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons, 1981: 212–36.
41. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–74.
42. Kramer MS, Feinstein AR. Clinical biostatistics: LIV—the biostatistics of concordance. Clin Pharmacol Ther 1981; 29: 111–23.
43. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985; 38: 27–36.
44. Feinstein AR. The evaluation of validity. In: Clinimetrics. New Haven: Yale University Press, 1987: 190–211.
45. McDowell I, Newell C. The theoretical and technical foundations of health measurement. In: Measuring health. 2nd ed. New York: Oxford University Press, 1996: 10–46.
46. Aday LA. Monitoring and carrying out the survey. In: Designing and conducting health surveys: a comprehensive guide. San Francisco: Jossey-Bass, 1989: 195–216.
47. Feinstein AR. The evaluation of consistency. In: Clinimetrics. New Haven: Yale University Press, 1987: 167–89.
© 2002 International Anesthesia Research Society