Robinson, Bryce R. H. MD, FACS¹; Berube, Melanie RN, MSc²; Barr, Juliana MD, FCCM³,⁴; Riker, Richard MD, FCCM⁵; Gélinas, Céline RN, PhD⁶
Sedative medications are commonly administered to critically ill adult patients to reduce anxiety, manage agitation, and reduce asynchrony with mechanical ventilation (1). Historically, most critically ill patients requiring mechanical ventilation were maintained in a state of deep sedation for extended periods until the life-threatening symptoms of their acute illness had resolved (2). However, excessive sedation of ICU patients is associated with a higher prevalence of negative outcomes, including posttraumatic stress disorder, prolonged mechanical ventilation, longer ICU and hospital lengths of stay, and increased mortality (3–7). Conversely, inadequate sedation may lead to agitation that can contribute to myocardial ischemia, ventilator asynchrony, and self-extubation (8).
Defining and attaining the appropriate level of sedation for different types of ICU patients remains an ongoing challenge for clinicians. The development of ICU-specific sedation scales was a necessary step to allow sedation goals to be defined, to promote appropriate administration of sedatives, and to provide a common language for multidisciplinary critical care teams (9, 10). To be useful, a sedation scale should be simple to use and have robust psychometric properties, including high reliability and validity, as well as feasibility and applicability across ICU patient populations (11).
The reliability and validity of a sedation scale are not properties of the scale alone; they also depend on how the scale performs for a specific purpose, in a particular group of patients, in a given context (12). More specifically, reliability refers to the reproducibility of measurements obtained with an assessment scale by one or more individuals (13). Interrater reliability (IRR) and intrarater reliability are the strategies most commonly reported to evaluate sedation scale reliability. IRR refers to the agreement between scores obtained by two or more independent evaluators for the same patient at the same time, whereas intrarater reliability assesses the ability of one evaluator to obtain consistent scores for the same patients on two occasions, provided the concept being measured is known to remain stable over time (12, 13).
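To make the IRR concept concrete, the sketch below computes unweighted Cohen's κ for two raters who scored the same patients at the same time. It is a minimal illustration, not the analysis used in any cited study; the function name `cohens_kappa` and the example scores are hypothetical. κ compares the observed agreement (p_o) with the agreement expected by chance from each rater's score distribution (p_e): κ = (p_o − p_e)/(1 − p_e).

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of patients")
    n = len(rater_a)
    # Observed agreement: share of patients given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical RASS scores given by two nurses to the same eight patients.
nurse_1 = [0, -1, -2, 0, 1, -3, 0, -1]
nurse_2 = [0, -1, -1, 0, 1, -3, 0, -2]
print(round(cohens_kappa(nurse_1, nurse_2), 2))
```

In this hypothetical example the nurses agree on 6 of 8 patients (p_o = 0.75) and chance agreement is 0.25, so κ = 0.5/0.75 ≈ 0.67: raw percentage agreement overstates reliability once chance is accounted for.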
Validity refers to the conclusions that can be drawn from the scores of a scale (e.g., does the scale measure what it claims to measure?) (12). Strategies for establishing validity include content validation, construct validation (i.e., convergent and discriminant validation), and criterion validation (13). Content validation evaluates the extent to which a scale's items are relevant to, and represent all facets of, the concept being measured, in this case the level of sedation and agitation. Construct validity focuses on the congruence between the items of a scale and the extent to which these items represent the concept of interest. Convergent and discriminant validation are common strategies used to examine construct validity. Convergent validation tests whether the scale correlates with another instrument that measures the same concept using a different method (e.g., comparing sedation scale scores with cerebral measures such as the bispectral index [BIS]). Comparing two subjective sedation scales with similar content is a less optimal strategy, because a high correlation between the two scales would be expected. Discriminant validation assesses whether sedation scale scores vary across different conditions (such as loss or recovery of wakefulness during the initiation or titration of sedation). Finally, criterion validation establishes the relationship between the scale and another measure, ideally the “gold standard” in the field (13). Unfortunately, unlike pain (for which the gold standard criterion is the patient's self-report), sedation has no such standard.
The goals of this article are: 1) to provide a more detailed description of the psychometric analyses of subjective sedation scales performed for the 2013 Society of Critical Care Medicine’s (SCCM) Clinical Practice Guidelines for the Management of Pain, Agitation and Delirium (PAD) in adult patients in the ICU, which were adapted from previous analyses of pain scales (10); and 2) to update the psychometric evaluation of existing and novel subjective sedation scales used in adult ICU patients with literature published since the guidelines.
MATERIALS AND METHODS
This review used two literature search strategies. The first used the original literature database created for the 2013 SCCM PAD guidelines (10). The PAD guideline database was created with the aid of a professional librarian (Charles P. Kishman, University of Cincinnati) using a comprehensive list of search terms determined by the guideline subcommittees. Sedation scale search terms included adult, critically ill, subjective sedation scale, sedation scale, validity, and reliability. Databases searched included PubMed, MEDLINE, Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, CINAHL, Scopus, ISI Web of Science, and International Pharmaceutical Abstracts. The search was limited to published (or in-press), English-language articles on adults (>18 years old) up to December 2010, and to studies with at least 30 patients. Of the 19,000+ references in the PAD database, 94 included sedation scale keywords. Two independent reviewers (B.R., C.G.) deemed 27 articles, encompassing 10 different scales, appropriate for evaluation of their psychometric properties (i.e., reliability and validity) using a standardized scoring system (Table 1). Scales evaluated included the Adaptation to the Intensive Care Environment (ATICE), Motor Activity Assessment Scale (MAAS), Minnesota Sedation Assessment Tool (MSAT), Observer’s Assessment of Alertness/Sedation Scale (OAA/S), Sedation-Agitation Scale (SAS), Sedation Intensive Care Score (SEDIC), New Sheffield Sedation Scale (Sheffield), Ramsay Sedation Scale (RSS), Richmond Agitation-Sedation Scale (RASS), and the Vancouver Interaction and Calmness Scale (VICS).
A second search was performed to identify additional sedation scale articles published between December 2010 and December 2012. A professional librarian (Rachel Daly, Jewish General Hospital, Montreal, QC, Canada) searched PubMed, MEDLINE, CINAHL, Scopus, and ISI Web of Science using the same terms as the initial search. Keywords were clustered into two concepts: concept 1 included terms relating to sedation scales and assessments (including sedation, assessment, sedation scales, ICU, and critically ill adult); concept 2 included the names of the 10 sedation scales evaluated in the 2013 PAD guidelines. The intersection of these two concepts yielded 38 new articles, of which two independent reviewers (B.R., C.G.) identified nine as relevant, including one describing a new scale, the Nursing Instrument for the Communication of Sedation (NICS) (14). Combining the two searches, a total of 36 articles describing the development and psychometric properties of 11 sedation scales were included in the current analysis.
Sedation Scale Psychometric Scoring System
The scale development process, psychometric properties, feasibility, and implementation of sedation scales were analyzed using psychometric criteria adapted from a scoring system initially developed for pain scales by Gélinas et al (15) and from previously published processes (Table 1) (16, 17). This psychometric scoring system was originally used to identify the most valid and reliable sedation scales for use in critically ill adults for the 2013 SCCM PAD guidelines (10). The scoring system developed for sedation scales underwent content validation by three international experts in scale development and psychometric testing (Table 1). The initial version of the instrument was sent to these experts for comments and feedback, and only minor corrections were made. Details of this content validation process have been described previously (15).
The psychometric properties of each sedation scale were evaluated according to the following criteria: 1) item selection and content validation; 2) reliability; 3) validity; 4) feasibility; and 5) relevance or impact of implementation on patient outcomes (Table 1). Development of the initial psychometric tool for pain scales has been described in detail previously (15). As sedation scales differ from pain scales in their format and goal, the content of the psychometric evaluation tool was adapted. Characteristics of internal consistency and criterion validation were not applicable to sedation scale evaluation. As such, only IRR, intrarater reliability, convergent validation, and discriminant validation were included within the sedation scale psychometric scoring system.
Total raw scores for sedation scales ranged from 0 to 18. Weighted scores were established to attribute a fixed score to each category, reflecting its relevance to the psychometric testing process, and to facilitate the interpretation of results. More weight was given to reliability and validity, which represent the main sedation scale characteristics of interest. Each category's weighted score was obtained by rescaling its raw score proportionally to the maximum weight assigned to that category. For instance, for the first category, a raw score of 4/5 corresponds to a weighted score of 1.6/2. The total weighted score was the sum of the weighted scores of each category and ranged from 0 to 20, the same range used in the pain scale scoring systems of the 2013 SCCM PAD guidelines (10), Gélinas et al (15), and Zwakhalen et al (16). Weighted scores were interpreted as follows: 15–20, very good psychometric properties; 12–14.9, moderate psychometric properties; 10–11.9, low psychometric properties that remain to be replicated in other studies; and 0–9.9, very few psychometric properties reported and/or unacceptable results. Scales with moderate to very good psychometric properties (weighted score ≥ 12) were considered the most valid and reliable sedation scales for use in critically ill adult patients (Table 2).
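The weighting rule and interpretation bands above can be sketched as follows. Only the first category's maximum raw score (5) and weight (2) are stated in the text, so the maxima for other categories are left as parameters; the helper names are hypothetical, and the band thresholds are taken directly from the interpretation scale above.

```python
def weighted_score(raw: float, raw_max: float, weight_max: float) -> float:
    """Rescale a category's raw score proportionally to its assigned weight."""
    if not 0 <= raw <= raw_max:
        raise ValueError("raw score must lie between 0 and raw_max")
    return raw * weight_max / raw_max


def interpret_total(total: float) -> str:
    """Map a total weighted score (0-20) to the interpretation bands of this review."""
    if not 0 <= total <= 20:
        raise ValueError("total weighted score must lie between 0 and 20")
    if total >= 15:
        return "very good"
    if total >= 12:
        return "moderate"
    if total >= 10:
        return "low"
    return "very few/unacceptable"


# First category from the text: a raw score of 4/5 becomes 1.6 of a possible 2 points.
print(weighted_score(4, 5, 2))
for scale, total in [("RASS", 19.5), ("NICS", 12.8), ("MAAS", 11.5), ("Sheffield", 8.5)]:
    print(scale, total, interpret_total(total))
```

Because the thresholds are applied as lower bounds, a score such as 14.95 falls in the moderate band, matching the 12–14.9 range.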
The quality of evidence for each sedation scale was evaluated using categories similar to those of the Grading of Recommendation Assessment, Development and Evaluation methodology, with modifications adapted for the psychometric analyses (Table 2) (18). All studies were reviewed, and all scales were scored independently by two reviewers not involved in the development or validation of the scales. Scores for individual items were attributed to each sedation scale according to the highest quality of evidence available. Final scores for each scale were based on consensus between the two reviewers.
Psychometric scores and quality of evidence for each of the 11 sedation scales are presented in Table 3. Two scales had weighted scores indicating very good psychometric properties: the RASS (weighted score = 19.5) and the SAS (weighted score = 19). Scales with moderate psychometric properties included the VICS (14.3), ATICE (13.7), RSS (13.2), MSAT (13), and the newly developed NICS (12.8). Scales with low psychometric properties included the MAAS (11.5) and the SEDIC (10.5). The Sheffield (8.5) and the OAA/S (3.7) demonstrated very few psychometric properties. Since the initial psychometric analysis of sedation scales performed for the 2013 PAD guidelines (10), research published after 2010 has increased the weighted psychometric scores of the MAAS, RSS, ATICE, SAS, and RASS, and has added one new sedation scale (i.e., the NICS). The psychometric properties of each scale are discussed in detail below.
Richmond Agitation-Sedation Scale
Developed in 2002 by Sessler et al (19), the RASS uses 10 discrete levels ranging from −5 (unarousable) to +4 (combative). The original RASS analysis for the 2013 PAD guidelines included eight studies with over 1,600 ICU patients (19–26). Six additional RASS studies have been published since 2010, bringing the total to over 3,400 ICU patients studied with the RASS (14, 27–31).
RASS scale development and item selection were clearly described in the original article (19). The items were developed by a multidisciplinary team of critical care providers that included physicians, nurses, and pharmacists. Among the six RASS articles published since 2010, Mirski et al (14) provided content validation: ICU nurses used a 0–10 Likert scale to rate the relevance of the RASS for describing agitation and sedation (mean scores of 8.08 and 8.34, respectively).
IRR was reported in the original article as an intraclass correlation coefficient (ICC) = 0.96 and κ = 0.73 during phase 1, and an ICC = 0.96 and κ = 0.80 during phase 2 (19). High weighted κ values were observed both by Ely et al (26) (0.91), with a multidisciplinary team of four ICU clinicians, and by Truman et al (23) (0.71–0.89), with 64 ICU nurses. More recent literature reports ICCs of 0.87–0.92 among four interdisciplinary raters over a 40-minute period of sedation (Mirski et al [14]), r = 0.81 between two of three research nurses (Ashkenazy and DeKeyser-Ganz [27]), and κ = 0.66 among 627 bedside ICU nurses assessing 510 patients over 4 years (28). In these studies, high IRR values were obtained by both research personnel and bedside ICU clinicians during routine care.
Convergent validation of the RASS against the BIS yielded correlation coefficients ranging from 0.64 to 0.82 (21, 26, 31); a single study (29) reported an r² of 0.38, although RASS assessments in that study were performed only during a spontaneous breathing trial. Many groups performed discriminant validation of the RASS. In the original study, Sessler et al (19) demonstrated significant changes in RASS scores in intubated ICU patients receiving sedative or analgesic medications, in patients with an elevated Acute Physiology and Chronic Health Evaluation II score, and in patients more than 40 years old. Compared with an expert-level neuropsychiatric reference standard, the RASS was able to differentiate among standardized levels of consciousness (26). Although correlations between RASS scores and sedative and analgesic drug equivalents have been reported, these associations are weak (25, 26).
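Because most of these studies report r while one reports r², the values can be placed on a common scale. Assuming the reported r² is the square of a Pearson correlation between RASS and BIS (the sign cannot be recovered from r² alone), the conversion is:

```latex
|r| = \sqrt{r^{2}} = \sqrt{0.38} \approx 0.62
```

This lies just below the 0.64–0.82 range reported in the other studies, so the spontaneous breathing trial result is lower but not dramatically out of line with them.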
RASS feasibility has been well documented, with nursing surveys confirming ease of use among 77–82% of respondents and appropriateness among 89–92% (23, 26). These same studies also found that most participating nurses reported improved communication and a better ability to provide target-specific sedation when the RASS was used. An overall weighted score of 19.5 was assigned to the RASS in the current analysis, representing very good psychometric properties and an increase from the previous weighted score of 19 published in the 2013 PAD guidelines (Table 3) (10).
Sedation-Agitation Scale
The SAS was created by Riker et al (32) in 1994 to provide a sedation assessment tool with high IRR between investigators as they evaluated the effects of continuous haloperidol infusions on agitated, critically ill patients. In its final form, the scale has seven levels ranging from “1” (unarousable) to “7” (dangerous agitation) (33). Eight SAS studies were evaluated for the 2013 PAD guidelines, and four additional studies were added for this review (14, 20–22, 27, 30–36). These 12 articles include over 1,500 patients evaluated with the SAS.
The SAS development process was well described in the original work (32) and later expanded upon by Riker et al (33) in 1999. On reevaluation of the current and past literature for this review, the scale development score increased relative to the 2013 PAD guidelines because scale development and evolution were clearly described. Content validity was evaluated by three groups: initially in 1999 by Riker et al (33), with eight ICU nurses and a physician; later in 2007 by Rassin et al (20), with a panel of four specialized ICU nurses and a physician, when the scale was translated into Hebrew; and finally in 2010 by Mirski et al (14), with a survey presented to ICU nurses (20).
SAS IRR testing yielded weighted κ values consistently greater than 0.80. Riker et al (33) reported the highest value (κw = 0.92), with other groups reporting κw values from 0.82 to 0.90 (22, 35, 36). Agreement expressed as correlation coefficients was also high: r = 0.81 between two of three research nurses and r = 0.83–0.86 among a panel of four ICU nurses and a physician (22, 27). The ICC was likewise high (ICC = 0.845) (14). IRR testing included not only research personnel but also bedside ICU nurses and physicians trained in the use of the scale (20, 22, 33, 35, 36).
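Weighted κ (κw), reported in most SAS and RSS studies, gives partial credit when two raters choose adjacent levels of an ordinal scale instead of treating every disagreement as total. The sketch below assumes linear or quadratic disagreement weights; the cited studies may have used either scheme, and the function name and example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted kappa for two raters on an ordinal scale (disagreement-weight form)."""
    levels = list(categories)  # ordered scale levels, e.g., SAS 1..7
    k = len(levels)
    if len(rater_a) != len(rater_b) or not rater_a or k < 2:
        raise ValueError("need paired, non-empty ratings and at least two levels")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    index = {level: i for i, level in enumerate(levels)}
    n = len(rater_a)

    # Joint counts: observed[i][j] = patients scored level i by A and level j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 for identical levels, 1 for the two extremes of the scale.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed_dis = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    ) / n
    expected_dis = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j]
        for i in range(k) for j in range(k)
    ) / n**2
    if expected_dis == 0:
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1 - observed_dis / expected_dis


# Hypothetical SAS scores (1-7) from two raters for the same six patients.
sas_a = [4, 4, 3, 5, 2, 4]
sas_b = [4, 3, 3, 5, 2, 5]
print(round(weighted_kappa(sas_a, sas_b, range(1, 8)), 2))
print(round(weighted_kappa(sas_a, sas_b, range(1, 8), "quadratic"), 2))
```

For these hypothetical scores, the two one-level disagreements give κw = 5/7 ≈ 0.71 with linear weights and 16/19 ≈ 0.84 with quadratic weights, which illustrates that reported κw values depend on the weighting scheme chosen.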
For convergent validation against the BIS, correlation coefficients ranged from r = 0.60 in earlier studies to r = 0.66–0.91 in later studies (21, 31, 34). Not surprisingly, the SAS correlated highly with the RASS across nearly 2,500 paired assessments (r = 0.91) (30). The SAS has also been compared with the NICS over a 40-minute period of sedation, with correlation coefficients of r = 0.88–0.95 (14). However, these scale-to-scale comparisons were not considered in our psychometric analysis, because subjective sedation scales share similar content and are therefore expected to correlate highly. When the SAS was applied at predetermined clinical time points (i.e., ICU admission, first awakening, start of ventilator weaning, and after extubation), SAS scores differed significantly (34). The SAS could also detect changes in sedation level over time, both at specified periods across the entire hospitalization and over a single 40-minute period of sedative use (14, 27).
Feasibility of the SAS was not clearly described, although directions for use are detailed in the original work (32). With the updated literature database, evidence for the clinical impact of the scale was strengthened: the SAS was comparable with the RASS for determining ICU patient eligibility for delirium screening with the Confusion Assessment Method for the ICU (CAM-ICU) (37). The updated analysis increased the SAS weighted score from 16.5 in the 2013 PAD guidelines to 18.5, confirming its very good psychometric properties (Table 3) (10).
Vancouver Interaction and Calmness Scale
The VICS was developed in 2000 by de Lemos et al (38) as a bedside instrument for critical care providers to measure the quality of sedation so that ICU care goals could be identified and achieved. The total VICS score is the sum of two separate subscores, the Interaction Score and the Calmness Score. Each subscore has five components, each rated from 1 to 6 according to the rater's degree of agreement with the stated component. The scale was initially tested in 134 mechanically ventilated ICU patients. Weinert and McFarland (37) also tested the VICS in 94 intubated ICU patients, but reported only correlations between the VICS subscales and the MSAT; these data were therefore not considered in this psychometric analysis. No additional VICS studies have been published since the 2013 PAD guidelines.
The development of the VICS and its limitations are well described in the original work (38). The authors created the scale after a literature review and the identification of relevant domains by a focus group of critical care nurses; content was evaluated by three physicians and 20 ICU nurses. IRR testing was performed by the original author group, although with only 15 nurses, and yielded an ICC of 0.89 for the Calmness Score and 0.90 for the Interaction Score.
Convergent validation against other methods of evaluating sedation is still lacking. The VICS detected clinically important differences in care: the Calmness Score discriminated the need for an intervention to calm the patient (r = −0.82), and the Interaction Score discriminated between acute and subacute changes in agitation (38). Feasibility and implementation are not addressed in either VICS study, although directions for use are clearly described by de Lemos et al (38). The total weighted psychometric score for the VICS is 14.3, representing moderate psychometric properties (Table 3).
Adaptation to the Intensive Care Environment
The ATICE scale was specifically developed in 2003 by De Jonghe et al (39) to measure the level of adaptation of mechanically ventilated patients to their ICU environment. The ATICE scale was created to allow clinicians to distinguish among different ICU patients’ behaviors while receiving mechanical ventilation and to be able to quantify that change over time. The ATICE scale consists of two domains: 1) consciousness and 2) tolerance. It includes two questions pertaining to the consciousness domain (i.e., awakeness [0–5] and comprehension [0–5]) and three questions pertaining to the tolerance domain (i.e., calmness [0–3], ventilator synchrony [0–4], and facial relaxation [0–3]). The total score for the ATICE scale is a sum of the responses to these five questions and ranges from 0 (extremely poor adaptation) to 20 (very good adaptation).
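The ATICE total described above is a simple sum of five bounded item scores. The sketch below encodes the item ranges stated in the text and returns the two domain subtotals and the total; the item keys and function name are hypothetical, and the behavioral anchors for each item level are not reproduced here.

```python
# Item -> (domain, maximum score), using the ranges stated for the ATICE scale.
# The keys are illustrative names, not official ATICE item labels.
ATICE_ITEMS = {
    "awakeness": ("consciousness", 5),
    "comprehension": ("consciousness", 5),
    "calmness": ("tolerance", 3),
    "ventilator_synchrony": ("tolerance", 4),
    "facial_relaxation": ("tolerance", 3),
}


def atice_score(ratings: dict[str, int]) -> dict[str, int]:
    """Validate the five item ratings and return domain subtotals and the 0-20 total."""
    if set(ratings) != set(ATICE_ITEMS):
        raise ValueError(f"expected exactly these items: {sorted(ATICE_ITEMS)}")
    subtotals = {"consciousness": 0, "tolerance": 0}
    for item, value in ratings.items():
        domain, maximum = ATICE_ITEMS[item]
        if not 0 <= value <= maximum:
            raise ValueError(f"{item} must be between 0 and {maximum}")
        subtotals[domain] += value
    return {**subtotals, "total": subtotals["consciousness"] + subtotals["tolerance"]}


# Hypothetical ratings for one mechanically ventilated patient.
example = {"awakeness": 3, "comprehension": 2, "calmness": 2,
           "ventilator_synchrony": 3, "facial_relaxation": 1}
print(atice_score(example))
```

Returning the domain subtotals alongside the total keeps the consciousness and tolerance components visible, since a mid-range total can arise from very different combinations of the two.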
Although developed by a multidisciplinary, multi-institutional group of intensive care providers, the ATICE scale was comprehensively validated at a single institution in only 80 mechanically ventilated ICU patients (39). The development process was sound: the scale was based on a conceptual framework developed by bedside clinicians, and directions for use are clearly described. IRR was high, with ICCs ranging from 0.92 to 0.99 across the two domains. Internal consistency was reported as a Cronbach α of 0.87 for the consciousness domain and 0.68 for the tolerance domain. Recent work by Yaman et al (31) provided convergent validation, showing a moderate correlation between the ATICE and the BIS (r = 0.57). Discriminant validation was also demonstrated by correlating ATICE scores with the amount of sedation given to patients over the preceding hour and the preceding 24 hours (39). The total weighted score for the ATICE scale in this analysis was 13.7, indicating moderate psychometric properties and an increase from a weighted score of 12.3 in the 2013 PAD guidelines analysis (Table 3) (10).
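The Cronbach α values above summarize internal consistency, that is, how strongly the items within a domain vary together. For k items, α = k/(k − 1) × (1 − Σσ²_item/σ²_total). The sketch below is a minimal implementation of that standard formula; the function name and the example ratings are hypothetical, not ATICE study data.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[i] holds item i's scores across the same patients."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs scores for the same (>= 2) patients")
    # Each patient's total score across all items in the domain.
    totals = [sum(item[p] for item in item_scores) for p in range(n)]
    sum_item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical two-item "consciousness" ratings for four patients.
awakeness = [5, 3, 1, 4]
comprehension = [5, 2, 0, 4]
print(round(cronbach_alpha([awakeness, comprehension]), 3))
```

Sample variances are used throughout; using population variances consistently would give the same α, because the n − 1 factors cancel in the ratio. The hypothetical example gives α = 45/46 ≈ 0.978, and two perfectly parallel items would give exactly 1.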
Ramsay Sedation Scale
Developed in 1974 by Ramsay et al (40) as the first subjective sedation scale, the RSS was originally intended to quantify the depth of sedation during infusion of alphaxalone-alphadolone. Ten RSS articles were evaluated for the 2013 SCCM PAD guidelines, and five additional studies were included in this updated review (14, 19, 22, 27, 31, 33, 39–47). Across these 15 studies, the RSS has been tested in a total of 586 patients. Scale development was not described in the original article. The updated evaluation included a content validation analysis of the RSS by Mirski et al (14) in 2010, in which ICU nurses (n = 53) used a 0–10 Likert scale to rate the relevance of various scales for describing agitation and sedation. The RSS had mean scores of 4.67 for agitation and 6.11 for sedation. The inclusion of these content validation data increased the total weighted score relative to the 2013 PAD guidelines (10).
IRR of the RSS was inconsistent. Strong interrater coefficients were reported by Riker et al (33) (κw = 0.88), Nassar et al (22) (κw = 0.68–0.83), and van Dishoeck et al (47) (κw = 0.9). However, Olson et al (43) reported κ = 0.28 in a prospective randomized trial that presented six videos of sedated patients (one for each RSS level) to 237 otherwise blinded critical care nurses. More recently, ICC values of 0.80–0.89 were reported for a subset of research nurses performing the RSS over a 40-minute period of sedation (14). Given this inconsistency in IRR, intrarater reliability data would be needed to establish the reliability of the RSS; however, no such data have been published.
Convergent validation against the BIS yielded r = −0.62 in Hernández-Gancedo et al (42), κ = 0.759 in Yaman et al (31), and κ = 0.28 in Adesanya et al (46). Clinically, increases in heart rate and respiratory rate as patients emerged from sedation correlated significantly with the RSS (r² = 0.59, p < 0.001) (44). Reevaluation of the original literature also identified evidence of discriminant validation for the RSS, which contributed to the increase in its total weighted score; specifically, Yaman et al (31) described a significant increase in RSS scores after the induction of sedation. ICU nurses in Mirski et al (14) rated the feasibility of the scale as moderate (mean value of 5.42/10), and its implementation and impact on ICU practice have not yet been described. In the 35 years after the RSS was first published, it came to be applied in a well-described, stepwise approach, which enhanced its psychometric properties and increased its overall weighted score (47). After the inclusion of new literature, the total weighted score of the RSS increased from 7.7 to 13.1, moving its psychometric evaluation from low in the 2013 PAD guidelines to moderate in the current evaluation (Table 3).
Minnesota Sedation Assessment Tool
The MSAT was developed by Weinert et al (37) in 2004 and is described in two publications by this group (37, 48). No new studies describing the psychometric properties of the MSAT have been published since 2010. The MSAT includes two subscales, the Motor Activity Subscale and the Arousal Subscale. The Motor Activity Subscale is scored from 1 (i.e., no spontaneous movement) to 4 (i.e., movement of a central muscle group, such as the back or abdomen). The Arousal Subscale is scored from 1 (i.e., eyes stay closed and no patient movement in response to stimuli) to 6 (i.e., eyes open spontaneously with tracking). MSAT development and weaknesses are well described in the original work. More than 40 ICU physicians and nurses evaluated the content of the MSAT. A preliminary version was modified in response to input from this provider group and the results of pretesting. A total of 368 mechanically ventilated ICU patients were scored with the MSAT.
IRR was assessed among both clinicians and the research team (37). The Motor Activity Subscale had κ = 0.72 and ICC = 0.81, whereas the Arousal Subscale had κ = 0.85 and ICC = 0.96 (37). Convergent validation data for the MSAT are lacking, although discriminant validation is demonstrated by changes in scores as sedative agents were administered (37, 48). Feasibility and scale relevance data are not presented. The MSAT received a total weighted score of 13, representing moderate psychometric properties (Table 3).
Nursing Instrument for the Communication of Sedation
The NICS was developed in 2010 and was not part of the initial analysis for the PAD guidelines (14). It was designed to optimize sedation titration by facilitating communication between nurses and other ICU health professionals, using a simpler and more intuitive scoring system than previously published sedation scales.
The NICS is an ordinal, seven-level symmetrical scale centered on an optimal cooperative state (0) and ranging from dangerously agitated (+3) to deeply sedated (−3). Aside from a reference to the intuitive rhetorical metric of “threes” (i.e., good, better, best), no detail was provided on the selection of items for the NICS (49, 50). Content validation was examined by 53 ICU nurses, who used a 0–10 Likert scale to rate the relevance and the logical/intuitive nature of the NICS compared with the RASS, SAS, RSS, and MAAS. The NICS received significantly higher scores (mean > 8 of 10) than the RASS, which was the second highest rated scale across all measures.
NICS reliability and construct validity were assessed in adult neurological, medical, and surgical ICU patients (n = 104), including both intubated (n = 20) and nonintubated (n = 84) patients, 37% of whom were receiving sedative agents. IRR was determined by a research team of four healthcare providers and a senior neurointensivist at three 20-minute intervals. An overall ICC of 0.87 was obtained across the three time periods, ranging from 0.86 to 0.91 for each period.
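Several reliability results in this review, such as the NICS overall ICC of 0.87, are intraclass correlation coefficients. The source does not state which ICC form was used, so the sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in the Shrout and Fleiss notation, with rows as patients and columns as raters. The function name and example scores are hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to patient i by rater j (no missing values).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete patients x raters table with at least 2 of each")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical NICS scores (-3 to +3): five patients, each rated by three raters.
nics = [[0, 0, 0], [-2, -2, -1], [1, 1, 1], [-3, -3, -3], [2, 1, 2]]
print(round(icc_2_1(nics), 2))
```

The absolute-agreement form penalizes systematic differences between raters (the between-rater term), which suits sedation scales, where one nurse consistently scoring a level higher than another would change titration decisions.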
Convergent validation was examined by comparing NICS scores with evaluations from the senior neurointensivist using an eight-point level of arousal measurement, and a high correlation was obtained (r = 0.96).
NICS feasibility was assessed by surveying the same nurses about the ease of scoring states of agitation and sedation and about the scale's ability to facilitate communication. Ease of scoring was reported together with content relevance, with mean values greater than 8 of 10. The impact of NICS implementation in the ICU was not examined. The total weighted score for the NICS was 12.8, indicating moderate psychometric properties (Table 3).
Motor Activity Assessment Scale
Initially derived from the SAS by Clemmer et al (51) and tested for reliability and validity in 1999 by Devlin et al (52), the MAAS includes seven levels, from 0 (an unresponsive patient) to 6 (a dangerously agitated and uncooperative patient). Information on scale development was initially lacking, although the addition of content validation data from a survey of ICU nursing personnel by Mirski et al (14) increased the score for the scale development category (52, 53).
IRR ranged from κ = 0.83 among 32 pretrained ICU nurses in the original article to ICCs of 0.81–0.86 in later work (14, 52, 53). A total of 160 ICU patients were included in the three studies. A single study examined convergent validation by relating MAAS scores to significant (p < 0.001) changes in blood pressure, heart rate, and the presence of agitation-related sequelae (52); however, the strength of these correlations was not reported. Directions for use are clearly described by Hogg et al (53), although feasibility and implementation testing are lacking. The addition of content validation information increased the weighted score from 11 in the 2013 PAD guidelines to 11.5, which represents low psychometric properties overall (Table 3).
Sedation Intensive Care Score
The SEDIC scale was developed in 2006 by Binnekade et al (41) to assess depth of sedation in ICU patients. The SEDIC score is calculated by combining a subscore for a graded series of stimuli applied to the patient (1–5) with a subscore for the corresponding response (1–5) (41). The scale was developed and tested at a single institution in 46 mechanically ventilated ICU patients. The items included in the scale are justified in the original article, although information on how the content was evaluated is missing. Descriptions of the scale's limitations, directions for use, feasibility, and relevance are lacking.
Reliability testing of the SEDIC scale occurred with ICU nurses with an ICC = 0.88. Convergent validation was not performed, but discriminant validation was confirmed associating SEDIC scores with the duration of sedation emergence. No additional studies of the psychometric properties of the SEDIC scale have been published since 2006. The total weighted score for the SEDIC scale was 10.5, due to low quality evidence of psychometric properties presented (Table 3).
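Several of the reliability values above are reported as ICCs, but the ICC model used is not stated here. As an assumption for illustration, the following Python sketch computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1) in Shrout and Fleiss notation, from a patients × raters table. The scores are hypothetical and are not SEDIC data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per patient, each holding one score per rater.
    """
    n = len(scores)                          # number of patients (rows)
    k = len(scores[0]) if scores else 0      # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 patients and 2 raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into patients, raters, and residual error.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined for this table (zero denominator)")
    return (ms_rows - ms_error) / denominator


# Hypothetical scores from two nurses (columns) for five patients (rows).
paired_scores = [[4, 4], [6, 5], [8, 8], [3, 4], [9, 9]]
print(round(icc_2_1(paired_scores), 3))  # 0.966
```

Because the nurses' scores differ only slightly relative to the spread between patients, most of the variance is attributable to patients and the ICC is close to 1. A one-way or consistency-type ICC model would generally give a somewhat different value for the same table, which is one reason reported ICCs are not always directly comparable across studies.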
New Sheffield Sedation Scale
Originally described by Laing (54) in 1992, the Sheffield Scale was modified and underwent simple reliability testing by Olleveant et al (55) in 1998. The Sheffield Scale describes six levels of sedation, with a score of 1 representing “awake” and 6 representing “flat.” Olleveant et al (55) also added notations for sleeping (S) and paralyzed (P) patients.
Psychometric evaluation was limited to one single-institution study with 100 paired observations by trained ICU nurses (55). High marks were given for IRR (κ = 0.73), although limited information was presented on scale development, feasibility, and relevance. No validity testing has been published. No additional studies describing the psychometric properties of the Sheffield Scale have been published since the 2013 PAD guidelines analysis was performed. The Sheffield Scale received a total weighted score of 8.5, reflecting very limited evidence of its psychometric properties overall (Table 3).
Observer’s Assessment of Alertness/Sedation Scale
The OAA/S was originally developed in 1990 to measure alertness in subjects receiving midazolam for light to deep sedation, followed by flumazenil for reversal (56). The OAA/S scores four categories (responsiveness, facial expression, eyes, and speech), each from 5 (alert) to 1 (deep sleep). The OAA/S was later applied to 50 deeply sedated, critically ill patients, with the speech component removed because all patients were mechanically ventilated (42).
Neither of the published OAA/S studies reported scale development, reliability testing, feasibility, or implementation data, although scale directives are included in the original work. Convergent validation was performed with BIS, demonstrating a correlation of r = 0.59 (42). No additional OAA/S references were identified in the new literature search. The OAA/S total weighted score remains at 3.7, corresponding to a very low level of evidence for its psychometric properties (Table 3).
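Convergent validation against BIS, as reported for the OAA/S, is summarized with a correlation coefficient. A minimal Python sketch of the Pearson product-moment correlation follows; the paired OAA/S and BIS values are hypothetical. Because sedation scale scores are ordinal, a rank-based coefficient such as Spearman's may be preferable in practice.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least 2 observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-product and squared-deviation sums; the shared 1/(n - 1) factor
    # of covariance and variance cancels in the ratio.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired observations: OAA/S score and simultaneous BIS value.
oaas_scores = [1, 2, 2, 3, 4, 5]
bis_values = [42, 48, 61, 55, 70, 78]
print(round(pearson_r(oaas_scores, bis_values), 2))
```

A positive r indicates that higher (more alert) scale scores tend to coincide with higher BIS values; r alone does not show that the two measures agree on the same scale.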
Sedative and analgesic medications are commonly administered to critically ill patients to allay anxiety, manage pain, and prevent and treat agitation. Subjective sedation scales enable ICU clinicians to titrate these medications more precisely and to prevent oversedation, which can prolong the duration of mechanical ventilation and ICU length of stay and increase the prevalence of delirium and death in these patients (57–63). The 2013 PAD guidelines recommended that 1) pain, agitation and depth of sedation, and delirium be frequently monitored in ICU patients using valid and reliable assessment tools; 2) ICU patients receive adequate and preemptive treatment for pain; 3) ICU patients receive sedative medications only if required; and 4) sedatives be titrated to maintain a light level of sedation (10). Light sedation is defined as a level of sedation at which ICU patients are responsive and aware, as demonstrated by their ability to purposefully follow commands (i.e., to successfully perform any three of the following actions on request: open eyes, maintain eye contact, squeeze hand, stick out tongue, and/or wiggle toes) (2, 64, 65). This degree of responsiveness and awareness goes well beyond ICU patients being merely “sleepy but arousable” and is essential for evaluating pain through patient self-report, assessing mechanically ventilated patients’ readiness to wean and extubate, performing delirium assessments, and implementing early mobility efforts (2, 10, 64, 65). Implementing valid and reliable bedside sedation scoring tools in the ICU allows clinicians to accurately assess and communicate patients’ depth of sedation to other members of the ICU care team and to incorporate these assessments into sedation protocols designed to optimize sedation.
The benefits of ICU sedation protocols that incorporate sedation scales include reduced sedative use and shorter durations of mechanical ventilation and ICU length of stay (48, 66–71).
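The operational definition of light sedation above reduces to a simple counting rule over five on-request actions. The Python sketch below encodes that rule as an illustration only; the function name and action labels are hypothetical and are not part of any validated tool or guideline software, and the sketch assumes each action is recorded as performed on request.

```python
# The five on-request actions listed in the light sedation definition.
LIGHT_SEDATION_ACTIONS = frozenset({
    "opens eyes",
    "maintains eye contact",
    "squeezes hand",
    "sticks out tongue",
    "wiggles toes",
})


def follows_commands_purposefully(actions_performed):
    """Return True if at least three distinct listed actions were performed on request."""
    performed = set(actions_performed)
    unknown = performed - LIGHT_SEDATION_ACTIONS
    if unknown:
        raise ValueError(f"unrecognized actions: {sorted(unknown)}")
    # Converting to a set means repeating the same action does not count twice.
    return len(performed) >= 3


print(follows_commands_purposefully(["opens eyes", "squeezes hand", "wiggles toes"]))  # True
print(follows_commands_purposefully(["opens eyes", "opens eyes", "squeezes hand"]))  # False
```

Counting distinct actions rather than individual responses mirrors the definition's requirement of any three different actions.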
Since the introduction of the RSS nearly 40 years ago, the number and type of sedation scales developed for assessing depth of sedation in critically ill patients have increased significantly. However, the psychometric properties of these scales, which determine their validity and reliability for use in ICU patients, vary widely. Prior to the recent publication of the 2013 PAD guidelines, there was no comprehensive objective assessment comparing the strengths and weaknesses of the psychometric properties of existing sedation scales. The psychometric analysis of sedation scales conducted as part of the 2013 PAD guidelines demonstrated that, of 10 sedation scales evaluated for use in ICU patients, the RASS and SAS were the most valid and reliable (10).
The inclusion of literature beyond what was reported in the 2013 SCCM PAD guidelines allowed the addition of a new scale, the NICS, and strengthened the psychometric properties of five scales (MAAS, RSS, ATICE, SAS, RASS). More specifically, the work of Mirski et al (14) provided additional content validation of the RASS, SAS, RSS, and MAAS through a 0–10 Likert scale survey of bedside nurses. Yaman et al (31) compared BIS with intermittent sedation assessments using the RSS, RASS, SAS, and ATICE over a 24-hour period of sedation, adding to the convergent and discriminant validation of these scales. Finally, the work of Khan et al (30) increased the relevance of the SAS by validating its use as a screening instrument to determine eligibility for delirium assessment with the CAM-ICU. Thus, both the SAS and the RASS can be used in concert with a validated delirium assessment tool in the care of critically ill adults. In the final analysis, seven of the 11 sedation scales we evaluated (i.e., RASS, SAS, VICS, ATICE, RSS, MSAT, and the NICS) demonstrated sufficient psychometric properties (i.e., weighted score ≥ 12) for possible use in critically ill adults. Of these, however, the RASS and SAS remain the most valid and reliable sedation assessment tools for use in adult ICU patients.
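The sufficiency cutoff used in this analysis (weighted score ≥ 12) can be applied mechanically once each scale's weighted score is known. The Python sketch below uses only the weighted scores stated in the MAAS, SEDIC, Sheffield Scale, and OAA/S sections of this article; the other scales' scores are not restated in this section, so they are omitted rather than guessed. The function name is a hypothetical label.

```python
SUFFICIENCY_THRESHOLD = 12.0  # weighted score regarded as sufficient in this analysis

# Weighted scores stated in the scale-specific sections (Table 3).
reported_scores = {
    "MAAS": 11.5,
    "SEDIC": 10.5,
    "Sheffield": 8.5,
    "OAA/S": 3.7,
}


def scales_meeting_threshold(scores, threshold=SUFFICIENCY_THRESHOLD):
    """Return scale names whose weighted score is >= threshold, highest score first."""
    passing = [(name, score) for name, score in scores.items() if score >= threshold]
    return [name for name, _ in sorted(passing, key=lambda item: item[1], reverse=True)]


print(scales_meeting_threshold(reported_scores))  # []
```

None of these four scales reaches the cutoff, which is consistent with their absence from the seven scales judged to have sufficient psychometric properties. The comparison is inclusive (≥), so a scale scoring exactly 12 would qualify.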
The development and validation processes of all included scales share common limitations. It is inherent to all subjective sedation scales that no scale is reliable and valid at all times, in every ICU patient, and in every situation. A ceiling effect is also inherent in sedation scale testing: because it is unsafe to keep patients at the extremes of sedation merely to perform testing, reliability and validity testing at these extremes is typically limited. In addition, using repeated observations within the same patient across a relatively narrow spectrum of agitation to calculate IRR tends to raise the intercorrelations between evaluations, thereby falsely elevating coefficient values (13).
The majority of sedation scales included in this analysis were tested primarily in medical-surgical ICU populations, omitting polytrauma patients, patients with traumatic brain injury, and/or those with a primary neurological diagnosis at ICU admission. Only five of the 36 articles we evaluated for this analysis included at least one of these critical care subpopulations (14, 19, 21, 29, 36). ICU patients receiving neuromuscular blocking agents were excluded from all of these studies, because all of these sedation scales rely on observable behaviors. These patients require alternative brain monitoring techniques that objectively measure brain activity (10).
Unlike pain assessment, in which a patient's self-reported pain can serve as a reference against which an observer's subjective assessment is compared, subjective sedation scales lack such a gold standard for convergent validation. Instead, investigators often use alternative techniques that measure the same clinical state by different means, such as comparing sedation scale scores with BIS values, auditory evoked potentials, the Narcotrend Index, the Patient State Index, or state entropy. However, these objective techniques are not recommended as a primary method for monitoring depth of sedation in noncomatose, nonparalyzed critically ill patients (10). Thus, such comparisons between subjective scales and objective monitors remain controversial.
A potential bias exists in this analysis from our exclusion of studies with a cohort of fewer than 30 patients or published in languages other than English, and from any studies that may have been missed by the described search strategies. The reliability and validity of subjective sedation scales are not static and will evolve with the further development, testing, and publication of these properties in a variety of critical care populations. Additional potential bias exists in the specific selection and weighting of the psychometric properties included in the assessment tool we developed for the 2013 PAD guidelines. Variations of this scoring system have previously been used in two separate publications to evaluate not only sedation scales but also pain and delirium scales (10, 15). Nonetheless, because no single standardized psychometric scoring system exists for evaluating subjective sedation scales, a different combination of elements may lead to different scores for each scale.
The use of sedation to control anxiety and agitation is a fundamental principle in the care of critically ill patients. Subjective sedation scales are commonly used by multidisciplinary ICU care teams to promote effective communication and to define sedation goals, which may reduce sedative use and shorten the duration of mechanical ventilation and ICU length of stay. Multiple sedation scales currently exist, of varying quality based on their reported psychometric properties. Based on the current literature and using a predetermined psychometric assessment system, the most reliable and valid subjective sedation scales include the RASS, SAS, VICS, ATICE, RSS, MSAT, and the NICS. Of these, the RASS and SAS remain the most valid and reliable sedation scales for use in adult ICU patients.
We thank and acknowledge Charles P. Kishman, Jr, MSLS, Information Services Librarian (University of Cincinnati, Cincinnati, OH), and Rachel Daly, Research Librarian (Jewish General Hospital, Montreal, QC, Canada) for their help in searching, creating, and updating the literature for this work; and psychometric experts David Streiner, PhD (University of Toronto, Department of Psychiatry, Toronto, ON, Canada; McMaster University, Department of Clinical Epidemiology and Biostatistics, Hamilton, ON, Canada), Celeste Johnston, RN, DEd (School of Nursing, McGill University, Montreal, QC, Canada), and Carolyn Waltz, RN, PhD, FAAN (School of Nursing, University of Maryland, Baltimore, MD) for their help in developing the psychometric assessment tool used here.
1. Wunsch H, Kahn JM, Kramer AA, et al. Use of intravenous infusion sedation among mechanically ventilated patients in the United States. Crit Care Med. 2009;37:3031–3039
2. Kress JP, Pohlman AS, O’Connor MF, et al. Daily interruption of sedative infusions in critically ill patients undergoing mechanical ventilation. N Engl J Med. 2000;342:1471–1477
3. Cook DJ, Walter SD, Cook RJ, et al. Incidence of and risk factors for ventilator-associated pneumonia in critically ill patients. Ann Intern Med. 1998;129:433–440
4. De Jonghe B, Cook D, Sharshar T, et al. Acquired neuromuscular disorders in critically ill patients: A systematic review. Groupe de Reflexion et d’Etude sur les Neuromyopathies En Reanimation. Intensive Care Med. 1998;24:1242–1250
5. Kollef MH, Levy NT, Ahrens TS, et al. The use of continuous i.v. sedation is associated with prolongation of mechanical ventilation. Chest. 1998;114:541–548
6. Nelson BJ, Weinert CR, Bury CL, et al. Intensive care unit drug use and subsequent quality of life in acute lung injury patients. Crit Care Med. 2000;28:3626–3630
7. Shehabi Y, Bellomo R, Reade MC, et al; Sedation Practice in Intensive Care Evaluation (SPICE) Study Investigators; ANZICS Clinical Trials Group. Early intensive care sedation predicts long-term mortality in ventilated critically ill patients. Am J Respir Crit Care Med. 2012;186:724–731
8. De Jonghe B, Cook D, Appere-De-Vecchi C, et al. Using and understanding sedation scoring systems: A systematic review. Intensive Care Med. 2000;26:275–285
9. Jacobi J, Fraser GL, Coursin DB, et al; Task Force of the American College of Critical Care Medicine (ACCM) of the Society of Critical Care Medicine (SCCM), American Society of Health-System Pharmacists (ASHP), American College of Chest Physicians. Clinical practice guidelines for the sustained use of sedatives and analgesics in the critically ill adult. Crit Care Med. 2002;30:119–141
10. Barr J, Fraser GL, Puntillo K, et al; American College of Critical Care Medicine. Clinical practice guidelines for the management of pain, agitation, and delirium in adult patients in the intensive care unit. Crit Care Med. 2013;41:263–306
11. Hansen-Flaschen J, Cowen J, Polomano RC. Beyond the Ramsay scale: Need for a validated measure of sedating drug efficacy in the intensive care unit. Crit Care Med. 1994;22:732–733
12. American Educational Research Association, American Psychological Association, National Council on Measurement in Education, Joint Committee on Standards for Educational and Psychological Testing (U.S.). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999
13. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. Fourth Edition. Oxford: Oxford University Press; 2008
14. Mirski MA, LeDroux SN, Lewin JJ III, et al. Validity and reliability of an intuitive conscious sedation scoring tool: The nursing instrument for the communication of sedation. Crit Care Med. 2010;38:1674–1684
15. Gélinas C, Puntillo KA, Joffe A, Barr J. A validated approach to evaluating psychometric properties of pain assessment tools in nonverbal critically ill adults. Semin Respir Crit Care Med. 2013;34:153–168
16. Zwakhalen SM, Hamers JP, Abu-Saad HH, et al. Pain in elderly people with severe dementia: A systematic review of behavioural pain assessment tools. BMC Geriatr. 2006;6:3
17. Pudas-Tähkä SM, Axelin A, Aantaa R, et al. Pain assessment tools for unconscious or sedated intensive care patients: A systematic review. J Adv Nurs. 2009;65:946–956
18. Guyatt GH, Oxman AD, Vist GE, et al; GRADE Working Group. GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336:924–926
19. Sessler CN, Gosnell MS, Grap MJ, et al. The Richmond Agitation-Sedation Scale: Validity and reliability in adult intensive care unit patients. Am J Respir Crit Care Med. 2002;166:1338–1344
20. Rassin M, Sruyah R, Kahalon A, et al. “Between the fixed and the changing”: Examining and comparing reliability and validity of 3 sedation-agitation measuring scales. Dimens Crit Care Nurs. 2007;26:76–82
21. Deogaonkar A, Gupta R, DeGeorgia M, et al. Bispectral Index monitoring correlates with sedation scales in brain-injured patients. Crit Care Med. 2004;32:2403–2406
22. Nassar AP Jr, Neto RCP, de Figueiredo WB, Park M. Validity, reliability and applicability of Portuguese versions of sedation-agitation scales among critically ill patients. Sao Paulo Med J. 2008;126:215–219
23. Pun BT, Gordon SM, Peterson JF, et al. Large-scale implementation of sedation and delirium monitoring in the intensive care unit: A report from two medical centers. Crit Care Med. 2005;33:1199–1205
24. Chanques G, Jaber S, Barbotte E, et al. Impact of systematic evaluation of pain and agitation in an intensive care unit. Crit Care Med. 2006;34:1691–1699
25. Masica AL, Girard TD, Wilkinson GR, et al. Clinical sedation scores as indicators of sedative and analgesic drug exposure in intensive care unit patients. Am J Geriatr Pharmacother. 2007;5:218–231
26. Ely EW, Truman B, Shintani A, et al. Monitoring sedation status over time in ICU patients: Reliability and validity of the Richmond Agitation-Sedation Scale (RASS). JAMA. 2003;289:2983–2991
27. Ashkenazy S, DeKeyser-Ganz F. Assessment of the reliability and validity of the Comfort Scale for adult intensive care patients. Heart Lung. 2011;40:e44–e51
28. Vasilevskis EE, Morandi A, Boehm L, et al. Delirium and sedation recognition using validated instruments: Reliability of bedside intensive care unit nursing assessments from 2007 to 2010. J Am Geriatr Soc. 2011;59(Suppl 2):S249–S525
29. Ogilvie MP, Pereira BM, Ryan ML, et al. Bispectral index to monitor propofol sedation in trauma patients. J Trauma. 2011;71:1415–1421
30. Khan BA, Guzman O, Campbell NL, et al. Comparison and agreement between the Richmond Agitation-Sedation Scale and the Riker Sedation-Agitation Scale in evaluating patients’ eligibility for delirium assessment in the ICU. Chest. 2012;142:48–54
31. Yaman F, Ozcan N, Ozcan A, et al. Assessment of correlation between bispectral index and four common sedation scales used in mechanically ventilated patients in ICU. Eur Rev Med Pharmacol Sci. 2012;16:660–666
32. Riker RR, Fraser GL, Cox PM. Continuous infusion of haloperidol controls agitation in critically ill patients. Crit Care Med. 1994;22:433–440
33. Riker RR, Picard JT, Fraser GL. Prospective evaluation of the Sedation-Agitation Scale for adult critically ill patients. Crit Care Med. 1999;27:1325–1329
34. Riker RR, Fraser GL, Simmons LE, et al. Validating the Sedation-Agitation Scale with the Bispectral Index and Visual Analog Scale in adult ICU patients after cardiac surgery. Intensive Care Med. 2001;27:853–858
35. Brandl KM, Langley KA, Riker RR, et al. Confirming the reliability of the sedation-agitation scale administered by ICU nurses without experience in its use. Pharmacotherapy. 2001;21:431–436
36. Ryder-Lewis MC, Nelson KM. Reliability of the Sedation-Agitation Scale between nurses and doctors. Intensive Crit Care Nurs. 2008;24:211–217
37. Weinert C, McFarland L. The state of intubated ICU patients: Development of a two-dimensional sedation rating scale for critically ill adults. Chest. 2004;126:1883–1890
38. de Lemos J, Tweeddale M, Chittock D. Measuring quality of sedation in adult mechanically ventilated critically ill patients. The Vancouver Interaction and Calmness Scale. Sedation Focus Group. J Clin Epidemiol. 2000;53:908–919
39. De Jonghe B, Cook D, Griffith L, et al. Adaptation to the Intensive Care Environment (ATICE): Development and validation of a new sedation assessment instrument. Crit Care Med. 2003;31:2344–2354
40. Ramsay MA, Savege TM, Simpson BR, et al. Controlled sedation with alphaxalone-alphadolone. Br Med J. 1974;2:656–659
41. Binnekade JM, Vroom MB, de Vos R, et al. The reliability and validity of a new and simple method to measure sedation levels in intensive care patients: A pilot study. Heart Lung. 2006;35:137–143
42. Hernández-Gancedo C, Pestaña D, Peña N, et al. Monitoring sedation in critically ill patients: Bispectral index, Ramsay and observer scales. Eur J Anaesthesiol. 2006;23:649–653
43. Olson D, Lynn M, Thoyre SM, et al. The limited reliability of the Ramsay scale. Neurocrit Care. 2007;7:227–231
44. Haberthür C, Lehmann F, Ritz R. Assessment of depth of midazolam sedation using objective parameters. Intensive Care Med. 1996;22:1385–1390
45. Schulte-Tamburen AM, Scheier J, Briegel J, et al. Comparison of five sedation scoring systems by means of auditory evoked potentials. Intensive Care Med. 1999;25:377–382
46. Adesanya AO, Rosero E, Wyrick C, et al. Assessing the predictive value of the bispectral index vs patient state index on clinical assessment of sedation in postoperative cardiac surgery patients. J Crit Care. 2009;24:322–328
47. van Dishoeck AM, van der Hooft T, Simoons ML, et al. Reliable assessment of sedation level in routine clinical practice by adding an instruction to the Ramsay Scale. Eur J Cardiovasc Nurs. 2009;8:125–128
48. Weinert CR, Calvin AD. Epidemiology of sedation and sedation adequacy for mechanically ventilated patients in a medical and surgical intensive care unit. Crit Care Med. 2007;35:393–401
49. Merriam AH. Words and numbers: Mathematical dimensions of rhetoric. Southern Commun J. 1990;55:337–354
50. Pandharipande PP, Pun BT, Herr DL, et al. Effect of sedation with dexmedetomidine vs lorazepam on acute brain dysfunction in mechanically ventilated patients: The MENDS randomized controlled trial. JAMA. 2007;298:2644–2653
51. Clemmer TP, Wallace JC, Spuhler VJ, et al. Origins of the Motor Activity Assessment Scale score: A multi-institutional process. Crit Care Med. 2000;28:3124
52. Devlin JW, Boleski G, Mlynarek M, et al. Motor Activity Assessment Scale: A valid and reliable sedation scale for use with mechanically ventilated patients in an adult surgical intensive care unit. Crit Care Med. 1999;27:1271–1275
53. Hogg LH, Bobek MB, Mion LC, et al. Interrater reliability of 2 sedation scales in a medical intensive care unit: A preliminary report. Am J Crit Care. 2001;10:79–83
54. Laing AS. The applicability of a new sedation scale for intensive care. Intensive Crit Care Nurs. 1992;8:149–152
55. Olleveant N, Humphris G, Roe B. A reliability study of the modified new Sheffield Sedation Scale. Nurs Crit Care. 1998;3:83–88
56. Chernik DA, Gillings D, Laine H, et al. Validity and reliability of the Observer’s Assessment of Alertness/Sedation Scale: Study with intravenous midazolam. J Clin Psychopharmacol. 1990;10:244–251
57. Granja C, Gomes E, Amaro A, et al; JMIP Study Group. Understanding posttraumatic stress disorder-related symptoms after critical care: The early illness amnesia hypothesis. Crit Care Med. 2008;36:2801–2809
58. Samuelson K, Lundberg D, Fridlund B. Memory in relation to depth of sedation in adult mechanically ventilated intensive care patients. Intensive Care Med. 2006;32:660–667
59. Samuelson KA, Lundberg D, Fridlund B. Light vs. heavy sedation during mechanical ventilation after oesophagectomy—A pilot experimental study focusing on memory. Acta Anaesthesiol Scand. 2008;52:1116–1123
60. Ouimet S, Kavanagh BP, Gottfried SB, et al. Incidence, risk factors and consequences of ICU delirium. Intensive Care Med. 2007;33:66–73
61. Weinert CR, Sprenkle M. Post-ICU consequences of patient wakefulness and sedative exposure during mechanical ventilation. Intensive Care Med. 2008;34:82–90
62. Roberts BL, Rickard CM, Rajbhandari D, et al. Factual memories of ICU: Recall at two years post-discharge and comparison with delirium status during ICU admission—A multicentre cohort study. J Clin Nurs. 2007;16:1669–1677
63. Nelson BJ, Weinert CR, Bury CL, et al. Intensive care unit drug use and subsequent quality of life in acute lung injury patients. Crit Care Med. 2000;28:3626–3630
64. Mehta S, Burry L, Martinez-Motta JC, et al; Canadian Critical Care Trials Group. A randomized trial of daily awakening in critically ill patients managed with a sedation protocol: A pilot trial. Crit Care Med. 2008;36:2092–2099
65. Schweickert WD, Pohlman MC, Pohlman AS, et al. Early physical and occupational therapy in mechanically ventilated, critically ill patients: A randomised controlled trial. Lancet. 2009;373:1874–1882
66. Payen JF, Bosson JL, Chanques G, et al; DOLOREA Investigators. Pain assessment is associated with decreased duration of mechanical ventilation in the intensive care unit: A post hoc analysis of the DOLOREA study. Anesthesiology. 2009;111:1308–1316
67. Robinson BR, Mueller EW, Henson K, et al. An analgesia-delirium-sedation protocol for critically ill trauma patients reduces ventilator days and hospital length of stay. J Trauma. 2008;65:517–526
68. Girard TD, Kress JP, Fuchs BD, et al. Efficacy and safety of a paired sedation and ventilator weaning protocol for mechanically ventilated patients in intensive care (Awakening and Breathing Controlled trial): A randomised controlled trial. Lancet. 2008;371:126–134
69. Payen JF, Chanques G, Mantz J, et al. Current practices in sedation and analgesia for mechanically ventilated critically ill patients: A prospective multicenter patient-based study. Anesthesiology. 2007;106:687–695; quiz 891
70. Shehabi Y, Botha JA, Boyle MS, et al. Sedation and delirium in the intensive care unit: An Australian and New Zealand perspective. Anaesth Intensive Care. 2008;36:570–578
71. Martin J, Franck M, Sigel S, Weiss M, Spies CD. Changes in sedation management in German intensive care units between 2002 and 2006: A national follow up survey. Crit Care. 2007;11:R124
critical care medicine; intensive care; reliability; sedation; sedation score
© 2013 by the Society of Critical Care Medicine and Lippincott Williams & Wilkins