Development and Validation of the Dartmouth Operative Conditions Scale

Cravero, Joseph P. MD; Blike, George T. MD; Surgenor, Stephen D. MD; Jensen, Jens MS

doi: 10.1213/01.ANE.0000150605.43251.84
Pediatric Anesthesia: Research Report

Studies of pediatric sedation practice have suffered from the lack of an objective scale that would allow for a comparison of the effectiveness and safety of sedation provided by various providers and techniques. We present the Dartmouth Operative Conditions Scale (DOCS), which is designed as a research tool to codify the appropriateness of the procedural conditions provided by various sedation interventions. To begin, human factors methodology was used to develop a model of the pediatric sedation process and to define the criteria for measuring a patient’s condition during a procedure (DOCS). To accomplish validation, 70 video clips (30-s duration) were then selected from more than 300 h of procedural video tape for testing/grading purposes. Inter-rater reliability was tested by comparing the score for each video clip among 10 different raters. Intra-rater reliability was evaluated by retesting all of the raters 1 yr after their initial rating. Construct validity was confirmed by analyzing the change in DOCS score relative to the time that sedation intervention was undertaken. Criterion validity was tested by comparing the DOCS to a modified COMFORT® score. The DOCS was completed with excellent inter-rater (kappa = 0.84) and intra-rater (kappa = 0.91) agreement by 10 health care providers with various backgrounds during the 1-yr study period. Criterion validity was supported by the close correlation between the DOCS and the modified COMFORT® scores for 20 distinct video clips (Spearman correlation coefficient = 0.98; P < 0.001). The distribution of DOCS scores 20 min after the anesthetic induction was significantly lower than the scores before initiation of sedation, and scores after emergence were consistently higher than those 20 min after sedation (P < 0.001), thus confirming construct validity of the scale. 
The DOCS is a validated research tool when used with video data for comparing the effectiveness and safety of pediatric sedation service, regardless of technique used for decreasing anxiety or pain during a procedure.

IMPLICATIONS: This study outlines the development and validation of the Dartmouth Operative Conditions Scale. This tool is intended to be used with video tape analysis to better understand the effectiveness, efficiency, and safety of pediatric sedation systems.

Department of Anesthesiology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire

Accepted for publication October 29, 2004.

Address correspondence and reprint requests to Joseph P. Cravero, Department of Anesthesiology, Dartmouth Hitchcock Medical Center, One Medical Center Dr., Lebanon, NH 03756. Address e-mail to

Pediatric sedation research and quality assessment have suffered from the lack of an objective scale that would allow comparison of the effectiveness and safety of sedation provided by various providers using various techniques. Previous scales have been reported that specifically measure the level of sedation of children during procedures (1–3). Pediatric pain scales and scales that measure the comfort and stress of pediatric patients in the intensive care environment have also been reported (4,5). None of these scales accounts for all the specific features that are relevant to providing optimum conditions for procedures, such as control of movement during a procedure, stress, pain, or respiratory side effects from sedatives in a time frame that is appropriate for procedural sedation.

The Dartmouth Operative Conditions Scale (DOCS) was designed specifically to codify the appropriateness of the conditions during medical procedures for children facilitated by various sedation/analgesia techniques (i.e., simple distraction, sedation/anesthesia, or pure analgesia). The DOCS compares conditions (or patient state) present during a procedure in terms of pain, sedation, movement, and side effects from sedation (oversedation). In this way the scale is designed to be simple and flexible enough to allow for comparisons among different providers, pharmacologic interventions, and nonpharmacologic techniques.

We present the method that was used to develop DOCS and the validation of its use as a tool in evaluating interventions used to decrease pain and anxiety. The scale is not meant for daily use but rather is intended for use in conjunction with video-taping to allow an objective comparison among techniques across a broad range of practice. Careful application and evaluation of video data with DOCS will allow the formulation of ideal procedural sedation/distraction interventions for procedures in a given institution or medical system.

The first step in developing DOCS was to model pediatric procedural sedation. The practice of pediatric sedation is a complex system that can be distilled down to a process (sedation) that requires control of variables to achieve optimal output. This model was used to describe patient conditions or states that occur over time during a procedural sedation.

We used techniques frequently applied in the field of human factors engineering (6–8). Complex systems such as pediatric sedation may be characterized by their inputs and outputs and by the transfer function that relates the two. Rather than a direct input-output model, modern control theory assumes that there is an intermediate variable—the state. The fundamental hypothesis of this approach is that detecting and defining state will allow better control, allowing one to avoid problem states and gravitate toward desirable states (8). When applying this approach in the context of pediatric sedation, patient state is the critical system variable. As operators, sedation providers take actions that are aimed at producing the goal state from baseline and correcting patients who enter problem states (screaming in pain or hypoxia from apnea or airway obstruction) to goal states (sleeping or other nonstressed conditions). To ensure a comprehensive model of patient states for pediatric sedation, four tasks were completed:

  1. The guidelines on pediatric sedation issued by the American Academy of Pediatrics and the American Society of Anesthesiologists (9,10) were used to help us define possible levels of sedation (states) and standard monitoring arrays (components of control). They were also used as a starting point to define the standard time course of procedures in which sedation was given.
  2. Three board certified pediatrician/anesthesiologists who actively provide pediatric sedation were interviewed. The range of possible patient states was identified (i.e., obtunded, apneic, cyanotic, hypotensive, sleeping, calm awake, crying, thrashing, and struggling vigorously). This range of patient states is recognized to be dynamic and changing over time. The experts also defined major control loops associated with the procedural sedation. These loops (which allow the operator to attain the goal state) range from giving sedative medications to stressed patients who are in pain to assisting airway patency in deeply sedated patients who may have obstructed airways. They also identified key milestones that are common to all procedural sedations. Specifically, these were the start time, ready for procedure time, procedure start time, procedure delay interval, procedure end time, ready for discharge time, and sedation end time.
  3. After IRB approval, 12 typical procedures requiring sedation (including magnetic resonance imaging, computed tomography, voiding cystourethrogram, lumbar puncture, cardiac catheterization, fracture reduction, and bone marrow biopsy) were video-taped. The three pediatrician/anesthesiologists reviewed these video tapes. During this task, the possible patient states were identified. Patterns of monitor output, provider behavior, and patient response that corresponded with the observed ranges of pain, anxiety, stress, movement, level of consciousness, and cardiorespiratory depression were cataloged.
  4. During this video data analysis, we found that states associated with dangerous conditions created by sedatives (oversedation) were rare. To define these states, we catalogued our experts’ opinions on this matter and also undertook an extensive literature review. One hundred ten papers published from 1996 to 2003 concerning procedural sedation for children were reviewed. Patient findings that were most often cited as adverse side effects of sedation included a decrease in oxygen saturation (most frequently to <92%), apnea or airway obstruction, and decreases in arterial blood pressure. These measures are included in 99% of the papers used in this review. To this we added the category of noisy respirations indicative of partial airway obstruction, which all of our experts felt was an indicator of possible impending airway difficulty.

The final working model of pediatric procedural sedation is presented in Figure 1. In this model, time is represented on the x axis, and patient state is displayed on the y axis. At time = 0, the patient (black circle) is in his/her preprocedural state not experiencing pain from the procedure or side effects from sedation. The patient is therefore (by definition) in the goal state. As time elapses, the patient is taken through the time frame of the procedure. During the procedure, there is potential for the patient to experience pain and anxiety that result in adverse behaviors, thus putting the patient at risk. Likewise, there is potential for the patient to experience unintended side effects of the sedation that is delivered (apnea or hypotension), which can also yield harm. All of these adverse effects are represented by the R factors in the model. Control loops, or C factors, represent the work performed by the health care providers attempting to counteract the possible adverse effects or accomplish a task. These interventions include giving sedation to an anxious child or providing jaw thrust to a child whose airway is obstructed. The objective for the sedation provider is to keep the patient in the goal zone (away from dangerous side effects) and deliver him/her back to the baseline state and ready for discharge.

Figure 1

A rating scheme was devised from the spectrum of patient states identified during development of the pediatric sedation model described above. Numerical values were assigned so that positive numbers reflected increasing activity states and negative numbers reflected decreasing activity states. The numbers were assigned to signify the balance desired between increasing activity associated with reaction to the procedure being performed and the effects of interventions (usually medications) aimed at ameliorating pain and stress (thus decreasing activity). The resulting scale was named the Dartmouth Operative Conditions Scale (Table 1). The score for any patient at a particular moment in time was derived by summing the scores in all four categories to yield one numerical value describing the patient state. It is important to recognize that there is no connotation of adverse effect associated with negative numbers in this scale. Rather, the scale was designed to reflect the desire to balance patient activity and sedation intervention to yield neutral (acceptable) states for procedures.

Table 1

Careful review of the video data versus summed DOCS scores revealed that the total DOCS scores could be divided into three general categories of patient states during the procedures: (a) under-controlled states with DOCS score of more than 2, correlating with patient behaviors such as crying, thrashing, and severe agitation; (b) safe states with DOCS scores of 2 through −2, correlating with patient behaviors that lacked any evidence of pain, stress, or side effects from sedation medications; and (c) oversedated states where the total DOCS score was less than −2, correlating with states in which the sedative and respiratory compromise associated with sedation medications were not adequately treated.
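The summing and three-way categorization described above can be sketched in a few lines of Python. This is only an illustration: the field names and the example sub-scores are assumptions (Table 1 defines the actual items and their values), whereas the cut-offs of 2 and −2 are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocsRating:
    """One DOCS observation; field names are illustrative, not taken from Table 1."""
    pain_stress: int
    movement: int
    consciousness: int
    sedation_side_effects: int

    def total(self) -> int:
        # The DOCS score is the sum of the four category scores.
        return (self.pain_stress + self.movement
                + self.consciousness + self.sedation_side_effects)


def classify(total: int) -> str:
    """Map a summed DOCS score onto the three general patient states."""
    if total > 2:
        return "under-controlled"
    if total < -2:
        return "oversedated"
    return "safe"  # scores of -2 through 2


# Hypothetical sub-scores for a crying, thrashing child with eyes open.
rating = DocsRating(pain_stress=2, movement=2, consciousness=0,
                    sedation_side_effects=0)
print(rating.total(), classify(rating.total()))
```

Keeping the summation separate from the categorization mirrors the two ways the scores were later analyzed: as raw totals and as membership in one of the three states.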

From the conception of this scale, we recognized the possibility for mixed states where a child might show signs of distress from the procedure (crying) but would also show signs of side effects from sedation (eyes closed or noisy breathing). We hypothesized that these states would be relatively rare and undertook this validation process (in part) to test the frequency and impact of these states.

After the design of the DOCS and IRB approval, another 95 procedural sedations were video-taped from the time sedation was administered to the time patients were considered to be back to their baseline consciousness. The camera was placed in the most unobtrusive location possible that also allowed visualization of the patient and monitors. No attempt was made to select out certain types of procedures, specific providers, medications, or types of patients. For this phase of the study, providers of sedation/anesthesia included radiology nurses, pediatric cardiologists, pediatricians, pediatric residents, pediatric oncologists, and anesthesiologists. These video tapes represented more than 356 h of sedation.

Out of this video tape data, 70 video clips of 30-s duration were created to give short segments that represented sedation practice as it would be viewed by providers using the DOCS with video tape data. The clips could not be absolutely randomly selected because we needed to assure that the general state or status of the patient did not change during the 30-s duration. In addition, we wanted to assure the inclusion of a range of patient states (calm awake states, agitated states, general anesthesia, deep sedation, moderate sedation, minimal sedation, and oversedation states). Because most of our video data included patients who were calm and sedated, the majority of randomly selected clips would have been of that state only. We therefore selected 20% of the video clip examples of this state and then continued to randomly select clips until all possible DOCS scores were included.

The video clips were selected by one investigator who had no knowledge of the development of the DOCS and no background in medical sedation (JJ).

The DOCS was tested with 10 health care providers to assess inter-rater reliability and again a year later to confirm intra-rater reliability. These health care providers were chosen to reflect the variety of providers who might use the scale, including one nurse practitioner, two certified nurse anesthetists, two recovery room registered nurses, one medical student, one pediatric resident, one anesthesiology resident, and two attending physicians in anesthesiology. All raters were given a 5-min introduction to the use of the DOCS from the same investigator (GTB). They were then given two sample video clips with the rating done for them. At this point they were presented with the 25 video clips and asked to rate them using a computerized DOCS rating screen. The four categories for DOCS rating were arranged so that a point and mouse click entered the data in each category, and the cumulative score was automatically calculated (Fig. 2). The clips were shuffled and presented to the rater a second time at the same sitting to evaluate the consistency with which the initial rating was performed. One year later, each of the raters was retested using the same 25 video clips and computer application, and all clips were again shuffled to assure that the order of presentation would be different from the original rating experience. The data from the DOCS scoring of the video clips were analyzed in two ways: (a) raw score and (b) inclusion into under-sedated states, acceptable sedation states, and oversedation states, as outlined above.

Figure 2

DOCS scores (both intra-rater and inter-rater) were compared using the kappa statistic (11). The kappa statistic is a measure of agreement. A kappa of zero implies that the observed agreement is no better than that expected from chance alone, whereas a kappa of 1.0 indicates perfect agreement. For this study, a kappa greater than 0.61, which implies substantial agreement, was selected as the minimum for acceptable agreement (12).
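As an illustration of how a kappa value is obtained, the sketch below computes Cohen's kappa for two sets of categorical ratings, such as one rater's categorizations at two sessions. It is an assumption that this two-rating form is a fair stand-in: the text does not state which kappa variant was used for the 10-rater comparison, and the ratings shown are invented for the example. Only the 0.61 threshold comes from the text.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rating's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical categorizations of six clips at two sessions.
first = ["safe", "safe", "under", "over", "safe", "under"]
second = ["safe", "safe", "under", "over", "under", "under"]
kappa = cohen_kappa(first, second)
acceptable = kappa > 0.61  # minimum for acceptable agreement used in this study
print(round(kappa, 3), acceptable)
```

Here five of six ratings agree (observed agreement 5/6) against a chance expectation of 13/36, which gives a kappa of 17/23, or about 0.74.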

Criterion validity refers to the extent to which a measure relates to the other measures that would theoretically support the concept (or construct) being measured. It also should be measured whenever there is no universally accepted criterion. Because there are no scales designed to measure the same state characteristics during procedures as the DOCS, we searched other disciplines for a tool with similar end-points to the DOCS. We believed that purely sedation-oriented scales address only the level of consciousness and therefore are inappropriate for a DOCS comparison because DOCS is designed to measure acceptable conditions during a procedure regardless of sedation level. However, the COMFORT® scale (intended to measure the state of patients on ventilators in the intensive care unit [ICU]) is not based strictly on sedation level, but rather measures overall comfort of critically ill patients. It was therefore chosen as an appropriate comparison tool (4,5).

Because it is specifically designed for the ICU patient, some adjustments had to be made to the COMFORT® score to make it similar enough to the DOCS to enable its use for comparison purposes. The specific changes to the COMFORT® score included: (a) one component of the score is an evaluation of the patient’s comfort or cooperation while on a ventilator. This component was deleted for the purposes of this investigation because procedural sedation patients are not (generally) on ventilators. (b) Because of the relatively dynamic nature of procedural sedation practice, the 2-min observation time required by the COMFORT® score was not practical. Instead, for the purposes of this comparison, we had to rate the patient over a 30-s period.

The result of this alteration was a modified COMFORT® score that grades each of the following categories on a scale from 1 to 5: alertness, calmness or agitation, physical movement, muscle tone, facial tension, arterial blood pressure, and heart rate. The criteria used for rating in these categories were identical to those applied in the standard COMFORT® score.

Twenty separate comparisons were made between the DOCS and the COMFORT® scores based on 20 unique, randomly selected 30-s video clips from our video data set (using the same criteria outlined for the inter-rater reliability test). One investigator (JPC) was presented the video clips in random order and asked to perform a DOCS rating. The clips were then shuffled and presented again for rating with the modified COMFORT® score. Comparisons of the DOCS and COMFORT® scores were made using linear regression and the Spearman rank-order coefficient. The raw DOCS score was used in this analysis and was compared with the sum total of the COMFORT® score.
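The rank-order comparison can be illustrated with a small, dependency-free sketch. The scores below are invented for the example and are not study data; the function names are likewise illustrative. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values sharing the average of their ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank-order coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical raw DOCS totals and summed modified COMFORT scores for six clips.
docs_scores = [-3, -1, 0, 2, 4, 5]
comfort_scores = [9, 12, 14, 20, 27, 31]
print(spearman(docs_scores, comfort_scores))
```

Because the coefficient depends only on rank order, it does not require the two scales to share units or a linear relationship, which suits a comparison between a summed DOCS score and a summed COMFORT® score.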

Construct validity refers to the ability of a scale to measure what it is intended to measure. To perform this validation test, we scored 30 unique 30-s video clips that were selected (using the same criteria as mentioned for inter-rater reliability) at different times after the initial sedation/analgesia intervention was initiated. The clips were scored (using DOCS) by one rater who was blinded to the time after initiation of the sedation intervention. After completion of scoring, the results were cross-referenced with the time after the sedation intervention to determine how scores changed with time. Changes in scores revealing improved operative conditions with time would indicate that the DOCS measured the effect of the sedation interventions as desired. With extremes of time after sedation intervention, scores should return to baseline as the effect of the intervention wears off. The Wilcoxon signed rank test was used to evaluate the changes in scores measured at time of the induction compared with 20 min after the induction and at 20 min after the induction compared to 20 min after emergence.
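A minimal sketch of the signed rank comparison follows. It assumes that the scores being compared are paired (the same sedation scored at two time points), which the text does not state explicitly, and it uses invented scores rather than study data. The p value is the exact two-sided value obtained by enumerating sign assignments. Whether the original analysis used an exact or an approximate p value is not stated.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus, exact two-sided p) for paired samples."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, averaging ranks over ties. Ranks are stored
    # doubled so that tied (half-integer) ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # equals 2 * ((i + j) / 2 + 1)
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w_minus2 = sum(r for r, d in zip(ranks2, diffs) if d < 0)
    # Null distribution: each rank is positive or negative with probability
    # 1/2. counts[s] is the number of sign assignments with positive sum s.
    total2 = sum(ranks2)
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    smaller = min(w_plus2, w_minus2)
    p = min(1.0, 2 * sum(counts[: smaller + 1]) / 2 ** n)
    return w_plus2 / 2, w_minus2 / 2, p


# Hypothetical DOCS totals at induction and 20 min later for eight sedations.
at_induction = [4, 5, 3, 4, 5, 2, 4, 3]
after_20_min = [0, -1, 0, 1, -2, 0, -1, 1]
w_plus, w_minus, p_value = wilcoxon_signed_rank(at_induction, after_20_min)
print(w_plus, w_minus, p_value)
```

In this invented example every score falls after the intervention, so all of the rank sum lies on the negative side, the pattern that would be expected if the scale tracks the effect of sedation.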

As mentioned in Methods, one concern in the design of the DOCS scale was the possibility of mixed states that combined elements of under-sedation and oversedation. In fact, during the 357 h of video-taping, three instances of these states were found, and none lasted longer than 3 min. In all cases, these states involved a patient who was thrashing about and had his/her eyes closed. In each case, the score changed from a possible total of 5 to 4. As such, there were no cases where the patient’s categorization was changed from under-controlled to safe.

In the 25 paired assessments of video clips that represented sedation practice, the DOCS score was completed with excellent inter-rater reliability among the 10 health care providers (kappa = 0.84; se = 0.021; P < 0.001).

Ten health care providers completed the DOCS score for each of the 25 video clips on 2 separate occasions 1 yr apart. There were 250 scoring pairs, including 10 where the rater scored the same scenario differently. The DOCS score was completed with excellent intra-rater reliability over a 1-yr interval (kappa = 0.91; se = 0.051; P < 0.001).

The DOCS scores were observed to correlate well with the COMFORT® scores for the 20 video clips (Spearman correlation coefficient = 0.88; P < 0.001) (Fig. 3).

Figure 3

The distribution of DOCS scores 20 min after the anesthetic induction was significantly lower than the scores before the induction (P < 0.001). In addition, DOCS scores 20 min after emergence were significantly higher than DOCS scores 20 min after initiation of sedation/distraction intervention (Fig. 4).

Figure 4

Medical literature concerning pediatric sedation is characterized by studies of small numbers of patients that lack enough detail to allow comparison between one sedation strategy and another. Often the fact that a procedure is simply completed constitutes a success for any given case. Time to awaken from sedation or time to discharge from the hospital are sometimes, but not always, cited. Often there is no information as to whether the patient was relaxed and calm during the procedure or if the child was thrashing and kicking throughout. There is rarely information regarding whether the patient experienced any disturbance or delirium upon awakening from sedation. Without these types of detailed data, it is extremely difficult to compare the strategies of one provider in a setting (using a given technique) versus another. Data on the quality of sedation provided are simply lacking in many of the studies on this subject.

The data on safety related to pediatric procedural sedation are even more troublesome. Most sedation studies look at tens of patients, with very few including several hundred to one thousand patients (11–14). Data concerning oxygen desaturation and/or the need for airway intervention in a cohort are sometimes (but not always) included. Most often, the data on these complications or near-critical events have been collected in a retrospective manner and depend on the charting of an individual at the time of the sedation without knowledge that the data would be used for study purposes. With few exceptions, the conclusion of any one of these studies is that the sedation technique studied is safe and effective because no serious morbidity or mortality occurred in the time frame of the study. If we accept that the rate of serious morbidity or mortality for pediatric sedation should be no more than that for general anesthesia (approximately 1/10,000), we are left with the fact that all of these studies are underpowered to conclude that a given technique meets this standard. More detailed analysis of the state of the patients who undergo a given sedation technique is required. Observation of subtle events that may well represent harbingers of unsafe practice would advance our ability to judge the true safety of one type of intervention versus another.
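The scale of the power problem can be made concrete with a short calculation. The sketch below is an illustration rather than part of the original analysis: it assumes independent patients, a study in which no serious event is observed, and a one-sided 95% confidence bound. Under those assumptions the upper bound on the true event rate is the value p that solves (1 − p)^n = 0.05.

```python
import math


def upper_bound_zero_events(n, confidence=0.95):
    """Upper confidence bound on an event rate when 0 of n patients had the event."""
    # Solve (1 - p) ** n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def patients_needed(max_rate, confidence=0.95):
    """Smallest event-free cohort whose upper bound is at or below max_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_rate))


for cohort in (50, 1000):
    print(cohort, round(upper_bound_zero_events(cohort), 5))

print(patients_needed(1 / 10_000))
```

An event-free study of 1000 patients can therefore exclude only rates above roughly 3 per 1000, and on the order of 30,000 event-free sedations would be needed before the bound reaches 1/10,000.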

The DOCS was conceived to help with the above difficulties in evaluating sedation practice. The scoring system performed well in all aspects of the validation process. There was acceptable agreement in DOCS scores determined by different health care professionals with a spectrum of professional backgrounds and levels of experience. The DOCS scores determined by the same health care professional at two separate sessions a year apart also displayed acceptable agreement. Furthermore, the DOCS scoring system correlated closely with another similar scoring system, the COMFORT® score, confirming the criterion validity of the DOCS score. Construct validity was demonstrated by description of the change in DOCS score over time, with a return to presedation score after completion of the procedure. This validation process confirms that the DOCS scoring system can allow quantification of the conditions present during the procedure.

Although we do not suggest that this scale will be used on a daily basis, the use of video data collection and DOCS scoring will allow a more detailed analysis of sedation techniques. In this way, two techniques for bone marrow biopsy (e.g., ketamine versus midazolam) could be compared not only for “was the procedure completed?”, but also for what the child’s behavior, degree of movement, and pain were like during the procedure. This type of detail is critical in comparing how well sedation techniques work and will yield much more useful information than studies that currently give only the crudest data on procedure completion.

The DOCS will also be helpful in assessing alternative sedation techniques. Previously published sedation scales do not allow comparison of the conditions produced through the use of sedation with those accomplished through distraction or other noninvasive techniques. Because documentation is focused on the sedation level, the nonsedation techniques (by definition) do not work. DOCS scoring allows comparison of the patient state during pediatric procedures regardless of the depth of sedation. For instance, a lumbar puncture performed with excellent distraction methods could be compared directly with one performed under deep sedation, with respect to comfort of the child and the operating environment during the procedure. It would also allow an investigator to identify potentially dangerous conditions present during the procedure. We hope that this type of direct quantifiable technique for comparison will lead to more consideration of these nonsedating interventions and promote those strategies that really work.

Pediatric sedation is delivered by a wide range of practitioners in a variety of practice settings. Comparing the effectiveness of work performed across such a broad spectrum has always been difficult because of the huge number of variables involved. In developing the DOCS, we used a human factors methodology and process control theory to distill the critical features of the work output and to model how any practitioner must approach the state feedback control challenge of procedural sedation. We strongly believe that the application of these concepts to sedation practice has allowed us to develop a scheme for thinking about and evaluating medical work that is unique and particularly helpful for areas of practice that are as difficult to evaluate as pediatric sedation practice.

Methodological difficulties with our study include the fact that our scale is attempting to measure a new outcome: patient state as opposed to sedation level. Although there are many pain and sedation scales, none of these is intended for judging the same operating conditions that we were interested in investigating. In our evaluation of criterion validity, we were compelled to use a modified version of the frequently used COMFORT® score for comparison. We feel this is not ideal, because the modification negates the proven validity of the COMFORT® score; nevertheless, the modified scale comes quite close to measuring the same general patient state as DOCS. Another problem involves the possibility of mixed states, where a child could be both under-controlled and sedated at the same time. Our analysis reveals that this type of state is relatively rare and (in our experience) did not change the general state categorization of the patient.

Finally, we once again recognize that the methods described for using DOCS would not be practical on a day-to-day basis. The scale is actually intended to be used as a tool for research in conjunction with video-tape data collection. The scale would be ideal for use as an objective measure in comparing work among different sedation providers at a given institution or among providers at separate institutions. It could also be used to compare work by the same provider using two different methods for sedation. Appropriately applied, the scale could be used to compare efficiency of sedation (time to achieve sedation and the time required to return to baseline) and safety of techniques, even when a relatively small cohort is involved.

We have described a new scale for evaluating pediatric patients undergoing sedation for diagnostic and therapeutic procedures. This scale, DOCS, was shown to be a valid measure for quantifying the state of a patient during a sedation or distraction intervention.

1. Malviya S, Voepel-Lewis T, Tait AR, et al. Depth of sedation in children undergoing computed tomography: validity and reliability of the University of Michigan Sedation Scale (UMSS). Br J Anaesth 2002;88:241–5.
2. Chernik DA, Gillings D, Laine H, et al. Validity and reliability of the Observers Assessment of Alertness/Sedation Scale: study with intravenous midazolam. J Clin Psychopharmacol 1990;10:244–51.
3. Barker RA, Nisbet HI. The objective measurement of sedation in children: a modified scoring system. Can Anaesth Soc J 1973;20:599–606.
4. Marx CM, Smith PG, Lowrie LH, et al. Optimal sedation of mechanically ventilated pediatric critical care patients. Crit Care Med 1994;22:163–70.
5. Ambuel B, Hamlett KW, Marx CM, et al. Assessing distress in pediatric intensive care environments: the COMFORT scale. J Pediatr Psychol 1992;17:95–109.
6. Mackenzie CF, Jeffries NJ, Hunter WA, et al. Comparison of self-reporting of deficiencies in airway management with video analysis of actual performance. Hum Factors 1996;38:623–35.
7. Mackenzie CF, Xiao Y. Video analysis for performance modeling in real environments: methods and lessons learned. In: Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society, October 1999. Santa Monica, CA: 1999:237–41.
8. Sheridan TB. Supervisory control. In: Salvendy G, ed. Handbook of human factors and ergonomics. 2nd ed. New York: Wiley; 1997:1295–327.
9. American Academy of Pediatrics Committee on Drugs. Guidelines for monitoring and management of pediatric patients during and after sedation for diagnostic and therapeutic procedures. Pediatrics 1992;89:1110–5.
10. American Society of Anesthesiologists. Practice guidelines for sedation and analgesia by non-anesthesiologists. Anesthesiology 1996;84:459–71.
11. Fleiss JL, ed. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons; 1981.
12. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
13. Pena BM, Krauss B. Adverse events of procedural sedation and analgesia in a pediatric emergency department. Ann Emerg Med 1999;34:483–91.
14. Havel CJ Jr, Strait RT, Hennes H. A clinical trial of propofol vs midazolam for procedural sedation in a pediatric emergency department. Acad Emerg Med 1999;6:989–97.
© 2005 International Anesthesia Research Society