Airway management

A simplified risk score to predict difficult intubation: development and prospective evaluation in 3763 patients

Eberhart, Leopold HJ; Arndt, Christian; Aust, Hans-Jörg; Kranke, Peter; Zoremba, Martin; Morin, Astrid

European Journal of Anaesthesiology: November 2010 - Volume 27 - Issue 11 - p 935-940
doi: 10.1097/EJA.0b013e328338883c



Most approaches to the prediction of difficult intubation involve simple bedside physical examinations. In their review of the literature, Shiga et al.1 concluded that screening tests for difficult intubation when used alone have only poor to moderate discriminative power, whereas combinations of tests and/or risk factors add some diagnostic value; however, such combinations are little used in practice. For example, a survey of 110 anaesthesiologists in two German university hospitals showed that only 12 knew of the existence of the Wilson score2 and only one could calculate it; and only two knew of the Multivariate Risk Index of El-Ganzouri3 (unpublished data). Unfortunately, more recent approaches are even more complicated.4 This suggests the need for a simpler and easier composite tool. Thus, the aim of this study was to set up a practicable new predictive model and to validate it using a separate set of patients.

Patients and methods

We specified the following requirements for a predictive tool:

  1. No more than five risk factors.
  2. Assessment, including calculation, at the bedside should not take more than 5 min and no complex apparatus should be needed. A ruler and pocket light, but not a protractor, are acceptable.
  3. Only dichotomous criteria (e.g. risk factor present or not present) should be used. The score is equal to the total number of risk factors present for an individual patient.
  4. The predictive properties should be better than chance, and the discrimination power [area under a receiver operating characteristic (ROC) curve] should be increased by at least 20 percentage points (from 0.50 to 0.70) in an independent set of patients.
  5. There should be a correlation between the number of risk factors present and the likelihood of difficult intubation.

This prospective observational trial was approved by the institutional ethics committee of the University of Ulm. The data were collected in two university hospitals, and informed consent was obtained from each patient screened for eligibility. Consecutive patients undergoing a variety of surgical procedures requiring elective (nonrapid sequence induction) endotracheal intubation were included and there were no other formal exclusion criteria. In order to cover all the potential risk factors for difficult intubation, a systematic review of the literature was performed using the search terms – (‘difficult intubation’ OR ‘difficult laryngoscopy’ OR ‘difficult airway’) AND (‘predict*’ OR ‘risk model’ OR ‘risk factor’) – and other combinations of these terms. Table 1 summarizes the identified factors and the technique of measurement or recording used in these trials. The definitions described in the original publications were used for these risk factors, exceptions being indicated in Table 1. Basic biometric data (e.g. sex, age, weight, height) were also recorded for each patient. In addition to the ‘anatomical’ risk factors for a difficult airway, we also assessed ‘functional’ risk factors, for example ‘light level of anaesthesia’. To define such complex functional risk factors, the type and dosage of anaesthetic drugs used for induction of anaesthesia before the first attempt to secure the airway were recorded. The haemodynamic response to this manoeuvre (see Table 1 for details), assuming that a major increase in heart rate and blood pressure and/or movement associated with endotracheal intubation is a clinical sign of a ‘light’ level of anaesthesia, was also recorded.

Table 1: Potential risk factors for difficult intubation and the method of assessment used in each patient

Physical examination and recording of the potential risk factors was by one of four specially trained investigators not involved in the operative care of the patients. The results of these evaluations were not available for clinicians performing the endotracheal intubation and assessing the ease or difficulty of airway management. Any peculiarities during the procedure were recorded on a specially designed case record form. General anaesthesia was induced according to the local standard operating procedures using propofol [mainly in the patients assessed as being American Society of Anesthesiologists (ASA) class I–III; mean dose 2.1 mg kg−1] or etomidate (mainly in patients assessed as being ASA class IV; mean dose 0.18 mg kg−1), fentanyl (mean dose 3.0 μg kg−1) or sufentanil (mean dose 0.26 μg kg−1), and vecuronium, rocuronium or cisatracurium administered after mask ventilation was successfully established (mean doses: 0.11, 0.56 and 0.12 mg kg−1, respectively). A total of 47 anaesthesiologists with varying levels of clinical experience (at least 1 year of continuous clinical training with >500 endotracheal intubations, range 1–37 years; see also Table 1) performed the intubations using a Macintosh laryngoscope with a 3 or 4 blade, depending on personal preference.

Definitions of the outcome criteria

Difficult intubation was defined as intubation requiring additional technical support (e.g. fibre-optic device, intubating laryngeal mask, etc.) or human resources (intubation performed by a second anaesthesiologist with a total of three or more attempts), or a total time to successful intubation of more than 10 min.

Statistical analyses

Data were collected from 3763 patients. Using computer generated random numbers, this dataset was randomly split in a 2:1 ratio into a training dataset (n = 2509) and a validation dataset (n = 1254) based on the results of a sample size analysis (see below). Patients from the training dataset were used to create the risk model. Univariate comparisons between outcome groups were calculated for each variable within the training dataset (χ² or Fisher's exact tests for nominal or dichotomous variables and Mann–Whitney U-tests for continuous variables). Variables with a two-sided nominal P value of less than 0.2 in either of the analyses were defined as potentially relevant risk factors and were further investigated jointly with a logistic regression framework. A stepwise mixed logistic regression analysis was used to develop the final prediction models for difficult intubation. It was necessary to dichotomize continuous data in order to achieve the predefined goal of creating a simplified predictive model. For this purpose, the optimal cut-off value for dichotomization was calculated using discriminant analysis. This value was verified using the graphical tools provided with the JMP statistical package (JMP 6.0; SAS Institute Inc., Cary, North Carolina, USA) and then a suitable even number was chosen as a practical cut-off point. The goodness of fit of the regression model was judged by using Nagelkerke's r².
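The random 2:1 split described above can be sketched in a few lines. This is only an illustration: the function name, the use of sequential patient IDs and the seed are assumptions, and the authors' actual random number procedure is not described in more detail.

```python
import random

def split_two_to_one(patient_ids, seed=0):
    """Shuffle the IDs and return (training, validation) in a 2:1 ratio."""
    rng = random.Random(seed)          # arbitrary seed, for reproducibility only
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical patient IDs 0..3762 stand in for the 3763 study patients.
training, validation = split_two_to_one(range(3763))
print(len(training), len(validation))  # 2509 1254
```

Rounding two-thirds of 3763 gives the training and validation sizes reported in the text.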

The final prediction model was validated by using automated variable selection methods in SAS (version 8.2; PROC LOGISTIC SELECTION = BACKWARD | FORWARD | STEPWISE | SCORE). The goodness of fit of a model was judged using Akaike's information criterion. Potential interactions between the independent variables were analysed using the graphical tools provided by the JMP statistical software package. Deviations from additivity were explored by the Hosmer and Lemeshow procedure.

Validation of the model: discriminating power

The variables included in the models were used to calculate the probability of difficult intubation for each patient of the validation dataset. The discriminating properties of the predictive model were judged by calculating the area under a ROC curve, which was constructed by correlating true-positive and false-positive rates (‘sensitivity’ plotted against ‘1 minus specificity’) for a series of cut-off points defining the predicted risk. The area under the curve (AUC) represents the probability that a patient presenting with difficult intubation has a higher score value than one with an easy airway.6 A 45° bisector would yield a prediction score that was no better than a random guess; the area under this ‘random score’ is 0.5. A score performing significantly better than chance has an area under the ROC curve greater than 0.5 with the lower limit of the 95% confidence interval (CI) exceeding 0.5.6
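The probabilistic reading of the AUC given above can be made concrete by comparing every difficult patient with every easy patient. The sketch below is a minimal illustration, not the authors' software; counting ties as half a win is the usual convention and is an assumption here, and the example scores are invented.

```python
def auc_from_scores(scores_difficult, scores_easy):
    """Probability that a difficult case scores higher than an easy case.

    Every (difficult, easy) pair is compared; ties count as half a win.
    """
    wins = 0.0
    for d in scores_difficult:
        for e in scores_easy:
            if d > e:
                wins += 1.0
            elif d == e:
                wins += 0.5
    return wins / (len(scores_difficult) * len(scores_easy))

# Invented risk scores (number of risk factors) for illustration only.
example_auc = auc_from_scores([3, 2, 4], [0, 1, 2, 1])
print(round(example_auc, 3))
```

A score that carries no information yields 0.5, matching the area under the 45° bisector.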

First, an ‘exact’ probability was calculated using the equation

$$p = \frac{1}{1 + e^{-z}}$$

where z is a linear equation that can be calculated using the constant and the β-coefficients derived from the results of the final model of the logistic regression analysis. However, this ‘exact’ method of calculation of the likelihood did not predict difficult intubation significantly better than the simplified risk model (in which only the number of risk factors is counted and matched to the corresponding risk).
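For a logistic regression model, the probability is conventionally obtained as p = 1/(1 + e^(−z)), with z the constant plus the sum of the β-coefficients of the risk factors present. The sketch below assumes that standard form. The constant is a hypothetical placeholder (the fitted value appears only in Table 3, which is not reproduced here); the two β values are derived from the odds ratios reported for the Mallampati factors via β = ln(odds ratio).

```python
import math

def logistic_probability(constant, betas, risk_factors):
    """Return p = 1 / (1 + exp(-z)) with z = constant + sum(beta_i * x_i)."""
    z = constant + sum(b * x for b, x in zip(betas, risk_factors))
    return 1.0 / (1.0 + math.exp(-z))

# beta = ln(odds ratio); 2.55 and 1.91 are the odds ratios reported in the
# text for Mallampati > 1 and Mallampati '4'.
beta_mallampati_gt1 = math.log(2.55)
beta_mallampati_4 = math.log(1.91)

HYPOTHETICAL_CONSTANT = -5.0  # placeholder, NOT the fitted constant

p_none = logistic_probability(HYPOTHETICAL_CONSTANT,
                              [beta_mallampati_gt1, beta_mallampati_4], [0, 0])
p_both = logistic_probability(HYPOTHETICAL_CONSTANT,
                              [beta_mallampati_gt1, beta_mallampati_4], [1, 1])
print(p_none < p_both)  # True: more risk factors give a higher probability
```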

Validation of the model: calibration properties

Calibration was judged by plotting predicted incidences derived from the training dataset after applying the simplified risk score against the actual incidences in the validation dataset for each of the risk groups. Calibration characteristics were expressed as the slope (ideal: 1.0) and the offset (ideal: no offset) of the regression line created after calculating a weighted linear regression (JMP 6.0 statistical package).
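A weighted linear regression of actual on predicted incidence can be sketched as follows. This is a generic weighted least-squares fit, not the JMP routine; weighting each risk group by its number of patients is an assumption, and the group sizes and observed incidences in the example are invented.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

predicted = [0.0, 2.0, 4.0, 8.0, 17.0]         # predicted risk (%) per risk group
hypothetical_actual = [0.0, 2.0, 4.0, 8.0, 17.0]  # invented: perfect calibration
hypothetical_sizes = [80, 400, 500, 200, 70]      # invented group sizes (weights)

slope, offset = weighted_linear_fit(predicted, hypothetical_actual,
                                    hypothetical_sizes)
print(round(slope, 2), round(offset, 2))
```

With perfectly calibrated data the fit returns the ideal slope of 1.0 and no offset.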

Sample size calculation

Based on the literature, we assumed a 3% incidence of difficult intubation. Thus, if 1250 patients were used for validation of the risk model, the power to achieve an area under a ROC curve of 0.65 or higher (two-sided alternative hypothesis) for the prediction of ‘difficult intubation’ would be approximately 89%. Power analysis was performed using PASS 2002 (Number Cruncher Statistical Systems, Kaysville, Utah, USA). Thus, it was decided to split the dataset of 3763 patients by 2:1 into the training dataset and the validation dataset.

Results


The incidence of difficult intubation in the whole dataset of 3763 patients was 3.7% (95% CI 3.2–4.4%), and the incidences were similar in the two subsets. Twenty-four potentially relevant risk factors were identified in the univariate analysis and subjected conjointly to a logistic regression procedure. Table 2 shows the results of these analyses, indicating at which step a given potential risk factor was removed from the model. Nagelkerke's r², the measure of the goodness of fit of the regression model, is listed for each step. One variable of the final set of five risk factors had to be dichotomized for the sake of simplicity. According to the predefined approach for dichotomization, the cut-off value for ‘mouth opening’ was set at 4 cm (optimal mathematical cut-off value 3.7 cm). Finally, the ‘protective’ factor ‘lack of front teeth’ was converted into the corresponding ‘risk’ factor ‘presence of front teeth’, requiring an altered constant in the regression equation but having no effect on the general performance of the predictive model.

Table 2: Summary of the backward logistic regression procedure to identify independent risk factors associated with a difficult intubation

Table 3 shows the final model, consisting of five dichotomous risk factors with odds ratios ranging from 1.80 to 3.61. The 95% CIs of the odds ratios, the P values of each risk factor and the β-coefficients of the risk factors (with standard errors) are presented. The model uses the presence of the upper front teeth, mouth opening, the history of a difficult intubation and the Mallampati status, which enters as two separate factors. A Mallampati score greater than 1 was identified as an independent predictor of difficult intubation, with an odds ratio of 2.55. Further, a Mallampati classification of ‘4’ (soft palate not visible) also gave an increased risk (odds ratio: 1.91). After omission of the relative importance (expressed by the β-coefficients) of the five risk factors, the simplified model, in which only the number of risk factors present in an individual patient needs to be counted, was finalized. If no risk factor is present, that is, the patient is edentulous with respect to the eight upper front teeth and has good mouth opening, Mallampati ‘1’ and no history of difficult intubation, there is almost zero risk of difficult intubation. As no difficult laryngoscopy occurred in any of the 206 patients of the training dataset with a zero score, the predicted risk is 0% (95% CI 0–1.44%). Table 4 presents the frequency of the risk factors and the predicted and actual incidences of difficult intubation in the validation dataset. The predicted risk increases from 0 to 2, 4 and 8%, when none, one, two or three factors, respectively, are present; predicted risk is 17% when four or five factors are present. The last two risk classes were grouped together because of their relatively low incidence. Thus, there is at least double the risk of difficult intubation (above the average risk of 3.7% in the present dataset) when three or more risk factors are present and a four-fold increase when four or five risk factors are present.
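The counting rule of the simplified model can be sketched as below. The function and parameter names are illustrative, and treating mouth opening strictly below the 4 cm cut-off as the risk condition is an assumption (the text gives the cut-off but not whether it is inclusive). The predicted risks are those quoted above for each number of risk factors.

```python
# Predicted risk (%) of difficult intubation by number of risk factors,
# as quoted in the text; four and five factors share one risk class.
PREDICTED_RISK_PERCENT = {0: 0, 1: 2, 2: 4, 3: 8, 4: 17, 5: 17}

def simplified_score(upper_front_teeth_present, mouth_opening_cm,
                     history_of_difficult_intubation, mallampati_class):
    """Count the dichotomous risk factors present (0 to 5)."""
    factors = [
        bool(upper_front_teeth_present),
        mouth_opening_cm < 4.0,        # assumption: strictly below the cut-off
        bool(history_of_difficult_intubation),
        mallampati_class > 1,          # Mallampati greater than 1
        mallampati_class == 4,         # Mallampati '4' counts as a further factor
    ]
    return sum(factors)

low = simplified_score(False, 5.0, False, 1)
high = simplified_score(True, 3.5, False, 4)
print(low, PREDICTED_RISK_PERCENT[low])    # 0 0
print(high, PREDICTED_RISK_PERCENT[high])  # 4 17
```

Because the Mallampati status contributes two factors, a patient with Mallampati ‘4’ scores at least 2 regardless of the other findings.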

Table 3: Final model of the predictive factors for difficult intubation with their β ± standard error, P values and odds ratio (including the 95% confidence intervals)

Table 4: Predicted and actual incidences of difficult intubation in the training and validation datasets related to the number of risk factors of the simplified predictive model

Validation of the risk scores

Validation was performed in an independent set of 1254 patients. The AUC of the ROC curve for the ‘exact’ (weighted) model was 0.71 (95% CI 0.63–0.79). Interestingly, the discriminating power of the simplified (unweighted) model was 0.72 (95% CI 0.63–0.81). As the lower boundary of the 95% CIs of the AUC values was greater than 0.5 (pre-hoc probability of a test to correctly predict the disease state of a sample pair of patients with one patient having the disease and the other one not), clearly both predictive tools perform significantly better than a random guess.
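One common way to attach a confidence interval to an AUC is the standard error formula of Hanley and McNeil. The sketch below applies it to the simplified model; whether the authors used this method is not stated, and the number of difficult intubations in the validation set (taken here as 46, roughly 3.7% of 1254) is an assumption, since the text does not report it.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC according to Hanley and McNeil (1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

ASSUMED_DIFFICULT = 46              # assumption: about 3.7% of 1254 patients
ASSUMED_EASY = 1254 - ASSUMED_DIFFICULT

se = hanley_mcneil_se(0.72, ASSUMED_DIFFICULT, ASSUMED_EASY)
lower, upper = 0.72 - 1.96 * se, 0.72 + 1.96 * se
print(round(se, 3), round(lower, 2), round(upper, 2))
```

Under these assumptions the interval lies close to the reported one and its lower limit stays above 0.5.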

The calibration properties are presented in Fig. 1, in which the actual incidences of a difficult intubation event are plotted against the predicted values. The weighted regression of the data indicates a slope of the regression line of 1.11 (1.0 would be the ideal value) with a minimal offset.

Fig. 1

Discussion

Unanticipated difficult tracheal intubation constitutes a major problem for anaesthesiologists, intensivists and emergency room staff. The incidence in the operating room has been reported as 1–18%.2,7,8 A recent systematic review of bedside screening to predict difficult intubation found an incidence of 5.8% (95% CI 4.5–7.5%).1 The incidence of abandoned/failed intubation is approximately 0.05–0.35%5,9 and up to 600 patients are thought to die annually worldwide as a result of complicated tracheal intubation.10 Difficult intubation accounted for 17% of adverse respiratory events in an ASA closed-claims analysis, in 85% of which the outcome was either death or brain damage.11 Thus, the search continues for an applicable, reliable and accurate predictive test.

Screening tests include the Mallampati oropharyngeal classification, thyromental and sternomental distances and mouth opening, but all yield poor-to-moderate sensitivity (20–62%) and moderate-to-fair specificity (82–97%). In the present validation dataset of 1254 patients, the area under the ROC curve for the Mallampati status alone was 0.63, and that for thyromental distance was only 0.56. These examples demonstrate that a single bedside test is of no great value.1

Several of the existing multivariate models use a combination of easy methods. The Wilson score was the first predictive tool using weight, the ability to extend the neck, jaw movement and the presence and severity of a receding mandible and buck teeth.2 The Arné model12 requires a history of a previously difficult intubation, diseases associated with difficult intubation, clinical signs of airway disease, the interincisor gap and luxation of the mandible, head and neck movement and the Mallampati classification. Depending on the severity of the findings, up to 13 items are added to result in a risk score ranging from 0 to 48 points. With the Naguib score,13 the formula is: 4.9504 + (thyrosternal distance × 1.1003) − (Mallampati score × 2.6076) + (thyromental distance × 0.9684) − (neck circumference × 0.3966). These three scores were compared in a case–control study in which a highly selected group of 97 patients presenting with an unanticipated difficult intubation within a 5-year period were compared with a matched control group of patients with an easy intubation.

The AUCs measuring the discriminating power of the Wilson, Arné and Naguib models were 0.79 (95% CI 0.72–0.85), 0.87 (95% CI 0.82–0.92) and 0.82 (95% CI 0.76–0.88), respectively, at first glance more promising than the validation of our risk model (AUC = 0.72). However, our study included consecutive patients, whereas the validation of Naguib et al. used a case–control design, whereby only patients with an obviously extremely difficult intubation were matched to an easily intubated control group, and only anaesthesiologists with more than 5 years posttraining experience participated. Obviously, it is easier to discriminate patients with a very clear and definite disease state from ‘normal’ patients than to discriminate within a nonselected and heterogeneous study population, as was the case in our study.

Our analysis was restricted to patients tracheally intubated by anaesthesiologists with at least 1 year of clinical experience; in all, 25 colleagues with clinical experience between 1 and 35 years. We believe that the inclusion of anaesthesiologists from different institutions is not a drawback; rather, it makes our conclusions more robust, and our proposed simplified risk score should be more relevant to the general anaesthesiologist.14 This assumption needs to be tested by further studies. It is questionable whether very experienced anaesthesiologists will benefit from routine assessment of the airway, given the limited value of predictive tools in clinical situations wherein there are no strong risk factors15 and wherein the incidence of the predicted event is low.16

Given that none of the current risk scores using anatomical parameters has good predictive properties, one might speculate that difficult intubation is not only a matter of ‘pathological anatomy’. The level of anaesthesia, degree of relaxation and choice of anaesthetic drugs modify intubating conditions.17 A composite variable (‘light level of anaesthesia’) based on patient movement during or after tracheal intubation and an increase in either heart rate or blood pressure of more than 25% was removed at a late stage of the stepwise logistic regression analysis. This suggests light anaesthesia may have a minor role to play.

Our new score has only moderate discriminating power, but does have the advantage of easy applicability; there are no coefficients (as in the Naguib score) or relative factor weights (as in the Wilson score or the Arné model). Four simple risk factors, all part of the routine preoperative examination, and one simple question are all that is needed. By assessing the Mallampati status of the patient, the investigator automatically evaluates mouth opening and can check the presence of the upper front teeth. Scoring is completed by asking the patient whether there has been any difficulty with the airway during previous anaesthesia.

Anaesthesiologists should be aware of the imperfection of the models in that discrimination power (the likelihood that a patient with, for example, a difficult endotracheal intubation has a higher risk score than a patient with an easy intubation) is only increased by 20 percentage points, from the pre-hoc likelihood for a correct prediction of 50% to about 70%. When using a cut-off level of three or more risk factors, 120 patients of the validation dataset (9.6%) were expected to present with a difficult airway, but only 12 actually proved to be difficult. Using only the Mallampati ‘4’ classification, 12 patients were detected with a difficult intubation with only a slightly higher rate of false-positive predictions (108 with the new score and 135 with the Mallampati class 4). On the other hand, there were also many false negatives with these scores: five Mallampati ‘1’ patients were difficult to intubate. For the lowest risk class of our new scoring system, no patient presented with a difficult airway, but there were only 77 patients in this group.
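The figures above translate directly into positive predictive values, and the zero-event groups into upper confidence limits. The sketch below is illustrative only. The one-sided exact (Clopper–Pearson) bound for zero events is an assumed method; it reproduces the 1.44% upper limit reported for the 206 zero-score patients of the training dataset, but the text does not name the method used.

```python
def positive_predictive_value(true_pos, false_pos):
    """Share of positive predictions that were truly difficult intubations."""
    return true_pos / (true_pos + false_pos)

def zero_event_upper_bound(n, alpha=0.05):
    """One-sided exact upper confidence limit when 0 events occur in n patients."""
    return 1.0 - alpha ** (1.0 / n)

# Validation dataset, figures from the text.
ppv_score = positive_predictive_value(12, 108)        # three or more risk factors
ppv_mallampati4 = positive_predictive_value(12, 135)  # Mallampati '4' alone

print(f"PPV, score >= 3:      {ppv_score:.1%}")
print(f"PPV, Mallampati '4':  {ppv_mallampati4:.1%}")
print(f"Upper limit, 0/206:   {zero_event_upper_bound(206):.2%}")
print(f"Upper limit, 0/77:    {zero_event_upper_bound(77):.2%}")
```

With only 77 zero-score patients in the validation dataset, an absence of events is still compatible with a true risk of several percent, which is why the small size of this group matters.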

Our simplified scoring system for the prediction of difficult intubation meets all five predefined criteria. With four easy-to-assess clinical signs (Mallampati status, mouth opening, presence of teeth and a history of a difficult airway in the past), this simplified 5-point score is calculated within a few seconds without the aid of technical apparatus. It is significantly better than a random guess and achieves a discriminating power (AUC) of more than 0.70 in an independent set of patients.

References


1 Shiga T, Wajima Z, Inoue T, Sakamoto A. Predicting difficult intubation in apparently normal patients. A meta-analysis of bedside screening test performance. Anesthesiology 2005; 103:429–437.
2 Wilson ME, Spiegelhalter D, Robertson JA, Lesser P. Predicting difficult intubation. Br J Anaesth 1988; 61:211–216.
3 El-Ganzouri AR, McCarthy RJ, Tuman KJ, et al. Preoperative airway assessment: predictive value of a multivariate risk index. Anesth Analg 1996; 82:1197–1204.
4 L'Hermite J, Nouvellon E, Cuvillon P, et al. The simplified predictive intubation difficulty score: a new weighted score for difficult airway assessment. Eur J Anaesthesiol 2009; 26:1003–1009.
5 Samsoon GLT, Young JRB. Difficult tracheal intubation: a retrospective study. Anaesthesia 1987; 42:487–490.
6 Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36.
7 Benumof JL. Management of the difficult airway: with special emphasis on awake tracheal intubation. Anesthesiology 1991; 75:1087–1110.
8 Tse JC, Rimm EB, Hussain A. Predicting difficult endotracheal intubation in surgical patients scheduled for general anesthesia: a prospective blind study. Anesth Analg 1995; 81:254–258.
9 Cormack RS, Lehane J. Difficult tracheal intubation in obstetrics. Anaesthesia 1984; 39:1105–1111.
10 King TA, Adams AP. Failed tracheal intubation. Br J Anaesth 1990; 65:400–414.
11 Caplan RA, Posner KL, Ward RJ, Cheney FW. Adverse respiratory events in anesthesia: a closed claims analysis. Anesthesiology 1990; 72:828–833.
12 Arné J, Descoins P, Fusciardi J, et al. Preoperative assessment for difficult intubation in general and ENT surgery: predictive value of a clinical multivariate risk index. Br J Anaesth 1998; 80:140–146.
13 Naguib M, Scamman FL, O'Sullivan C, et al. Predictive performance of three multivariate difficult tracheal intubation models: a double-blind, case-controlled study. Anesth Analg 2006; 102:818–824.
14 Smith RL. Observational studies and predictive models. Anesth Analg 1990; 70:235–239.
15 Apfel CC, Kranke P, Greim CA, Roewer N. What can be expected from risk scores for predicting postoperative nausea and vomiting? Br J Anaesth 2001; 86:822–827.
16 Yentis SM. Predicting difficult intubation: worthwhile exercise or pointless ritual? Anaesthesia 2002; 57:105–109.
17 Lieutaud T, Billard V, Khalaf H, Debaene B. Muscle relaxation and increasing doses of propofol improve intubating conditions. Can J Anaesth 2003; 50:121–126.

Keywords: airway; diagnostic; difficult intubation; intratracheal; intubation; predictive test; screening test; test

© 2010 European Society of Anaesthesiology