Because this new airway model describes appearance, it is possible to generate pictures of faces that would appear to have certain degrees of ease or difficulty of intubation. Figure 5A illustrates the head that is theoretically most difficult to intubate according to the model. Figure 5B represents a head that the model would classify as easy to intubate. The parameter values for this head are set such that the value produced by the model is of the same magnitude as, but opposite in sign to, that of Figure 5A. Figure 5B might therefore be considered to represent a patient as easy to intubate as the patient in Figure 5A would be difficult.
In our study, computerized facial structure analysis combined with a widely used bedside airway evaluation method yielded a model that significantly outperformed popular clinical predictive tests. Our model accurately classified 70 of 80 airways (87.5%) compared with 47 of 80 (58.8%) for MP test plus TMD using classical thresholds.1,3,5
Use of a bedside examination to predict difficult intubation is considered the standard of care in modern anesthesiology practice. It has been incorporated not only into the difficult airway algorithms of the American Society of Anesthesiologists7 and of several other countries,20 but also, most recently, into the World Health Organization Surgical Safety Checklist,21 the use of which is being encouraged in every operating room in the world. Unfortunately, all easily performed examination systems in clinical practice perform only modestly, with sensitivities of 20% to 62%, specificities of 82% to 97%, and very low positive predictive values, generally <30%, unless very liberal definitions of difficulty are used.22 There are likely a number of reasons for this poor performance, including the relative rarity of difficult intubation,22 the multifactorial etiology and varying definition of difficult intubation, interobserver variability in test results,23,24 failure to validate potential systems in patients independent of those used to derive the test,22 and the inadequacy of the tests themselves. Conversely, experienced anesthesiologists almost certainly use cues other than those derived from formal bedside tests to formulate their clinical impression of the ease of intubating any given patient. There may be a large number of anatomic factors that enter into such a judgment.25 However, bedside scores based on such factors have not proven to be accurate.4 Indeed, getting anesthesiologists to pay attention to the airway may be the principal benefit of routinely performing airway examinations before induction.22
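The dependence of positive predictive value on the rarity of difficult intubation can be illustrated with Bayes' rule. The following is a minimal sketch, not part of the study's analysis: the function name is hypothetical, the pairing of 62% sensitivity with 82% specificity is chosen only for illustration from the ranges quoted above, and the prevalence values are assumptions rather than measured rates.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive test is a true positive (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


if __name__ == "__main__":
    # Illustrative pairing of values taken from the ranges quoted in the text.
    sensitivity, specificity = 0.62, 0.82
    # Hypothetical prevalences: a stricter definition of difficulty
    # corresponds to a lower prevalence, a liberal one to a higher prevalence.
    for prevalence in (0.01, 0.05, 0.20):
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With sensitivity and specificity held fixed, the computed value falls as the assumed prevalence falls, which is the arithmetic behind the low positive predictive values of bedside tests for a rare outcome.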
Our study differs from previous work using facial imaging to evaluate the airway. Suzuki et al.26 used digital photographs of subjects' faces to calculate 5 ratios and angles from measurements derived from placement of anatomic markers on the photographs. They found one, the “submandibular angle,” to be correlated with difficult tracheal intubation. They also used morphing software to construct “average” easy and difficult to intubate faces, which we believe bear some subjective resemblance to our Figure 5. Similarly, Naguib et al.27 measured 22 indices from plain radiographs and 8 from 3-dimensional computed tomographic scans of the head in patients who were easy or difficult to intubate. They constructed a model containing 3 bedside tests (MP test, TMD, and thyrosternal distance) and 2 radiographic features that accurately separated the easy and difficult cohorts with an AUC of the ROC curve of 0.97. Both of these previous investigations, however, used a priori assumptions of which anatomic features might relate to difficult laryngoscopy and intubation. Both also required actual measurement of anatomic features. In contrast, our method modeled the entire physiognomy of the face with no such assumptions and no direct measurements. Moreover, the method does not require time-consuming and potentially dangerous radiographs. If implemented on high-speed computers, perhaps accessed by end users transmitting images of patients over a network, our model could be used for rapid bedside or field assessment of the airway, even by inexperienced practitioners.
Our study has several limitations. First, it is likely that there are causes of difficult intubation not included in our study cohorts. For example, some patients with limited neck mobility but otherwise normal airways are difficult to intubate.28 Further refinement of the model could include subjective or measured indices of neck extension, or indeed other predictors of difficult intubation such as body mass index. Second, we measured TMD in the neutral position rather than full neck extension, and we used fingerbreadths rather than measured distance. This is the method in routine use in our institution but some evidence suggests measuring TMD in extension is more predictive of intubation difficulty29 and, although popular, fingerbreadth measurements are inferior to ruler measurements.9 Conversely, if the model included this potentially inferior measurement, refining it by using full-extension TMD could only improve performance of the model. Third, we sought to eliminate potential racial or gender-based confounding by confining our sample to Caucasian males. Finally, because photographs were obtained postoperatively, we cannot entirely exclude the possibility of changes in facial appearance caused by anesthesia and surgery. Only a large, prospective study in a diverse patient population would be able to verify the performance of our model in general clinical use. It is encouraging that the model predicts ease in the computerized normal face, and that it performs better than bedside tests within the study cohorts. It is also possible that deriving and validating it within a larger fraction of the difficult airway “space” could further refine the model.
Another potential limitation is the method used to categorize the subjects as easy or difficult to intubate. We used a liberal definition of difficult intubation, which causes the positive predictive value to increase. In real-world clinical use, anesthesiologists are likely more interested in very difficult intubation, and the positive predictive value will be lower, as it is for all difficult airway predictive methods. Conversely, use of a liberal definition makes the statistical task of separating the easy and difficult cohorts more difficult, not less so,22 because the 2 groups of patients are more similar. This makes the strong performance of our model notable. Moreover, the model performed even better in the subset of patients with an IDS score >5, who had comparatively more difficult intubations. However, it is decidedly problematic to infer the comparative difficulty of an intubation from the after-the-fact description of the technical maneuvers required to manage that airway.6 For example, would an intubation achieved over a bougie on the third attempt be considered more or less difficult than one in which the anesthesiologist decided to use a video technique after the first unsuccessful attempt? Even presuming equally experienced laryngoscopists, the comparison is confounded by differences in comfort with, and availability of, other adjunct techniques. This ambiguity also complicates the use of research tools such as IDS score, which, for example, could be low (and thus descriptive of an easy intubation) in a patient in whom a single direct laryngoscopy produced a poor view and who was then intubated fiberoptically.
The clinical utility of our methodology and model remains an important research question. First, technical issues would need to be solved. The software currently requires approximately 15 minutes to model each face from digital photographs. It relies presently on a relatively inefficient iterative algorithm to do so, and expends considerable computing power on modeling the coloration and texture of the skin. Certainly, a more efficient algorithm could be written, particularly if only a few parameters need to be derived to predict difficult intubation. Indeed, we have a prototype algorithm that can analyze a face, derive the relevant parameters, and calculate the intubation prediction in less than 1 minute (data not shown). If proven practical for widespread clinical use, this would represent a significant advance over previously published methods involving offline measurements taken from radiographs or photographs. Second, the computing power required is modest but exceeds that of current handheld devices. We envision that clinical use of our model would be most efficiently deployed using high-speed computers accessible to clinicians over a network, perhaps using handheld computers or smartphones incorporating digital cameras as input devices. Third, the requirement for manual placement of fiducial markers to guide reconstruction is a potential source of user error. However, it is encouraging that our test-retest results revealed no cases in which the overall judgment of ease or difficulty of intubation varied. Finally, the performance of the model should be compared with that of experienced clinicians given similar data. The model would be particularly useful if it could predict difficult intubation when an experienced clinician had not suspected it, a more dangerous clinical situation than the converse error of judgment. If the model outperforms human experts, then its applicability would potentially be quite broad and would include even seasoned anesthesiologists. Conversely, if human operators can match the model's performance, then the software may be of greater utility to nonairway experts. This assessment is an area of active research by our group.
In summary, the model presented herein significantly outperformed the current standard of the combination of MP and TMD examinations, and is based on quantification of facial anatomy performed by an unbiased computer algorithm. Additional work should define the predictive ability of experienced clinicians presented with similar photographs and bedside airway examination results, and the ability of the computer model to predict difficult intubation prospectively in a large and diverse patient population. If the superiority of the method can be confirmed, the model could represent an important advance in the assessment of the airway.
1. Mallampati SR, Gatt SP, Gugino LD, Desai SP, Waraksa B, Freiberger D, Liu PL. A clinical sign to predict difficult tracheal intubation: a prospective study. Can Anaesth Soc J 1985;32:429–34
2. Samsoon GL, Young JR. Difficult tracheal intubation: a retrospective study. Anaesthesia 1987;42:487–90
3. Frerk CM. Predicting difficult intubation. Anaesthesia 1991;46:1005–8
4. Shiga T, Wajima Z, Inoue T, Sakamoto A. Predicting difficult intubation in apparently normal patients: a meta-analysis of bedside screening test performance. Anesthesiology 2005;103:429–37
5. Cormack RS, Lehane J. Difficult tracheal intubation in obstetrics. Anaesthesia 1984;39:1105–11
6. Crosby ET, Cooper RM, Douglas MJ, Doyle DJ, Hung OR, Labrecque P, Muir H, Murphy MF, Preston RP, Rose DK, Roy L. The unanticipated difficult airway with recommendations for management. Can J Anaesth 1998;45:757–76
7. American Society of Anesthesiologists Task Force on Management of the Difficult Airway. Practice guidelines for management of the difficult airway: an updated report by the American Society of Anesthesiologists Task Force on Management of the Difficult Airway. Anesthesiology 2003;98:1269–77
8. Adnet F, Borron SW, Racine SX, Clemessy JL, Fournier JL, Plaisance P, Lapandry C. The intubation difficulty scale (IDS): proposal and evaluation of a new score characterizing the complexity of endotracheal intubation. Anesthesiology 1997;87:1290–7
9. Baker PA, Depuydt A, Thompson JM. Thyromental distance measurement: fingers don't rule. Anaesthesia 2009;64:878–82
10. Turk M, Pentland A. Eigenfaces for recognition. J Cogn Neurosci 1991;3:71–86
11. Valentine T. A unified account of the effects of distinctiveness, inversion, and race in face recognition. Q J Exp Psychol A 1991;43:161–204
12. Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. SIGGRAPH'99. Proceedings of the 26th annual conference on computer graphics and interactive techniques. New York: ACM Press/Addison-Wesley, 1999:187–94
13. Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Trans Patt Anal Machine Intell 2003;25:1063–74
14. Chen TG, Fels S. Exploring gradient-based face navigation interfaces. Graphics Interface 2004. ACM International Conference Proceedings Series. Ontario: Canadian Human-Computer Communications Society, 2004;62:65–72
15. Hosmer DW, Hosmer T, Le CS, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 1997;16:965–80
16. Weisberg S. Applied Linear Regression. 3rd ed. Hoboken, NJ: Wiley-Interscience, 2005
17. Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. New York: Wiley, 2000
18. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36
19. Huang J, Lin CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 2005;17:299–310
20. Frova G, Sorbello M. Algorithms for difficult airway management: a review. Minerva Anestesiol 2009;75:201–9
21. Haynes AB, Weiser TG, Berry WR, Lipsitz SR, Breizat AH, Dellinger EP, Herbosa T, Joseph S, Kibatala PL, Lapitan MC, Merry AF, Moorthy K, Reznick RK, Taylor B, Gawande AA. A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med 2009;360:491–9
22. Yentis SM. Predicting difficult intubation: worthwhile exercise or pointless ritual? Anaesthesia 2002;57:105–9
23. Wilson ME, John R. Problems with the Mallampati sign. Anaesthesia 1990;45:486–7
24. Karkouti K, Rose DK, Ferris LE, Wigglesworth DF, Meisami-Fard T, Lee H. Inter-observer reliability of ten tests used for predicting difficult tracheal intubation. Can J Anaesth 1996;43:554–9
25. Wilson ME, Spiegelhalter D, Robertson JA, Lesser P. Predicting difficult intubation. Br J Anaesth 1988;61:211–6
26. Suzuki N, Isono S, Ishikawa T, Kitamura Y, Takai Y, Nishino T. Submandible angle in nonobese patients with difficult tracheal intubation. Anesthesiology 2007;106:916–23
27. Naguib M, Malabarey T, AlSatli RA, Al Damegh S, Samarkandi AH. Predictive models for difficult laryngoscopy and intubation: a clinical, radiologic and three-dimensional computer imaging study. Can J Anaesth 1999;46:748–59
28. Santoni BG, Hindman BJ, Puttlitz CM, Weeks JB, Johnson N, Maktabi MA, Todd MM. Manual in-line stabilization increases pressures applied by the laryngoscope blade during direct laryngoscopy and orotracheal intubation. Anesthesiology 2009;110:24–31
29. Rosenstock C, Gillesberg I, Gatke MR, Levin D, Kristensen MS, Rasmussen LS. Inter-observer agreement of tests used for prediction of difficult laryngoscopy/tracheal intubation. Acta Anaesthesiol Scand 2005;49:1057–62
Appendix A: A Real-World Representation of the Quadratic Logit
The quadratic logit function admits both the value of a parameter and its square:
Completing the squares, we can equivalently write:
where ζi takes only the values ±1, without which σi would be imaginary if βi < 0.
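The displayed equations of this appendix are not reproduced in this text. As a sketch only, the completion of the square can be worked through for one parameterization that is consistent with the descriptions of α, σ, ζ, and β0 given in Appendix C. The starting coefficients b_0, b_i, and c_i are hypothetical names introduced here and need not match the authors' notation, and the factor of 2 in the denominator is an assumption chosen to parallel the normal distribution.

```latex
% Assumed starting form: a linear and a squared term for each input x_i
z = b_0 + \sum_i \left( b_i x_i + c_i x_i^2 \right), \qquad c_i \neq 0

% Completing the square for each input
b_i x_i + c_i x_i^2
  = c_i \left( x_i + \frac{b_i}{2 c_i} \right)^2 - \frac{b_i^2}{4 c_i}

% Definitions: apex, sign indicator, and width of each quadratic term
\alpha_i = -\frac{b_i}{2 c_i}, \qquad
\zeta_i = -\operatorname{sign}(c_i) \in \{+1, -1\}, \qquad
\sigma_i^2 = \frac{1}{2 \lvert c_i \rvert}

% Resulting form, with all constants collected into \beta_0
z = \beta_0 - \sum_i \zeta_i \, \frac{(x_i - \alpha_i)^2}{2 \sigma_i^2},
\qquad
\beta_0 = b_0 - \sum_i \frac{b_i^2}{4 c_i}
```

Under these assumptions, carrying the sign of each squared coefficient in ζ_i keeps σ_i real whatever the sign of that coefficient, which is the role the text assigns to ζ_i.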
Note that in equation (A1), the term z becomes the exponent. It is useful to recall the probability distribution function of the normal distribution:
The similarity between the 2 equations suggests that an alternative, probabilistic, interpretation of the quadratic logit function is possible. It can be shown that the output of the logit function L(z) can be written as a weighting of normal probability distributions of the inputs xi:
where C is a numerical constant determined by the model, defined as:
Equation (A5) is a curious result, because we believe it mathematically describes a mental process similar to that performed by anesthesiologists given the task of assessing an unknown airway. The combination of factors favorable to intubation is weighed against the countervailing unfavorable factors and, based on their relative preponderance, a decision is reached with some greater or lesser degree of confidence. The apparent artificiality of the quadratic logit model thus leads to a surprisingly natural result.
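The equivalence between the logistic output and a weighting of normal densities can be checked numerically. The sketch below assumes the form z = β0 − Σ ζi (xi − αi)² / (2σi²), which is consistent with the parameter descriptions in Appendix C but is not taken from the displayed equations, which are not reproduced in this text. The function names and all parameter values are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of the normal distribution with the given mean and SD."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2)) / (sd * math.sqrt(2.0 * math.pi))


def quadratic_logit(x: Sequence[float], beta0: float, alpha: Sequence[float],
                    sigma: Sequence[float], zeta: Sequence[int]) -> float:
    """Logistic output L(z) for the assumed completed-square form of z."""
    z = beta0 - sum(s * (xi - a) ** 2 / (2.0 * sd ** 2)
                    for xi, a, sd, s in zip(x, alpha, sigma, zeta))
    return 1.0 / (1.0 + math.exp(-z))


def weighted_normals(x: Sequence[float], beta0: float, alpha: Sequence[float],
                     sigma: Sequence[float], zeta: Sequence[int]) -> float:
    """The same output written as a ratio of products of normal densities."""
    constant = math.exp(beta0)   # numerical constant fixed by the model
    p_plus = 1.0                 # densities of terms with zeta = +1
    p_minus = 1.0                # densities of terms with zeta = -1
    for xi, a, sd, s in zip(x, alpha, sigma, zeta):
        scale = sd * math.sqrt(2.0 * math.pi)
        if s == 1:
            constant *= scale
            p_plus *= normal_pdf(xi, a, sd)
        else:
            constant /= scale
            p_minus *= normal_pdf(xi, a, sd)
    return constant * p_plus / (constant * p_plus + p_minus)


if __name__ == "__main__":
    # Hypothetical inputs and parameters, for illustration only.
    args = ([0.3, -1.2], 0.5, [0.0, 0.4], [1.0, 2.0], [1, -1])
    print(quadratic_logit(*args), weighted_normals(*args))
```

In the second function the two products of densities play the role of opposing bodies of evidence, and the output is the share of the total contributed by one side, which mirrors the weighing process described above.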
Appendix B: Cross-Validation by Product of Area Under the Curve
We measure the performance of a model by the area under the curve (AUC) on its receiver operating characteristic plot. A cross-validation and model selection technique is required that can relatively suppress models that show evidence of overfitting.
Let us suppose that some ideal, optimal model exists, and that this model has an AUC of AUCideal. It will likely be possible to produce some other model that seems to perform better on the model derivation set. However, because this model is by definition not the ideal model, its performance must have been artificially improved by overfitting. Let us define the AUC of this model as AUCideal + ε0, where ε0 represents the performance erroneously obtained through overfitting.
Now, consider the model validation dataset. We would expect that the ideal model would have an AUC of AUCideal when tested against either dataset. If the datasets are sufficiently large, then we can expect that any improvement in performance that was erroneously obtained by overfitting in the derivation set will appear as an equal penalty in the validation set. Therefore, the model with an AUC of AUCideal + ε0 in the derivation set should have an AUC of approximately AUCideal − ε0 in the validation set. The effect of ε0 must be symmetric in this way because if this were not so it would imply that some residual information is available that could be used to improve AUCideal, contradicting the initial statement that AUCideal is the optimal model.
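Although we cannot know the values of either AUCideal or ε0, we can use them as the basis for selecting the best candidate models by maximizing the product of the AUCs for the derivation and validation sets, i.e.:

$$\mathrm{AUC}_{\mathrm{derivation}} \times \mathrm{AUC}_{\mathrm{validation}} = (\mathrm{AUC}_{\mathrm{ideal}} + \varepsilon_0)(\mathrm{AUC}_{\mathrm{ideal}} - \varepsilon_0) = \mathrm{AUC}_{\mathrm{ideal}}^2 - \varepsilon_0^2 \tag{B3}$$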
The value of equation (B3) is maximized only when ε0 = 0, describing no overfitting. The candidate model that generates the greatest AUC product as defined by equation (B3) is therefore likely to be the model that most closely approximates the theoretically ideal model and has the least overfitting.
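The selection rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the software used in the study: the AUC is computed with the standard pairwise (Mann-Whitney) definition, the function names are hypothetical, and the candidate AUC values are invented solely to show the behavior of the rule.

```python
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    class-1 case scores higher than a class-0 case (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_by_auc_product(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Return the candidate whose derivation AUC times validation AUC is largest."""
    return max(candidates, key=lambda name: candidates[name][0] * candidates[name][1])


if __name__ == "__main__":
    # Hypothetical (derivation AUC, validation AUC) pairs.
    candidates = {
        "overfitted": (0.98, 0.70),   # product 0.686
        "balanced": (0.85, 0.84),     # product 0.714
    }
    print(select_by_auc_product(candidates))
```

In this invented example the candidate with the highest derivation AUC is not selected, because its weaker validation AUC lowers the product, which is the suppression of overfitted models that the appendix describes.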
Appendix C: Mathematical Interpretation of the Final Logistic Model
The parameters of the selected airway classification model are given in the following table:
The terms in the model are likewise defined as:
The value of L(z) is always within the range of 0 to 1 and is the predicted likelihood of belonging to class 1. The value of 1 − L(z) is the predicted likelihood of belonging to class 0. Therefore, if L(z) is ≤0.5, then the patient is predicted as class 0 (easy to intubate) and if L(z) is >0.5, then the patient is predicted as class 1 (difficult to intubate). The meanings of the parameters of the model are defined fully in Appendix A, but can also be described simply. In the quadratic logit model, the α terms identify the apex of the quadratic curve, and the σ terms represent the steepness of the sides of the curve. The variable ζ defines whether ease of intubation improves (+1) or worsens (−1) as the value of the variable moves away from α. As ζ = +1 for all terms, β0 describes the value in logit units that would be produced by the head that is most difficult to intubate according to the model, as shown in Figure 5A.
When the derivation and validation data contain such a high prevalence of difficult intubations, one might be suspicious that an algorithm produced from that data might overcall the prevalence of difficult intubation in the general population. We can address this concern by calculating the predicted difficulty of the average head, to which the model had not previously been exposed. The average head (Fig. 2) is defined as the head for which all observable parameters have 0 deviance from the population normal,16 and hence for which all the values of x for observable parameters in the model are 0. Assigning a thyromental distance of 4 fingerbreadths, we calculate z = −2.60, and therefore L(z) for the average face is 0.069, which suggests a likelihood of 93.1% that the average head will be easy to intubate.
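The arithmetic for the average head can be reproduced directly from the reported value of z. The sketch below takes z = −2.60 from the text and applies the standard logistic function together with the 0.5 decision threshold described above; the function names are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function, mapping logit units to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z: float, threshold: float = 0.5) -> str:
    """Class 1 (difficult) if L(z) exceeds the threshold, otherwise class 0 (easy)."""
    return "difficult" if logistic(z) > threshold else "easy"


if __name__ == "__main__":
    z_average_head = -2.60          # value reported in the text
    p_difficult = logistic(z_average_head)
    print(f"L(z) = {p_difficult:.3f}")                      # 0.069
    print(f"P(easy) = {100.0 * (1.0 - p_difficult):.1f}%")  # 93.1%
    print(classify(z_average_head))                         # easy
```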
Furthermore, the meaning of the value L(z) returned by the model is unclear beyond its definition as a binomial classifier above and below L(z) = 0.5. It is tempting but untested to conclude that the magnitude of L(z) predicts the degree of difficulty. This would be to impose a further level of structural meaning, to say that those points that lie to the upper right of the distribution in the logit plots of Figure 4 represent not just difficult intubations but instead represent intubations comparatively “more difficult” than those represented by points lying closer to the center. The present investigation cannot address this intriguing possibility, and the difficulty in testing it against agreed-upon clinical definitions will complicate future attempts to do so.