Transnasal Videoendoscopy for Preoperative Airway Risk Stratification: Development and Validation of a Multivariable Risk Prediction Model

BACKGROUND: Transnasal flexible videoendoscopy (TVE) of the larynx is a standard of care for the detection and staging of pharyngolaryngeal lesions in otorhinolaryngology. Patients frequently present with existing TVE examinations before anesthesia. Although these patients are considered high risk, the diagnostic value of TVE for airway risk stratification is currently unknown. How can captured images or videos be used for anesthesia planning, and which lesions are most concerning? This study aimed to develop and validate a multivariable risk prediction model for difficult airway management based on TVE findings and to determine whether the discrimination of the Mallampati score can be improved by adding this new TVE model. METHODS: This retrospective single-center development and validation study assessed 4021 patients who underwent 4524 otorhinolaryngologic surgeries at the University Medical Centre Hamburg-Eppendorf between January 1, 2011, and April 30, 2018, and who had electronically stored TVE videos; 1099 patients who underwent 1231 surgeries were included. TVE videos and anesthesia charts were systematically reviewed in a blinded fashion. Least absolute shrinkage and selection operator (LASSO) regression was used for variable selection, model development, and cross-validation. RESULTS: The prevalence of difficult airway management was 24.7% (304/1231). Lesions at the vocal cords, epiglottis, or hypopharynx were not selected by the LASSO regression, whereas lesions at the vestibular folds (β-coefficient 0.123), supraglottic region (β-coefficient 0.161), and arytenoids (β-coefficient 0.063), viewing restrictions on the rima glottidis covering ≥50% of the glottis area (β-coefficient 0.485), and pharyngeal secretion retention (β-coefficient 0.372) were relevant risk factors for difficult airway management. The model was adjusted for sex, age, and body mass index.
The area under the receiver operating characteristic curve (95% confidence interval) was 0.61 (0.57–0.65) for the Mallampati score and 0.74 (0.71–0.78) for the TVE model combined with the Mallampati score (P < .001). CONCLUSIONS: Stored images and videos from TVE examinations can be reused to predict risk associated with airway management. Vestibular fold, supraglottic, and arytenoid lesions are the most concerning, especially if they are accompanied by secretion retention or restrict the glottic view. Our data indicate that the TVE model improves the discrimination of the Mallampati score and might, therefore, be a useful addition to traditional bedside airway risk examinations.

• Meaning: Our data indicate that the TVE model might be a useful addition to traditional bedside airway risk examinations and identify vestibular fold, supraglottic, and arytenoid lesions as the most concerning findings, especially if they are accompanied by retention of secretions or restrict the glottic view.

Airway management problems are among the main causes of anesthesia-related adverse events. [1][2][3] Transnasal flexible videoendoscopy (TVE) of the upper aerodigestive tract (also known as "nasendoscopy") is a standard of care for the detection, classification, and staging of pharyngolaryngeal lesions in otorhinolaryngology. 4 Patients with suspected or known pharyngolaryngeal lesions, such as tumors, hyperplasia, edema, abscesses, or other space-consuming lesions, often require surgery or diagnostic procedures. Hence, patients frequently present with existing TVE examinations before general anesthesia.
Current guidelines recommend that airway evaluation may include preoperative bedside endoscopy. 5 However, although patients with pharyngolaryngeal lesions are at high risk for difficult airway management, 6-10 the diagnostic value of pathological TVE findings for predicting risk associated with airway management is still unknown, and an evidence-based score, assessment tool, or structured workflow is lacking. At present, it remains unclear how existing TVE examinations could be used for anesthesia planning and integrated into existing concepts for risk stratification, preinduction strategies, and decision-making, particularly for or against awake tracheal intubation. 8,10,11 Nor is it known which findings, tumor locations, spread, and size are most concerning, or whether pathological TVE findings improve discrimination when added to traditional bedside risk assessment tests. Captured images and videos from routine otorhinolaryngological TVE examinations could be shared with and reused by anesthesiologists to predict risk associated with airway management, potentially improving quality and patient safety without additional measures or expense. At present, neither the architecture of pharyngolaryngeal lesions nor TVE findings are represented in traditional bedside airway risk prediction scores, [12][13][14][15][16] and it is not known whether systematic preoperative mapping of pharyngolaryngeal lesions and structured scoring of TVE findings improve prediction of risk associated with airway management.
This retrospective cohort study aimed to develop and validate a multivariable least absolute shrinkage and selection operator (LASSO) regression model for the prediction of difficult airway management based on TVE findings. A secondary aim was to determine whether discrimination of the Mallampati score 17 can be improved by adding this new TVE model.

METHODS
This multivariable prediction model development and validation study was conducted in accordance with the Declaration of Helsinki. The design and reporting are adapted from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. 18,19 The statistical analysis and reporting refer to the Statistical Analysis and Reporting Guidelines for CHEST. 20 The Ethics Committee of the Medical Association of Hamburg approved this retrospective study and waived the need to obtain written informed consent for collection, analysis, and publication of data (WF-022/18, April 16, 2018, Chairman/Professor Dr Stahl).

Anesthesia & Analgesia: Transnasal Videoendoscopy for Preoperative Airway Risk Stratification

Patient Selection and Data Collection
We used an anonymized data set, collected for internal quality assessment, for our statistical analysis. The data originated from adult patients who underwent otorhinolaryngologic surgery at the University Medical Centre Hamburg-Eppendorf between January 1, 2011, and April 30, 2018. Only cases with a preoperative TVE examination performed within 30 days before surgery were selected. Repeated anesthetics, with their corresponding TVE data and videos, were included only if the patient had a new TVE examination within 30 days before surgery and still met all eligibility criteria. Anesthesia charts were systematically reviewed for the primary and secondary outcomes by 1 assessor (H.L.G.) under the supervision of 2 anesthetists (A.B.S. and M.P.). The assessors were blinded to the TVE findings. Borderline findings were reviewed by all 3 assessors (H.L.G., A.B.S., and M.P.), discrepancies were discussed, and consensus on whether the patient met the outcome definition was reached in each case. TVE videos were reviewed by 1 assessor (H.L.G.) under the supervision of 2 otorhinolaryngologists (R.S. and J.B.), using a predefined registration sheet that included a systematic mapping of lesions. These assessors were blinded to the results of the primary and secondary outcome assessments from the anesthesia charts (anonymized video analysis).
Unclear or borderline findings in the TVE videos were reviewed by all 3 assessors (H.L.G., R.S., and J.B.), discrepancies were discussed, and a consensus agreement was reached in each case. To determine interobserver reliability, 92 consecutive videos were independently analyzed by 3 assessors (H.L.G., R.S., and J.B.) within a structured protocol, blinded to each other's ratings.

Eligibility Criteria
The study flow and data selection process are given in Figure 1. Inclusion criteria were: age 18 years or older, otorhinolaryngologic surgery with general anesthesia, tracheal intubation facilitated either by direct laryngoscopy or videolaryngoscopy as first-line technique, and a TVE recording within 30 days before surgery.
Exclusion criteria were: mismatching, invalid, or inconclusive operation and procedure classification system codes (OPS codes); incomplete or interrupted TVE examinations; invalid or incomplete video storage; unacceptable video quality; preexisting tracheostomy; a preexisting tracheal tube (eg, patients from an intensive care unit); and unavailable or incomplete anesthesia charts.

Outcome Measures
The primary and secondary outcome measures derived from the systematic chart review are given in Table 1.

Predictor Variables
Potentially eligible covariables (candidate predictors) for the primary outcome measure were identified by literature research, previous studies, and clinical considerations:
1. Location of pharyngolaryngeal lesions (vocal cords, vestibular folds, supraglottic region, arytenoids, epiglottis, hypopharynx)
2. Viewing restrictions on the rima glottidis due to lesions (none / relevant viewing restriction covering less than half of the glottis cross-sectional area / covering more than half of the glottis area)
3. Secretion retention (yes/no)
4. Sex (male/female)
5. Age (years)
6. Body mass index (BMI) (kg·m⁻²)
The Mallampati score (modification of Samsoon and Young, I-IV 17 ) was used as a comparator.
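The candidate predictors can be written as one row of a design matrix for the LASSO regression. The Python sketch below assumes that each lesion site is a separate yes/no indicator and that the 3-level viewing restriction is dummy-coded with "none" as reference; under these assumptions the list yields 12 columns, matching the 12 candidate predictors used for the sample size calculation and LASSO regression, but the authors' actual coding is not stated. The class, field, and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical labels for the six lesion sites named in the text.
LESION_SITES = (
    "vocal_cords", "vestibular_folds", "supraglottic",
    "arytenoids", "epiglottis", "hypopharynx",
)
VIEW_LEVELS = ("none", "less_than_half", "more_than_half")


@dataclass(frozen=True)
class TveCase:
    """One anesthetic case as read from the registration sheet (assumed fields)."""
    lesion_sites: FrozenSet[str]
    view_restriction: str          # one of VIEW_LEVELS
    secretion_retention: bool
    male: bool
    age_years: float
    bmi_kg_m2: float


def encode_candidates(case: TveCase) -> List[float]:
    """Return the 12 candidate predictors in a fixed column order (assumed coding)."""
    unknown = set(case.lesion_sites) - set(LESION_SITES)
    if unknown:
        raise ValueError(f"unknown lesion sites: {sorted(unknown)}")
    if case.view_restriction not in VIEW_LEVELS:
        raise ValueError(f"view_restriction must be one of {VIEW_LEVELS}")
    # Columns 0-5: one yes/no indicator per lesion site.
    row = [1.0 if site in case.lesion_sites else 0.0 for site in LESION_SITES]
    # Columns 6-7: viewing restriction dummy-coded with "none" as reference.
    row.append(1.0 if case.view_restriction == "less_than_half" else 0.0)
    row.append(1.0 if case.view_restriction == "more_than_half" else 0.0)
    # Columns 8-11: secretion retention, male sex, age, BMI.
    row.append(1.0 if case.secretion_retention else 0.0)
    row.append(1.0 if case.male else 0.0)
    row.append(float(case.age_years))
    row.append(float(case.bmi_kg_m2))
    return row
```

Dummy coding the viewing restriction lets the LASSO shrink the "less than half" and "more than half" categories independently, which is one way a single clinical item can contribute two candidate predictors.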

Sample Size
We used the approach proposed by Riley et al 21 to verify the required sample size for model development after completion of data sampling, based on 3 criteria. First, the sample size should ensure a precise estimate of the overall outcome risk: based on literature review and our own in-house data, 6,7,9,22 we assumed a primary end point prevalence of 0.15 in patients undergoing otorhinolaryngologic surgery and a margin of error ≤0.05. Second, the sample size should limit the shrinkage of predictor effects to 10%. Third, the sample size should ensure only small optimism in apparent model fit. For these calculations, we assumed a C-statistic of 0.7 and 12 candidate predictors. The sample size was calculated for each criterion, and the largest value was chosen, yielding a required sample size of approximately 981 patients.
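The three criteria of Riley et al can be sketched in Python using the standard closed-form expressions for a binary outcome: criterion 1 targets the precision of the overall risk, and criteria 2 and 3 express expected shrinkage and optimism through the Cox-Snell R². Converting the assumed C-statistic of 0.7 into a Cox-Snell R² requires a separate approximation that is not reproduced here, so the R² value in the usage line is a hypothetical placeholder, and the sketch is not expected to reproduce the figure of 981.

```python
import math


def n_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """Criterion 1: estimate the overall outcome risk to within +/- margin (95% CI)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


def n_for_shrinkage(n_predictors: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Criterion 2: expected uniform shrinkage factor S (0.9 means 10% shrinkage)."""
    return math.ceil(n_predictors / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def max_r2_cs(prevalence: float) -> float:
    """Largest attainable Cox-Snell R2 for a binary outcome with this prevalence."""
    log_lik_null = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    return 1 - math.exp(2 * log_lik_null)


def n_for_optimism(n_predictors: int, r2_cs: float, prevalence: float, delta: float = 0.05) -> int:
    """Criterion 3: optimism in apparent Nagelkerke R2 of at most delta."""
    shrinkage = r2_cs / (r2_cs + delta * max_r2_cs(prevalence))
    return n_for_shrinkage(n_predictors, r2_cs, shrinkage)


def riley_sample_size(n_predictors: int, r2_cs: float, prevalence: float) -> int:
    """Take the largest of the three criteria."""
    return max(
        n_overall_risk(prevalence),
        n_for_shrinkage(n_predictors, r2_cs),
        n_for_optimism(n_predictors, r2_cs, prevalence),
    )


if __name__ == "__main__":
    # r2_cs = 0.10 is a hypothetical placeholder, not derived from C = 0.7.
    print(riley_sample_size(n_predictors=12, r2_cs=0.10, prevalence=0.15))
```

With a prevalence of 0.15 and a margin of 0.05, criterion 1 alone requires 196 patients; with the placeholder R² of 0.10 and 12 predictors, criterion 2 dominates and the function returns 1019, illustrating why the shrinkage criterion usually drives the final number.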

Descriptive Statistics
Sample characteristics are given as absolute and relative frequencies or mean (standard deviation), whichever was appropriate. We used Fleiss' kappa to calculate the agreement between the 3 independent observers regarding the ratings of the TVE videos.
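Fleiss' kappa for 3 raters can be computed from a subjects × categories table of rating counts. The Python sketch below is a plain implementation of the textbook formula, assuming every video was rated by all 3 observers; it is not the SPSS or R routine the authors used, and the input layout is an assumption.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i (eg, one TVE video)
    into category j; every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Overall share of all ratings falling into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed pairwise agreement for each subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1:
        raise ValueError("all ratings fall into one category; kappa is undefined")
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, 3 videos each rated identically by all 3 observers give a kappa of 1, whereas a 2-to-1 split on every video gives a negative value, indicating agreement below chance.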
Statistical analysis was performed using SPSS statistics version 25 (IBM Inc) and R version 4.0.2 (R Foundation for Statistical Computing).

Multivariable Model Development, Validation, and Performance
For model development and internal validation, we performed LASSO regression using all 12 identified candidate predictors. For model development, the zero points of metric variables were shifted based on a data-driven approach (age) and on the World Health Organization classification (BMI). 23 In the first step, 10-fold cross-validation was used to determine the shrinkage parameter λ; we chose λ1se, the largest λ with a cross-validated error within 1 standard error of the cross-validated error at the minimizing λ. In the second step, a LASSO regression with λ1se, again with 10-fold cross-validation, was calculated to estimate the shrunken β-coefficients. The second step was repeated 20 times, yielding 200 validation cohorts (20 repetitions × 10 folds). For each validation cohort, the area under the receiver operating characteristic curve (AUC) was calculated, and we report the mean cross-validated AUC (cvAUC) with its standard deviation. The final coefficients were taken from the best-fitting model (highest AUC). Predictors whose coefficients in this model were not shrunk to zero were retained in the final multivariable LASSO regression model (TVE model).
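The λ1se rule and the summary over the repeated cross-validation can be sketched in Python as follows. The selection rule mirrors the "lambda.1se" convention of cv.glmnet in the R package glmnet; the text does not state which R package was used, so that correspondence is an assumption. Fitting the LASSO itself is left to a library: the functions here only pick λ from precomputed cross-validation errors and summarize the 200 fold AUCs.

```python
import statistics
from typing import Sequence, Tuple


def lambda_1se(lambdas: Sequence[float], cv_mean: Sequence[float], cv_se: Sequence[float]) -> float:
    """Largest lambda whose mean CV error is within 1 SE of the minimum.

    lambdas, cv_mean and cv_se are aligned: cv_mean[i] and cv_se[i] are the
    mean and standard error of the 10-fold CV error at lambdas[i].
    """
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Larger lambda means stronger shrinkage and a sparser model.
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


def summarize_cv_auc(fold_aucs: Sequence[float], repeats: int = 20, folds: int = 10) -> Tuple[float, float]:
    """Mean and SD of the fold AUCs from repeated k-fold cross-validation."""
    if len(fold_aucs) != repeats * folds:
        raise ValueError(f"expected {repeats * folds} fold AUCs, got {len(fold_aucs)}")
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)
```

Choosing λ1se rather than the error-minimizing λ deliberately trades a small, statistically indistinguishable loss in cross-validated error for a sparser model, which is why some candidate predictors can end up with coefficients of exactly zero.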

Model Comparison
To determine whether the new TVE model improves discrimination when added to a traditional bedside difficult airway risk assessment test, we calculated the AUC with 95% confidence interval (95% CI) of the Mallampati score (modification of Samsoon and Young, I-IV 17 ) and of the Mallampati score combined with the TVE model. We used the nonparametric approach by DeLong et al 24 to compare the AUCs between models. For this analysis, we present P values without correction for multiplicity.
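The DeLong comparison of two correlated AUCs measured on the same patients can be sketched in Python with NumPy. This is a minimal implementation of the structural-components (placement value) form of the test with a two-sided P value from the normal approximation; the function names are hypothetical, and the scores of the combined Mallampati plus TVE model would come from a separately fitted model that is not shown here. Memory use grows with the product of cases and non-cases, which is manageable for a cohort of about 1200.

```python
import math
from typing import Sequence, Tuple

import numpy as np


def _structural_components(scores: np.ndarray, y: np.ndarray) -> Tuple[float, np.ndarray, np.ndarray]:
    """AUC plus DeLong placement values for cases (y == 1) and non-cases (y == 0)."""
    pos, neg = scores[y == 1], scores[y == 0]
    # psi = 1 if the case scores higher, 0.5 for a tie, 0 otherwise.
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return float(psi.mean()), psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y: Sequence[int], scores_a: Sequence[float],
                scores_b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Compare two correlated AUCs; returns (auc_a, auc_b, z, two-sided P)."""
    y_arr = np.asarray(y)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if not (y_arr.shape == a.shape == b.shape):
        raise ValueError("y and both score vectors must have the same length")
    auc_a, v10_a, v01_a = _structural_components(a, y_arr)
    auc_b, v10_b, v01_b = _structural_components(b, y_arr)
    m, n = v10_a.size, v01_a.size
    if m < 2 or n < 2:
        raise ValueError("need at least 2 cases and 2 non-cases")
    # 2 x 2 covariance matrices of the placement values (ddof = 1).
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, float(z), p
```

Because both AUCs are estimated on the same patients, the covariance term s[0, 1] reduces the variance of their difference; ignoring it, as an unpaired comparison would, typically makes the test conservative.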

Score Development and Specification
For the development of a clinically applicable simplified TVE score, we weighted each predictor variable that was not shrunk to zero in the LASSO regression by multiplying its β-coefficient by 10. The coefficients of metric variables were additionally multiplied by 10 to obtain score points per 10 years and per 10 kg·m⁻², and these variables were categorized. Values were rounded to integer score points. Based on our findings, age was categorized into risk groups (50-59, 60-69, and ≥70 years), and BMI was dichotomized (at least moderate obesity, BMI ≥35 kg·m⁻² [yes/no] 23 ) after the development of the TVE model. To assess the discrimination of the simplified TVE score, we calculated the AUC (95% CI). To further assess its prediction accuracy, we calculated the index of prediction accuracy (IPA). 25 We calculated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for the final simplified TVE score and used sensitivity and specificity from a utility-based perspective to determine the optimal decision thresholds. We required a "screening cut-off value" to ensure a sensitivity above 60% and an additional "diagnostic cut-off value" to ensure a specificity above 80%.
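The weighting step can be illustrated with the β-coefficients reported in the Results for the TVE findings. The Python sketch below multiplies each coefficient by 10, rounds to the nearest integer, and computes sensitivity, specificity, PPV, and NPV at a given cut-off. It omits age, sex, and BMI because their coefficients are not given in this section, so the point values are illustrative and may not match Figure 2; all names are hypothetical.

```python
from typing import Dict, Sequence

# Beta-coefficients reported in the Results for the TVE findings.
REPORTED_BETAS = {
    "vestibular_fold_lesion": 0.123,
    "supraglottic_lesion": 0.161,
    "arytenoid_lesion": 0.063,
    "secretion_retention": 0.372,
    "view_restriction_over_half": 0.485,
}


def to_points(betas: Dict[str, float], factor: float = 10.0) -> Dict[str, int]:
    """Multiply each coefficient by `factor` and round to integer score points.

    Python's round() rounds exact halves to even; none of the products above is a half.
    """
    return {name: int(round(beta * factor)) for name, beta in betas.items()}


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def metrics_at_cutoff(scores: Sequence[int], outcomes: Sequence[int], cutoff: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'score >= cutoff' is a positive test."""
    tp = fp = tn = fn = 0
    for score, difficult in zip(scores, outcomes):
        positive = score >= cutoff
        if positive and difficult:
            tp += 1
        elif positive:
            fp += 1
        elif difficult:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

Applied to the reported coefficients, to_points gives 1, 2, 1, 4, and 5 points for vestibular fold, supraglottic, and arytenoid lesions, secretion retention, and a view restriction over half the glottis, respectively. Scanning metrics_at_cutoff over candidate cut-offs is one way to find the lowest score meeting a sensitivity target and the lowest score meeting a specificity target.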

RESULTS
We identified 4021 patients who had undergone 4524 otorhinolaryngologic surgeries under general anesthesia at the University Medical Centre Hamburg-Eppendorf within the study period and whose electronic patient records included a TVE video (Figure 1). Of these, 1231 anesthetic cases in 1099 patients fulfilled all eligibility criteria. Baseline characteristics are given in Table 2.

Multivariable Model Development, Validation, and Performance
For the development of the multivariable LASSO regression model (TVE model), 12 candidate predictors were selected based on clinical considerations, literature search, and previous studies. Lesions at the vocal cords, epiglottis, or hypopharynx were not selected by the LASSO regression (shrunk to zero), whereas lesions at the vestibular folds (β-coefficient 0.123), supraglottic region (β-coefficient 0.161), and arytenoids (β-coefficient 0.063) were relevant risk factors for difficult airway management (Table 4). Furthermore, pharyngeal secretion retention (β-coefficient 0.372), restricted view of the rima glottidis due to lesions covering more than half of the glottis cross-sectional area (β-coefficient 0.485), age, male sex, and BMI were relevant risk factors that were not shrunk to zero in the LASSO regression. The final TVE model comprises 8 covariables, while 4 candidate predictors were shrunk to zero (Table 4). The mean cvAUC across all 200 validation cohorts was 0.70 (standard deviation 0.05).

Simplified TVE Score
The final simplified TVE score ranges from 0 to 21 points (Figure 2). The AUC of the TVE score based on the complete training set is 0.71 (95% CI, 0.67-0.74).

Values are mean ± SD or number (proportion), whichever is appropriate; the data set of this analysis is complete (n = 1231) with the exception of the Mallampati score (50 missing values) and BMI (1 missing value). Abbreviations: ASA, American Society of Anesthesiologists; BMI, body mass index; SD, standard deviation. (a) Type of surgery was categorized as follows: "endocrine glands" subsumes thyroid and parathyroid surgery; "ear" subsumes operations on the inner and middle ear, acoustic meatus, and concha; "face and oral cavity" subsumes operations on the tongue, salivary glands or their excretory ducts, face, and oral cavity.

For model development, the zero points of metric variables were shifted based on a data-driven approach for age and on the World Health Organization classification for BMI (underweight BMI <18.5 kg·m⁻², overweight BMI 25-29.9 kg·m⁻², obesity class I BMI 30-34.9 kg·m⁻², class II BMI 35-39.9 kg·m⁻², class III BMI ≥40 kg·m⁻²); for the TVE score, β-coefficients were multiplied by 10 and rounded to integers; metric variables were categorized after model development.

The IPA of the TVE score, calculated on the complete training set, was 0.01%. 25 The optimal decision thresholds were determined from a utility-based perspective using a "screening cut-off value" (≥7 points) with a sensitivity above 60% and a "diagnostic cut-off value" (≥9 points) with a specificity above 80%, resulting in a risk ranking of 7-8 points for "increased risk" and ≥9 points for "highest risk" (Supplemental Digital Content, Table S1, http://links.lww.com/AA/E231). Based on these thresholds, 19.2% of the cases in our cohort were classified as "increased risk" (difficult airway management prevalence 30.6%) and 20.9% as "highest risk" (prevalence 45.4%) for difficult airway management.
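The risk ranking and the IPA can be sketched in Python as below. IPA is computed here as 1 minus the ratio of the model's Brier score to the Brier score of a null model that predicts the overall prevalence for every patient. Because the IPA requires predicted probabilities, an integer score would first have to be mapped to probabilities (for example, by a logistic model on the score); how the authors did this is not stated, so that step is assumed. The label for scores below 7 is hypothetical, since the text names only the two flagged categories.

```python
from typing import Sequence


def risk_category(points: int) -> str:
    """Risk ranking from the simplified TVE score cut-offs (>=7 and >=9 points)."""
    if points >= 9:
        return "highest risk"
    if points >= 7:
        return "increased risk"
    return "below screening cut-off"  # hypothetical label; not named in the text


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    if not outcomes or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def index_of_prediction_accuracy(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """IPA = 1 - Brier(model) / Brier(null model predicting the overall prevalence)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    prevalence = sum(outcomes) / len(outcomes)
    null_brier = brier_score([prevalence] * len(outcomes), outcomes)
    if null_brier == 0:
        raise ValueError("IPA is undefined when all outcomes are identical")
    return 1 - brier_score(probs, outcomes) / null_brier
```

An IPA near 0 means the predicted probabilities are, on average, barely more accurate than assigning every patient the cohort prevalence, even when the same score discriminates reasonably well; AUC and IPA therefore answer different questions.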
Fleiss' kappa analysis revealed moderate agreement between observers for the classification of "increased risk" (0.532) and substantial agreement for the classification of the "highest risk" (0.621) with the simplified TVE score. 26

DISCUSSION
Transnasal flexible endoscopy is a standard-of-care diagnostic measure for patients with hoarseness or suspected laryngeal cancer. 4,27 Many of these patients subsequently require general anesthesia for surgery or diagnostic procedures such as microsurgery. When we conducted this study, we discussed how stored images or videos from TVE examinations could be used to predict risk associated with airway management and to plan anesthesia. From the patients' viewpoint, the benefit is obvious: physicians simply reuse existing data from previous examinations without exposing patients to any additional risk, inconvenience, or time-consuming diagnostic measures. Quality and patient safety could thus be improved without additional expense, while health care resources are preserved.
Patients with pharyngolaryngeal lesions are at high risk for difficult tracheal intubation, 6-10 with an expected incidence of up to 28%, 7 which is attributed to the size, location, bleeding tendency, risk of swelling, and impaired view during airway management. 6,7,28 Predictive tests developed in cross-sectional surgical populations with a low average risk are suspected to discriminate poorly in patients with laryngopharyngeal disease, as they are not tailored to these patients. 7,29 A widely accepted tool for classifying these findings (eg, in the electronic health record) has not yet been established in otolaryngology. [30][31][32] We developed and validated a multivariable LASSO regression model for the prediction of difficult airway management (the TVE model), which anesthesiologists or otorhinolaryngologists can use as a tool for preoperative risk stratification.
Male sex, moderate to severe obesity (BMI ≥35 kg·m⁻²), and increased age were important risk factors for difficult airway management and were used to adjust the multivariable TVE model. Supraglottic, arytenoid, and vestibular fold lesions were the most concerning locations of pharyngolaryngeal lesions, whereas lesions of the vocal cords, epiglottis, and hypopharynx were not selected as predictors. Further findings of concern were viewing restrictions covering more than half of the glottis cross-sectional area (typically caused by space-consuming lesions) and pharyngeal secretion retention due to dysphagia (eg, pooled secretion in the pyriform sinus).
The most frequently recommended bedside difficult airway risk assessment tests are the upper lip bite test (ULBT) and the Wilson score. 12,15 However, both rely solely on functional and anatomic assessments of the head and neck, such as mouth opening, mandibular jaw mobility, and cervical spine mobility, whereas laryngopharyngeal lesions are ignored and thus represent a critical "blind spot" in current bedside risk assessment tests. Our findings suggest that TVE has great potential to close this gap.
We found that the TVE model substantially improved discrimination when added to the Mallampati score. The simplified TVE score is intended to be embedded in existing risk scores and might be a useful addition to traditional bedside difficult airway risk assessment tests. 12,13,15 The notion of a preoperative risk prediction tool based on existing TVE examinations, stored images, or videos is novel. However, some previous studies have investigated the value of indirect transoral laryngoscopy 33 or TVE examinations 11,34,35 performed solely for the purpose of airway risk assessment. The risks, effort, and expense of this additional minimally invasive diagnostic measure must be weighed against its possible benefit. A previous study found that visual inspection of airway pathology by TVE substantially influenced the decision for or against awake tracheal intubation 11 ; however, it remains unclear whether this changed the outcome. Which concrete objective findings should individual decision-making rely on? Guo et al 35 found that preoperative TVE based on the number of visible subglottic tracheal rings had higher discrimination than the Mallampati or Wilson scores for predicting difficult intubation. However, individuals with laryngeal lesions were excluded.
Gemma et al 34 used TVE for an anatomic-functional assessment of the laryngopharyngeal region, similar to the modified Cormack-Lehane classification, in 169 patients undergoing ear, nose, or throat (ENT) surgery; individuals with pharyngolaryngeal neoplasm or previous radiation therapy were excluded. They found that adding their "endoscore" to a routine bedside evaluation improved the prediction of difficult intubation.
Although data regarding preoperative TVE are limited, the current 2022 American Society of Anesthesiologists (ASA) guidelines recommend that airway evaluation may include preoperative bedside endoscopy. 5 The 2021 Canadian Airway Focus Group recommended that nasal endoscopy immediately before airway management may be helpful in patients with known or suspected obstructing glottic or supraglottic airway pathology. 36 Our findings support these recommendations and provide additional insight into how to interpret findings and implement a structured assessment and diagnostic workflow.
Although videolaryngoscopy has revolutionized airway management, 37 TVE has been studied almost exclusively in conjunction with conventional direct laryngoscopy, while data in patients undergoing videolaryngoscopy are very limited. 38 Because improving the laryngeal view during intubation is the most obvious strength of videolaryngoscopy, the question arises as to what role TVE plays when videolaryngoscopy is universally available at the bedside. How informative and predictive is preoperative TVE in conjunction with subsequent videolaryngoscopy? In our study, videolaryngoscopy was used as a first-line technique in 110 patients and as a rescue technique in 58 cases; however, further studies are needed to determine the diagnostic value of TVE in the era of videolaryngoscopy.
Our data reflect single-center experience only, and caution is warranted when generalizing our results. In particular, tracheal intubation and TVE techniques may differ between institutions, and external validation of the TVE model could further strengthen our findings. Because converting a regression model to an integer score to improve clinical applicability is known to discard useful information and alter calibration, the unsimplified coefficient model should be used for risk prediction whenever possible. 39 Our study cohort comprises patients in whom TVE was indicated for otolaryngological reasons; hence, the prevalence of difficult airway management was high, and even individuals with a low TVE score were at higher risk than an average patient. Although TVE images and videos were not routinely available to the anesthetists during the study period, the retrospective design of this study is a well-recognized limitation and a potential source of selection and diagnostic review bias. 40

CONCLUSIONS
Pharyngolaryngeal lesions still represent a critical "blind spot" in traditional bedside airway risk assessment tests. Captured images and videos from existing TVE examinations can be reused by anesthesiologists or otorhinolaryngologists to stratify the risks associated with airway management. Supraglottic, arytenoid, and vestibular fold lesions were concerning, whereas vocal cord, epiglottis, and hypopharynx lesions were not selected as predictors. Relevant viewing restrictions on the glottis and pharyngeal secretion retention were associated with difficult airway management. We developed and validated a novel TVE model that substantially improved discrimination when added to the Mallampati score and may be a useful addition to traditional bedside airway risk examinations.