Derivation and Validation of a Simple Perioperative Sleep Apnea Prediction Score : Anesthesia & Analgesia


Ambulatory Anesthesiology: Research Reports

Derivation and Validation of a Simple Perioperative Sleep Apnea Prediction Score

Ramachandran, Satya Krishna MD, FRCA*; Kheterpal, Sachin MD, MBA*; Consens, Flavia MD; Shanks, Amy MS*; Doherty, Tara M. DO*; Morris, Michelle MS*; Tremper, Kevin K. PhD, MD*

Editor(s): Glass, Peter S. A.

Anesthesia & Analgesia 110(4):p 1007-1015, April 2010. | DOI: 10.1213/ANE.0b013e3181d489b0


Obstructive sleep apnea (OSA) is a prevalent condition in 9% to 24% of the general population1 that occurs as a result of partial or complete airway obstruction during sleep and is associated with episodic hypoxemia. Both anesthesia and surgery affect sleep patterns, resulting in apnea or desaturation even in patients without presumed OSA,2 but OSA increases this risk significantly.3 OSA increases the risk of cardiac arrhythmias,4,5 myocardial infarction,4 stroke,6 and sudden death7 in the general population. An important step in reducing morbidity in this patient population is identifying those with OSA preoperatively.8 The diagnosis of OSA is usually based on a sleep study, but it is impossible to envisage the routine use of overnight polysomnography (OPS) as a perioperative screening test because of cost and resource constraints. Recent American Society of Anesthesiologists' (ASA) practice guidelines8 stress the need to identify OSA in the perioperative period through history, physical assessment, and laboratory tests.

There are 2 important considerations regarding perioperative screening for OSA. First, there are currently numerous prediction models, some of which are highly accurate in predicting OSA. The most successful models typically use either a complex risk derivation formula based on multiple variables9 or combine such formulae with additional measurements and investigations such as morphometry10 and cephalometry.11 The complexity of these prediction models reduces their utility in the immediate perioperative period. However, simplicity in terms of test design comes at the cost of accuracy, and the simplest model, the STOP (snoring, tiredness during daytime, observed apnea, and high blood pressure) questionnaire,12 has a sensitivity of 0.65 at an apnea-hypopnea index (AHI) threshold of 5. The STOP-BANG (BMI, age, neck circumference, and gender) model, which was derived from the validation set of the STOP study,12 was shown to have excellent value as a perioperative screening test in the presence of severe OSA, but is yet to be validated prospectively. Second, the reliability of a diagnostic test is best when derived in a study population that closely resembles clinical practice, because distinguishing features of a disease are different in a high-risk population.13 The diagnostic accuracy of a test can be biased or overestimated if a test is derived in a group of patients with underlying high prevalence of a disease, rather than in a typical clinical population.13 The prevalence of OSA as defined by AHI ≥5 in the STOP study was 73% in the derivation group and 69% in the validation group,12,14 similar to most clinical screening test studies for OSA that are based on sleep laboratories.9–11 In this context, we sought to identify independent clinical predictors of a perioperative diagnosis of OSA in a broad spectrum university hospital surgical population, using common preanesthetic evaluation methods, and develop a perioperative OSA prediction model based on these variables.


After obtaining IRB approval (University of Michigan, Ann Arbor, MI), we performed the study in 2 steps. The first step involved deriving the screening test from a broad spectrum of surgical patients: the general surgical population, or GSP group. The second step involved validating the screening test in a set of patients undergoing overnight sleep study, the overnight polysomnography or OPS group.

Derivation of Prediction Score in the GSP Group

All adult patients undergoing general anesthesia during a 40-month period were analyzed in this study. Individual patient informed consent was waived by the IRB because no clinical interventions were studied, and no patient-identifiable data were used. Exclusion criterion was patients younger than 18 years. The primary outcome measure was the perioperative diagnosis of OSA. This was defined as OSA diagnosed with OPS and treated with continuous positive airway pressure (CPAP), bilevel positive airway pressure, or surgery for OSA. Patients who satisfied these criteria were grouped together as GSP-OSA. The remaining patients were grouped together and called GSP-controls.

Perioperative, intraoperative, and postoperative data were collected from routine clinical documentation entered by anesthesiology residents, attending staff, and certified registered nurse anesthetists into the institution's perioperative clinical information system (Centricity®, General Electric Healthcare, Waukesha, WI). The data analysis was performed retrospectively. The clinical evaluation form and its data storage were designed not only to serve clinical purposes but also to collect data for observational research studies. Each clinical element (body mass index [BMI], snoring, etc.) is stored as a discrete database element. In addition, a structured, predefined pick list is used by the clinician to enter information (Appendix). The demographic and history variables were chosen from the anesthesia assessment dataset after a thorough literature search for associations with OSA from among frequently used perioperative assessment tools. For each of the patients included in the study, data on the following variables were collected: patient or family member report of snoring,12 patient report of treated or untreated hypertension,12 patient report of treated Type 2 diabetes mellitus,15 and calculated BMI12 from patient report of weight and height, age,12 and gender.12 Airway variables frequently assessed preoperatively, namely, modified Mallampati class,16 qualitatively assessed thick neck,12 reduced thyromental (TM) distance17 estimated <6 cm, reduced mouth opening estimated <4 cm, mandibular protrusion test assessed as inability to prognath lower incisors anterior to upper incisors, and clinically estimated reduced cervical spine mobility, were additionally included as variables because several of these have previously been associated with difficult airway.18 These airway variables were assessed at the discretion of the individual caregivers. 
Data on CPAP treatment were not specifically collected because both duration and regularity of usage were poorly documented.

Statistical Analysis

Statistical analysis was performed using SPSS version 15 (SPSS, Chicago, IL). Collinearity diagnostics were performed on all the variables, along with a bivariate correlation matrix to evaluate pairwise correlations and address any groups with a pairwise correlation >0.70. Continuous variables were transformed into dichotomous variables by identifying the maximal sum of specificity and sensitivity using a receiver operating characteristic (ROC) curve. Variables were then entered into a logistic regression full model fit. All variables with P < 0.05 were deemed significant independent predictors of OSA. A hazard ratio was calculated for each significant predictor comparing the likelihood of OSA with and without the risk factor. This model was evaluated using the area under the ROC curve.
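The dichotomization step described above (choosing the cutoff that maximizes sensitivity plus specificity) can be sketched in a few lines. This is an illustrative sketch only, not the SPSS procedure used in the study: the function name `best_cutoff`, the rule "value > cutoff counts as positive", and the toy data are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def best_cutoff(values: Sequence[float], has_osa: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, sensitivity + specificity) for the cutoff that
    maximizes the sum, treating value > cutoff as a positive test.
    Hypothetical helper; ties keep the lowest cutoff."""
    positives = sum(has_osa)
    negatives = len(has_osa) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one control")

    best = (float("nan"), -1.0)
    for cutoff in sorted(set(values)):
        # True positives: cases above the cutoff.
        tp = sum(1 for v, y in zip(values, has_osa) if y and v > cutoff)
        # True negatives: controls at or below the cutoff.
        tn = sum(1 for v, y in zip(values, has_osa) if not y and v <= cutoff)
        total = tp / positives + tn / negatives
        if total > best[1]:
            best = (cutoff, total)
    return best


# Toy data (invented for illustration, not study data).
ages = [25, 31, 38, 42, 44, 47, 52, 60]
osa = [False, False, False, False, True, False, True, True]
cutoff, total = best_cutoff(ages, osa)
print(cutoff, round(total, 3))
```

On real data the same search is usually read off the ROC curve; the brute-force loop here simply makes the criterion explicit.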

An unweighted clinical scale was produced assigning 1 point per independent predictor. A weighted scale was also derived based on the β coefficients derived from the logistic regression full model fit. The predictive accuracy of the weighted and unweighted scales was separately assessed using the area under the ROC curve. The scale that best combined ease of use with accuracy was called the perioperative sleep apnea prediction (P-SAP) score. To describe the effect of incremental risk factor presence on the predictability of OSA, the sensitivity, specificity, positive predictive value, negative predictive value, positive and negative likelihood ratios, and diagnostic odds ratios were then calculated for each threshold score for P-SAP score.
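The summary measures listed above all follow from the four cells of a 2×2 table at a given score threshold. The sketch below shows the standard definitions; the `Accuracy` class, the `summarize` function, and the counts are hypothetical and assume every cell is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_pos: float
    lr_neg: float
    dor: float


def summarize(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Standard diagnostic accuracy measures from 2x2 counts.
    Assumes all four cells are nonzero (no zero-division handling)."""
    sens = tp / (tp + fn)            # proportion of OSA cases flagged
    spec = tn / (tn + fp)            # proportion of controls cleared
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    lr_pos = sens / (1 - spec)       # positive likelihood ratio
    lr_neg = (1 - sens) / spec       # negative likelihood ratio
    return Accuracy(sens, spec, ppv, npv, lr_pos, lr_neg, lr_pos / lr_neg)


# Invented counts for illustration only.
result = summarize(tp=60, fp=40, fn=20, tn=80)
print(result)
```

Repeating the calculation at each threshold (score ≥1, ≥2, and so on) yields one row of summary measures per threshold.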

Validation of the P-SAP Score in the OPS Group

The P-SAP score was then validated in a series of patients who underwent OPS at our institution (OPS group). These patients initially presented to the sleep laboratory for symptoms and features highly suggestive of OSA. These patients underwent surgery within a 6-month period of the sleep study during which time the elements of the P-SAP score were assessed as part of their perioperative assessment. The patients were identified by combining the anesthesia clinical information system database and the sleep laboratory database at the University Hospital. P-SAP relevant data were entered prospectively into the electronic anesthesia record (Centricity®, General Electric Healthcare) by anesthesia caregivers blinded to the results of polysomnography. Subjects were studied with standard polysomnography techniques in the University of Michigan sleep laboratory for at least 7 hours. Four electroencephalographic channels (C3, C4, O1, and O2 by the international 10–20 system), 3 chin electromyogram leads, 2 electrooculogram leads, 2 electrocardiogram leads, snoring sound, respiratory effort using piezoelectric belts over the chest and abdomen, airflow at the nose and mouth using thermocouples and nasal pressure cannulas, and 2 bilateral surface electromyogram electrodes (placed over the anterior tibialis muscles) were recorded. Oxyhemoglobin saturation (SpO2) was monitored by pulse oximetry.

Experienced polysomnography technologists, blinded to the P-SAP score, used standard techniques to manually score all recordings for sleep stages, respiratory events, and limb movements. Polysomnography measures followed the rules of Rechtschaffen and Kales19 for sleep staging and standard recommendations for respiratory scoring. Patients with AHI ≥5 per hour were diagnosed as having OSA (OPS-OSA group). Patients with AHI <5 per hour were grouped as OPS-controls. Diagnostic thresholds of AHI were chosen as 5 to 14.9 events per hour, 15 to 29.9 events per hour, and ≥30 per hour for diagnosis of mild, moderate, and severe OSA, respectively, as described in previous studies.12,14 Using these AHI thresholds, the P-SAP score was validated at ≥2 and ≥6 to provide the following summary measures of accuracy: sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, and diagnostic odds ratios.
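The AHI bands used for validation can be written as a small lookup. This is a sketch of the thresholds stated above; the function name and the label strings are assumptions made for the example.

```python
def ahi_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events per hour) to the severity
    bands described in the text. Labels are illustrative."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "control"      # AHI <5: no OSA (OPS-controls)
    if ahi < 15:
        return "mild"         # 5 to 14.9 events per hour
    if ahi < 30:
        return "moderate"     # 15 to 29.9 events per hour
    return "severe"           # >=30 events per hour


for value in (3.2, 5.0, 14.9, 15.0, 42.0):
    print(value, ahi_severity(value))
```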

Finally, comparisons were made between GSP-controls and OPS-controls on the one hand and GSP-OSA and OPS-OSA on the other, to identify whether generalizations could be made to the study subgroup populations. Significant differences were calculated using Pearson χ2 for categorical variables and t test for continuous variables, with statistical significance set at P < 0.05.


GSP Group

During the study period (May 2004 to September 2007), 43,576 cases were noted to have complete data entry for variables of interest to the study. Of these, 3884 patients (7.17%) had OSA diagnosed with OPS and were treated with CPAP, bilevel positive airway pressure, or surgery. Univariate analysis showed a statistically higher prevalence of the following variables in the OSA group: male gender, snoring, thick neck, modified Mallampati class 3 or 4, limited mouth opening, limited jaw protrusion, limited c-spine mobility, reduced TM distance, Type 2 diabetes mellitus, hypertension, BMI >30 kg/m2, and age >43 years (P < 0.05, Table 1). Collinearity diagnostics did not reveal a condition index higher than 12.6. The maximal pairwise correlation in the bivariate correlation matrix was 0.419; therefore, no variables were removed from the model. Age and BMI were converted into categorical variables using a ROC curve, which showed the optimal balance of sensitivity and specificity at 43 years for age and 30 kg/m2 for BMI.

Table 1:
Preoperative Patient Characteristics

A logistic regression full model fit was performed on 43,576 valid cases and revealed 9 independent perioperative predictors (P < 0.05): male gender, history of snoring, thick neck, modified Mallampati class 3 or 4, TM distance <6 cm, hypertension, Type 2 diabetes mellitus, BMI >30 kg/m2, and age >43 years (Table 2). The model was evaluated using the omnibus tests for coefficients, which revealed a χ2 of 4173.734, with 12 degrees of freedom, and a P value <0.001. The area under the ROC curve for the unweighted model was 0.79 ± 0.004, and 0.82 ± 0.004 for the weighted model. Hazard ratios for each independent risk factor were also developed (Fig. 1).

Table 2:
Independent Predictors of Obstructive Sleep Apnea
Figure 1:
Hazard ratio of independent perioperative predictors of obstructive sleep apnea. These 9 independent perioperative predictors of perioperative diagnosis of obstructive sleep apnea were identified using a full-model fit logistic regression model. A hazard ratio (±95% confidence interval) for each risk factor was derived by comparing the odds of having a diagnosis of obstructive sleep apnea in patients with and without the given risk factor.

These predictors were then used to create a clinical scale. Each patient was assigned 1 point for each of the 9 risk factors they possessed. The characteristics of the clinical scale are shown in Table 3. Weighting the model based on a previously described formula for assigning weighted points20 resulted in a highly complex formula (69 points) with minimal benefit (0.03 improvement in area under the ROC curve) gained in terms of better accuracy (Table 2). In the interest of simplicity, the unweighted scale was thus accepted as the P-SAP score, and the summary measures of accuracy were derived for the unweighted score. Choosing a diagnostic threshold to ≥2 risk factors produced a test sensitivity of 0.939 (false-negative [FN] rate 0.06) and specificity of 0.323, with positive and negative predictive values of 0.1 and 0.99, respectively. Increasing the P-SAP threshold to ≥6 increased the specificity to 0.974, at the expense of sensitivity (0.239), with positive and negative predictive values of 0.33 and 0.97, respectively. Maximal combined sensitivity (0.667) and specificity (0.773) with positive and negative predictive values of 0.19 and 0.97, respectively, were observed at a P-SAP score threshold of ≥4.
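The unweighted scale described above amounts to counting how many of the 9 predictors are present. A minimal sketch follows; the `Patient` fields and the function name `psap_score` are hypothetical, and the cutoffs BMI >30 kg/m2 and age >43 years are taken from the regression results reported above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Field names are assumptions for this example, not the study's schema.
    male: bool
    snoring: bool
    thick_neck: bool
    mallampati_class: int        # modified Mallampati class, 1 to 4
    reduced_tm_distance: bool    # thyromental distance estimated <6 cm
    hypertension: bool
    type2_diabetes: bool
    bmi: float                   # kg/m2
    age: float                   # years


def psap_score(p: Patient) -> int:
    """Unweighted score: 1 point per predictor present (range 0 to 9)."""
    return sum([
        p.male,
        p.snoring,
        p.thick_neck,
        p.mallampati_class >= 3,
        p.reduced_tm_distance,
        p.hypertension,
        p.type2_diabetes,
        p.bmi > 30,
        p.age > 43,
    ])


example = Patient(male=True, snoring=True, thick_neck=False, mallampati_class=2,
                  reduced_tm_distance=False, hypertension=False,
                  type2_diabetes=False, bmi=32.0, age=50)
score = psap_score(example)
print(score, score >= 2, score >= 4, score >= 6)
```

The three comparisons at the end correspond to the thresholds (≥2, ≥4, ≥6) discussed in the text.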

Table 3:
Summary Characteristics of the P-SAP Score—GSP (Derivation) Cohort

OPS Group

The demographic features of patients in the OPS cohort are described in Table 1. The prevalence of OSA in this population was 75.7%, similar to the OSA prevalence reported in the validation cohorts of other studies.

There is a close similarity between the frequency of studied variables of the patients with diagnosis of OSA or treatment for OSA in the GSP-OSA and OPS-OSA groups. When comparing the GSP-control group with the OPS-control group, significantly more men were seen in the GSP-control group, whereas the other independent variables were all less common in the GSP-control group, with significant differences seen for snoring, thick neck, reduced TM distance, BMI ≥30 kg/m2, and age ≥43 years. Thus, GSP-control and OPS-control were not similar in variables of interest for this study.

The summary measures of accuracy for the OPS set are presented in Table 4. For AHI ≥5 events/h, a threshold P-SAP score ≥2 had sensitivity 0.946, specificity 0.258, positive predictive value 0.799, negative predictive value 0.604, and diagnostic odds ratio 7.361. Increasing the threshold P-SAP score to ≥6 resulted in an increase in specificity to 0.911, with an attendant decrease in sensitivity to 0.217. At the P-SAP threshold of ≥4, the specificity and sensitivity were comparable with the GSP group values, with sensitivity 0.635 (0.577–0.656), specificity 0.653 (0.577–0.723), positive predictive value 0.859 (0.829–0.888), negative predictive value 0.349 (0.309–0.387), and diagnostic odds ratio of 3.281.
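As a worked check, the diagnostic odds ratio can be recovered from sensitivity and specificity alone, using the standard definition as the ratio of the positive to the negative likelihood ratio. Applied to the rounded values reported at the threshold of ≥4, this gives a figure close to the published one; the small gap is presumably due to rounding of the inputs.

```latex
\mathrm{DOR}
  = \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}}
  = \frac{\text{sensitivity}/(1-\text{specificity})}{(1-\text{sensitivity})/\text{specificity}}
  = \frac{0.635 \times 0.653}{0.365 \times 0.347}
  \approx 3.27
```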

Table 4:
Summary Measures of Accuracy of Validated P-SAP Score in OPS Group

Finally, the prevalence of the validated risk factors in the GSP and OPS populations was analyzed. There is a significant difference in the frequency of risk factors in the OPS population compared with the GSP population at P-SAP score ≥2 (89.4% vs 69.1%), P-SAP score ≥4 (54.0% vs 26.2%), and P-SAP score ≥6 (18.4% vs 5.5%), with all comparisons achieving P < 0.05 on Pearson χ2 analysis.


Our data demonstrate a frequency of perioperative diagnosis of OSA of 7.17% in the surgical population studied. In addition, we have derived a clinical prediction tool, the P-SAP score in a typical surgical cohort, and validated the P-SAP score in a subgroup of patients who presented for surgery within 6 months of having a formal sleep study. The P-SAP score provides a screening method that is easy to perform because it incorporates routine preanesthetic assessment variables (including measures of upper airway morphology). In this surgical population, the P-SAP score of ≥2 has high sensitivity at the expense of specificity, whereas a score of ≥6 has high specificity at the expense of sensitivity.

Recent ASA guidelines stress the importance of perioperative diagnosis and management of patients with OSA.8 The “gold standard” of diagnosis of OSA and severity grading has remained the overnight sleep study, but because of constraints of time, personnel, and cost, it cannot be considered as a primary screening mechanism for OSA. Common wait times for polysomnography can vary from 2 to 10 months in the United States.21 Previously, a variety of clinical prediction models and algorithms have used several of the P-SAP variables to aid risk assessment and screening before polysomnography.9,10, 12,14 The full description and critique of the various models in use are beyond the scope of this discussion, but the key considerations in assessing a screening test for OSA are discussed below. First, a common problem with all previously described OSA screening tests, including the STOP questionnaire, is spectrum bias or high underlying prevalence of OSA in the derivation cohort.22 This is a very different clinical scenario compared with the GSP, where the prevalence of OSA is much lower and the clinical distinction between normal patients and those with OSA is possibly more apparent. In essence, these can be considered to be 2 completely different study populations. Prediction models that are derived in high prevalence populations report higher sensitivity than is seen when the test is used in a lower risk population. The advantage of deriving the screening tests in a representative clinical population is that this is exactly how the tests will be used in practice.13 Second, there is a tradeoff between sensitivity and specificity with most clinical models for OSA. The most clinically important summary statistic is possibly the FN rate (defined as 1 − sensitivity), which gives the proportion of patients with OSA who were screened as normal by the diagnostic test. 
Because sensitivity is considered prevalence independent, it follows that FN rates are also independent of prevalence of OSA. The FN rate is typically expressed as a conditional probability or a percentage of patients with OSA that are missed by the screening test. The Berlin questionnaire, which is now commonly used in several hospitals, has FN rates of 14.5% to 38.2%,14 clearly making it undependable to robustly exclude OSA preoperatively. Similar FN rates were observed with the ASA model,14 STOP questionnaire,12 and STOP-BANG model12 for AHI ≥5 (37.9% vs 34.4% vs 16.4%, respectively) and AHI ≥30 (12.3% vs 20.5% vs 0.0%, respectively). The best screening tool of the 3 above models, the STOP-BANG model, was derived as a post hoc estimate and is yet to be validated in a truly representative surgical population. Therefore, it is of interest that 6 of the 8 elements of the STOP-BANG model have been shown to be independent predictors of OSA in the P-SAP derivation study, lending validity to their clinical importance. The P-SAP score has been shown to have a reproducibly higher sensitivity (with FN rates <10% across all AHI validation thresholds) compared with the STOP questionnaire and the ASA model. As with all clinical screening tests, the effect of false positives on resource utilization and FNs on adverse outcome are important considerations. Perhaps one critical issue is the fact that we still do not know whether there is a subgroup of patients with OSA at high risk of postoperative complications. Further studies into identifying the subgroup at greatest risk of postoperative sleep apnea and hypoxemia are imperative at this point.
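Sensitivity, and with it the FN rate, is treated above as prevalence independent, but the predictive values are not. The sketch below applies Bayes' rule to the GSP figures at a P-SAP threshold of ≥4 (sensitivity 0.667, specificity 0.773, prevalence 7.17%) and then to a hypothetical 50% prevalence. The function name and the 50% figure are assumptions chosen only to illustrate the dependence.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence
    via Bayes' rule. Hypothetical helper for illustration."""
    tp = sens * prevalence                 # true-positive fraction
    fp = (1 - spec) * (1 - prevalence)     # false-positive fraction
    fn = (1 - sens) * prevalence           # false-negative fraction
    tn = spec * (1 - prevalence)           # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# GSP cohort values reported in the text at threshold >=4.
ppv_gsp, npv_gsp = predictive_values(0.667, 0.773, 0.0717)
# Same test characteristics at an assumed 50% prevalence.
ppv_high, npv_high = predictive_values(0.667, 0.773, 0.50)

print(round(ppv_gsp, 3), round(npv_gsp, 3))
print(round(ppv_high, 3), round(npv_high, 3))
```

With the GSP inputs, the result rounds to the positive and negative predictive values of 0.19 and 0.97 reported for that threshold; at the assumed higher prevalence, the positive predictive value rises while the negative predictive value falls.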

One limitation of the study is that it focuses purely on the ability to screen robustly for a diagnosis of OSA and not on the prediction of postoperative morbidity. The incidence of postoperative morbid events was not analyzed as part of this study. There are several reasons for the lack of postoperative data: first, postoperative monitoring data were not routinely collected in an electronic format, reducing the usefulness of retrospective data gathering. Furthermore, none of the institutional electronic datasets had complete data on this particular cohort of patients for analysis of outcomes in this study. Second, previous studies have demonstrated that postoperative central and obstructive apneas occur irrespective of severity of OSA. Indeed, Chung et al.12,14 showed that patients with OSA had a significantly higher incidence of postoperative complications compared with controls (22.6% vs 12.3%), primarily related to desaturation (20.6% vs 9.2%) and prolonged oxygen therapy (14.3% vs 4.7%). Finally, Kaw et al.23 estimated that a prospective study of >2000 patients would be required to demonstrate a doubling of complications related to purely OSA in coronary artery bypass graft patients. Recent research into the effects of remifentanil in patients with OSA suggests that the at-risk population may in fact not have the typical markers of severe disease.24 Until this population is identified, it is important for anesthesiologists to exclude OSA as efficiently as possible preoperatively. Hence, this study focuses purely on the ability to screen robustly for a diagnosis of OSA.

If we identify patients with OSA before they have surgery, the number of patients potentially receiving additional monitoring care in the postanesthesia care unit for signs of desaturation will no doubt increase and discharge times from the postanesthesia care unit could be delayed further if desaturation episodes do indeed occur, but these additional costs will offer more safety as per ASA practice guidelines.8 The P-SAP score with its ease of use (linear scale, no need for additional investigations) and high sensitivity across disease severity may offer a useful method of screening. At a threshold P-SAP score ≥2, it is possible that many patients who may not be at high postoperative risk may be included. This lack of specificity at the P-SAP score ≥2 is a drawback, and future research into identifying techniques or measurements to increase the specificity of screening at the P-SAP score ≥2 is essential to make it a cost-effective addition to the anesthesiologist's perioperative assessment. However, using a threshold P-SAP score ≥4 ensures a significant improvement in specificity (0.773), with a decrease in sensitivity (0.667), which is comparable with other screening tools currently available.12,14,22

Our retrospective study, similar to others based on large prospectively collected clinical databases,18 has certain generic limitations. Despite general standardization of perioperative evaluation at our institution, we cannot guarantee that controlled and uniform conditions were applied across all the assessments. Furthermore, although all the perioperative variables have excellent data entry rates, both the possible predictors and outcomes were recorded by providers as part of their clinical documentation responsibilities. As a result, the data reflect the electronic medical record, and no additional detail is available. There were no rigorous processes to validate the entry of data. Although the format and specificity of some elements were prospectively altered to provide more detailed data for analysis, we did not use a distinct data collection form with diagrams and extensive definitions to assist providers in accurate selection as recommended in other studies.25 Therefore, no specific measurements were undertaken of neck circumference, TM distance, mouth opening, and c-spine mobility. Instead, the judgment of the clinician was used for these variables. However, there is no reason to believe that one particular group suffered exclusively as a result of observation bias through this large dataset. Additionally, the large numbers of patients included in the study help reduce the effect of the observation error. Finally, the size of our analysis precluded performing screening polysomnography to confirm or reject the diagnosis of OSA for the GSP group. In the GSP group, we accepted a diagnosis of OSA, whether or not it was diagnosed in our institution, as long as the diagnosis was supported by an explicit treatment plan such as CPAP therapy or surgery. 
It is wholly possible that the patients in the GSP-OSA group had greater severity of the disease because they were treated with surgery or CPAP, and this may or may not have had an effect on the clinical predictors studied. Because of the retrospective nature and large number of studies in the validation (OPS) group, no formal assessment was made of the degree of agreement between sleep laboratory technicians and physicians for the diagnosis of OSA. Polysomnography is routinely scored by a core group of technicians at the university hospital. The standard methodology of scoring was used. The scoring of sleep studies is subject to a robust quality-assurance process to maintain high quality of reporting. We did not present the other variables of the sleep study, because they did not affect the variables studied directly, unlike the STOP study that analyzed measures of tiredness and sleepiness. Finally, patients in the validation study were investigated for sleep-related disorders and had incidental surgery within 6 months of the sleep study. Although this in itself does not influence the validity of the sleep study, it is important to note that the prevalence of OSA in the validation (OPS) group was similar to the STOP study.

Another possible criticism of our approach for the derivation sample is the influence of undiagnosed OSA on the GSP-control population variables. This critique is based on the often-quoted statistic that 90% of OSA is undiagnosed.1 The prevalence of OSA in specific surgical populations is thought to be high: 71% to 95% in patients who are morbidly obese undergoing bariatric surgery26 and 23% in patients with traumatic brain injury.27 The typical university population in this study consists mainly of nonobese (30%–35% obesity prevalence) and female patients (45%–50% male prevalence), and it is unlikely that the quoted prevalence statistics for high-risk groups apply across the entire GSP. Previous population studies identify a 2% to 9% prevalence of OSA in middle aged women and 4% to 24% prevalence in middle aged men.1

Therefore, it seems reasonable to assume that a typical university hospital surgical population has an OSA prevalence within this range, between 2% and 24%. The prevalence of moderate-severe OSA in the study by Young et al.1 was 9% in men and 4% in women, and this typically represents the proportion of patients who need treatment for OSA, either surgical or CPAP. The prevalence of diagnosis of treated OSA in our GSP cohort was 7.17%, similar to the prevalence of moderate-severe OSA in the study by Young et al.

Despite these differences, the frequency of independent predictors was comparable in the GSP-OSA group and OPS-OSA group. In other words, patients with OSA across the GSP and OPS groups had similar distribution of study variables. However, the GSP-control group had a lower frequency of the independent predictors than the OPS-control group, supporting the view that the majority of patients in the GSP-control group did not have markers of OSA. Otherwise, there would have been a tendency to higher frequency of these variables in the GSP-controls. As further support of the validity of the P-SAP elements, all the variables on the P-SAP score have previously been identified as independent or univariate predictors of OSA in various populations. The nature of logistic regression full-fit analysis removes dependent variables from the prediction model. Thus, the 9 variables that form the P-SAP score were independent of each other in this particular cohort of patients. The central mechanism of these various manifestations of OSA relates to abnormal fat deposition in the upper airway. The occurrence of obesity and other markers of metabolic syndrome in association with OSA underlines the importance of the involved metabolic pathways. Thus, many of these variables are ultimately caused by a single but complex process and are interrelated but not necessarily dependent on each other.

One further criticism of our study could be that other validated clinical predictors of OSA such as daytime somnolence, observed apnea, and tiredness were not included in the P-SAP score. At the time of this study, these variables were not part of a typical perioperative assessment at our university hospital and completed data on these elements were sparse and consequently not analyzed. These variables are not universally used in clinical screening tests for OSA as shown in a recent meta-analysis.22 The STOP-BANG model12 relies partly on daytime somnolence and observed apneas, both of which have advantages and disadvantages. Observed apneas are highly specific of OSA but require the presence of a nighttime observer to identify. Daytime somnolence is seen in the general population primarily due to depression and obesity,28 possibly affecting its accuracy as a predictor of OSA.

Despite these limitations, this study is notable for a few important reasons. First, at a diagnostic threshold of P-SAP score ≥2, the reproducibly high sensitivity across all severities of OSA makes the score a useful alternative to other perioperative clinical screening tests for OSA. Further work is essential to identify methods or additional tests to improve the specificity of this threshold because a score ≥2 is currently seen in a large proportion of patients presenting for surgery (69.1%). Second, unlike prediction models with higher specificity for OSA, such as morphometry and cephalometry, the P-SAP score does not incur an additional burden of data collection and appraisal, i.e., the data elements are all part of a standard perioperative evaluation. Finally, an additional consideration for anesthesiologists is their role as perioperative physicians in the long-term medical management of patients screened to be at high risk of OSA. Using the information from this study, we recommend that those patients who screen positive for OSA with P-SAP scores ≥6 should undergo expedited polysomnography because this score is associated with an extremely high specificity. These patients are likely to benefit from CPAP therapy. This approach could facilitate long-term management of these patients and ensure that perioperative referrals for sleep studies have an appropriately low false-positive rate. Although this may not be feasible preoperatively in all patients, clinical processes to facilitate testing after an appropriate postoperative recovery period should be considered. Further work needs to be done to ascertain the effect of such therapy on morbidity and mortality after surgery in these high-risk patients. Admittedly, this approach will miss a significant proportion of patients with OSA (because the sensitivity at this threshold is 0.239), but the purpose at this level is to ensure that a proportion of patients at high risk of long-term complications related to OSA are identified and that appropriate secondary preventative treatment is offered as indicated by sleep study results. It should be noted that the purpose of this study was to identify clinical risk factors for OSA and not outcomes. As such, recommendations for clinical processes need rigorous cost-effectiveness analyses to ensure that the optimal balance is achieved between missed cases and false positives.
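The two thresholds discussed above can be read as a simple two-tier triage rule. The Python sketch below is only an illustration of that reading and is not part of the study's methods. The names `Triage` and `triage_psap` are hypothetical, and the P-SAP score is assumed to have been computed already from the standard preanesthetic evaluation.

```python
from enum import Enum


class Triage(Enum):
    """Hypothetical triage categories, named here for illustration only."""
    SCREEN_NEGATIVE = "screen negative"
    SCREEN_POSITIVE = "screen positive"
    REFER_FOR_POLYSOMNOGRAPHY = "screen positive, consider expedited polysomnography"


# Thresholds taken from the discussion above.
SCREENING_THRESHOLD = 2  # high sensitivity, low specificity
REFERRAL_THRESHOLD = 6   # very high specificity, sensitivity 0.239


def triage_psap(score: int) -> Triage:
    """Map an already-computed P-SAP score to a triage category."""
    if score < 0:
        raise ValueError("P-SAP score cannot be negative")
    if score >= REFERRAL_THRESHOLD:
        return Triage.REFER_FOR_POLYSOMNOGRAPHY
    if score >= SCREENING_THRESHOLD:
        return Triage.SCREEN_POSITIVE
    return Triage.SCREEN_NEGATIVE


# Example scores chosen arbitrarily to show one case per tier.
for example_score in (1, 3, 7):
    print(example_score, triage_psap(example_score).value)
```

Keeping the two cutoffs as named constants makes it easy to revisit them if cost-effectiveness analyses suggest a different balance between missed cases and false positives.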

The P-SAP score validates 6 of the 8 elements of the STOP-BANG model12 but differs from it in 2 important ways. First, the P-SAP score uses upper airway elements such as high modified Mallampati class and reduced TM distance. Modified Mallampati class has been validated previously as a marker of diagnosis and severity of OSA.16 Reduced mandibular length is an important bony factor associated with OSA in nonobese patients17 and, therefore, was included to provide an additional screening measure for a wider spectrum of patients. Second, the inclusion of Type 2 diabetes as an element in the P-SAP score is important because diabetes has been linked not only to diagnosis of OSA but also to severity of the disease.15 Perhaps as a result of these additional elements, the validated FN rate for a P-SAP score ≥2 was <10% across all severities of OSA (the <2% FN rate for severe OSA is similar to that of the STOP-BANG model), which is a significant improvement on the accuracy of previously reported clinical models including the STOP-BANG model. Thus, the P-SAP score and the STOP-BANG model share several elements but have important differences that may have a significant impact on performance across different patient spectrums; this question should ideally be addressed by a prospective head-to-head analysis.
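The FN rates quoted above relate directly to the sensitivities reported elsewhere in this discussion. The following minimal Python sketch assumes the usual definition, FN rate = FN / (TP + FN) = 1 − sensitivity. The function names and the cohort size of 1000 patients are hypothetical and chosen only for illustration.

```python
def false_negative_rate(sensitivity: float) -> float:
    """Return the FN rate, assuming FN rate = 1 - sensitivity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    return 1.0 - sensitivity


def expected_missed_cases(sensitivity: float, n_with_osa: int) -> float:
    """Expected number of patients with OSA who would screen negative."""
    return false_negative_rate(sensitivity) * n_with_osa


# Sensitivity reported in the text for a P-SAP score >= 6.
print(round(false_negative_rate(0.239), 3))  # 0.761

# Hypothetical cohort of 1000 patients who truly have OSA.
print(round(expected_missed_cases(0.239, 1000)))  # 761

# Under the same definition, an FN rate below 10% (as reported for a
# P-SAP score >= 2) corresponds to a sensitivity above 0.90.
```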

In summary, we report a new P-SAP score for OSA with excellent test characteristics. In contrast to current prediction models for OSA, this model was developed for use in a GSP with low prevalence of diagnosis of OSA. The P-SAP score uses commonly collected perioperative variables, thereby allowing its seamless addition to the current anesthesia evaluation, with reproducible accuracy across the entire spectrum of OSA severity. This screening tool could help with perioperative risk stratification, allocation of postoperative monitoring resources, and identification of people who might benefit from formal polysomnography and treatment.



1. Young T, Palta M, Dempsey J, Skatrud J, Weber S, Badr S. The occurrence of sleep-disordered breathing among middle-aged adults. N Engl J Med 1993;328:1230–5
2. Ahmad S, Nagle A, McCarthy RJ, Fitzgerald PC, Sullivan JT, Prystowsky J. Postoperative hypoxemia in morbidly obese patients with and without obstructive sleep apnea undergoing laparoscopic bariatric surgery. Anesth Analg 2008;107:138–43
3. Blake DW, Chia PH, Donnan G, Williams DL. Preoperative assessment for obstructive sleep apnoea and the prediction of postoperative respiratory obstruction and hypoxaemia. Anaesth Intensive Care 2008;36:379–84
4. Hung J, Whitford EG, Parsons RW, Hillman DR. Association of sleep apnoea with myocardial infarction in men. Lancet 1990;336:261–4
5. Shepard JW Jr, Garrison MW, Grither DA, Dolan GF. Relationship of ventricular ectopy to oxyhemoglobin desaturation in patients with obstructive sleep apnea. Chest 1985;88:335–40
6. Arzt M, Young T, Finn L, Skatrud JB, Bradley TD. Association of sleep-disordered breathing and the occurrence of stroke. Am J Respir Crit Care Med 2005;172:1447–51
7. Gami AS, Howard DE, Olson EJ, Somers VK. Day-night pattern of sudden death in obstructive sleep apnea. N Engl J Med 2005;352:1206–14
8. Gross JB, Bachenberg KL, Benumof JL, Caplan RA, Connis RT, Cote CJ, Nickinovich DG, Prachand V, Ward DS, Weaver EM, Ydens L, Yu S. Practice guidelines for the perioperative management of patients with obstructive sleep apnea: a report by the American Society of Anesthesiologists Task Force on Perioperative Management of patients with obstructive sleep apnea. Anesthesiology 2006;104:1081–93
9. Kirby SD, Engl P, Danter W, George CF, Francovic T, Ruby RR, Ferguson KA. Neural network prediction of obstructive sleep apnea from clinical criteria. Chest 1999;116:409–15
10. Kushida CA, Efron B, Guilleminault C. A predictive morphometric model for the obstructive sleep apnea syndrome. Ann Intern Med 1997;127:581–7
11. Battagel JM, L'Estrange PR. The cephalometric morphology of patients with obstructive sleep apnoea (OSA). Eur J Orthod 1996;18:557–69
12. Chung F, Yegneswaran B, Liao P, Chung SA, Vairavanathan S, Islam S, Khajehdehi A, Shapiro CM. STOP questionnaire: a tool to screen patients for obstructive sleep apnea. Anesthesiology 2008;108:812–21
13. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189–202
14. Chung F, Yegneswaran B, Liao P, Chung SA, Vairavanathan S, Islam S, Khajehdehi A, Shapiro CM. Validation of the Berlin questionnaire and American Society of Anesthesiologists checklist as screening tools for obstructive sleep apnea in surgical patients. Anesthesiology 2008;108:822–30
15. Reichmuth KJ, Austin D, Skatrud JB, Young T. Association of sleep apnea and type II diabetes: a population-based study. Am J Respir Crit Care Med 2005;172:1590–5
16. Liistro G, Rombaux P, Belge C, Dury M, Aubert G, Rodenstein DO. High Mallampati score and nasal obstruction are associated risk factors for obstructive sleep apnoea. Eur Respir J 2003;21:248–52
17. Sakakibara H, Tong M, Matsushita K, Hirata M, Konishi Y, Suetsugu S. Cephalometric abnormalities in non-obese and obese patients with obstructive sleep apnoea. Eur Respir J 1999;13:403–10
18. Kheterpal S, Han R, Tremper KK, Shanks A, Tait AR, O'Reilly M, Ludwig TA. Incidence and predictors of difficult and impossible mask ventilation. Anesthesiology 2006;105:885–91
19. Rechtschaffen A, Kales A. A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. Los Angeles, CA: Brain Information Services, University of California, 1968
20. Rassi A Jr, Rassi A, Little WC, Xavier SS, Rassi SG, Rassi AG, Rassi GG, Hasslocher-Moreno A, Sousa AS, Scanavacca MI. Development and validation of a risk score for predicting death in Chagas' heart disease. N Engl J Med 2006;355:799–808
21. Flemons WW, Douglas NJ, Kuna ST, Rodenstein DO, Wheatley J. Access to diagnosis and treatment of patients with suspected sleep apnea. Am J Respir Crit Care Med 2004;169:668–72
22. Ramachandran SK, Josephs LA. A meta-analysis of clinical screening tests for obstructive sleep apnea. Anesthesiology 2009;110:928–39
23. Kaw R, Michota F, Jaffer A, Ghamande S, Auckley D, Golish J. Unrecognized sleep apnea in the surgical patient: implications for the perioperative setting. Chest 2006;129:198–205
24. Bernards CM, Knowlton SL, Schmidt DF, DePaso WJ, Lee MK, McDonald SB, Bains OS. Respiratory and sleep effects of remifentanil in volunteers with moderate obstructive sleep apnea. Anesthesiology 2009;110:41–9
25. Rosenstock C, Gillesberg I, Gatke MR, Levin D, Kristensen MS, Rasmussen LS. Inter-observer agreement of tests used for prediction of difficult laryngoscopy/tracheal intubation. Acta Anaesthesiol Scand 2005;49:1057–62
26. Lopez PP, Stefan B, Schulman CI, Byers PM. Prevalence of sleep apnea in morbidly obese patients who presented for weight loss surgery evaluation: more evidence for routine screening for obstructive sleep apnea before weight loss surgery. Am Surg 2008;74:834–8
27. Castriotta RJ, Wilde MC, Lai JM, Atanasov S, Masel BE, Kuna ST. Prevalence and consequences of sleep disorders in traumatic brain injury. J Clin Sleep Med 2007;3:349–56
28. Bixler EO, Vgontzas AN, Lin HM, Calhoun SL, Vela-Bueno A, Kales A. Excessive daytime sleepiness in a general population sample: the role of sleep apnea, age, obesity, diabetes, and depression. J Clin Endocrinol Metab 2005;90:4510–5
© 2010 International Anesthesia Research Society