Davy has actually invented a new pleasure, for which language has no name. Oh, Tom! I am going for more this evening; it makes one strong, and so happy! and without any after-debility, but, instead of it, increased strength of mind and body. Oh, excellent air-bag! Tom, I am sure the air in heaven must be this wonder-working gas of delight!1
—Robert Southey, Letter to Thomas Southey, July 12, 1799
In this issue of the journal, Turan et al.2 report for the first time that noncardiac surgery patients receiving general anesthesia with nitrous oxide (N2O) experience 33% decreased odds of 30-day mortality, 17% decreased odds of in-hospital morbidity and mortality, and 41% decreased odds of pulmonary morbidity compared with patients anesthetized without N2O. In particular, 60 deaths within 30 days after surgery were observed in 10,746 patients who received anesthesia that included N2O compared with 89 deaths in 10,746 matched patients who did not. To derive their unforeseen results, the authors applied propensity score matching, a nonexperimental, retrospective causal inference method, to reduce bias in the analysis of clinical data originally collected for purposes other than research.3,4 A first step in propensity analysis is to identify categoric and continuous confounding variables, or “baseline characteristics,” recorded within a chosen database that carry the potential to affect treatment selection and outcomes of interest after exposure to a treatment variable, that is, use of N2O. In settings amenable to propensity score matching, measured baseline characteristics capture all clinical and biologic predictors that enter into the decision whether or not to provide the treatment in question. Next, propensity scores, defined as a probability between 0 and 1 of receiving the experimental treatment rather than the control for a participant with given observed baseline characteristics, are estimated for each potential participant within the archive on the basis of the values of each individual’s baseline characteristics in a selected propensity model derived, for example, by multivariable logistic regression. In this fashion, a composite propensity score is used to replace a collection of baseline characteristics (e.g., age, sex, disease severity, comorbidity, etc.) with a single summary value that expresses the likelihood of receiving the treatment rather than the control.
Accordingly, propensity scores estimate the predicted probability of the use of a treatment in a given participant on the basis of his or her characteristics at the time treatment is chosen. In theory, the effect of the treatment can then be measured among patients who have the same predicted propensity (i.e., probability) of treatment to control for confounding and selection bias. Experimental and control participants are then matched for closeness of their estimated pairwise propensity scores. Participants with similar propensity scores have a similar distribution of measured baseline characteristics with a similar chance of receiving the treatment, thereby mimicking randomized trials with respect to factors that have been adequately measured. However, treated and untreated participants with the same propensity score may have different distributions of specific covariates, and of unmeasured variables. Potential participants lacking a propensity score match in the paired cohort are eliminated from further analysis, possibly leading to a loss of information and a decrement in the precision of the estimated association between the treatment and the outcome. After assuring the success of the match using standardized differences, outcomes are compared between the treated and untreated groups, and the results are interpreted in the broader context, that is, whether or not treated and untreated participants with matched baseline characteristics experience different clinical consequences.
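For readers less familiar with the mechanics, the steps just described (estimating a propensity score by logistic regression, matching treated and untreated participants on that score, and checking balance with standardized differences) can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the analysis performed by Turan et al.: the two covariates, their coefficients, the caliper of 0.05 on the probability scale, and all function names are assumptions chosen for the example.

```python
import numpy as np

def propensity_scores(X, t, n_iter=25):
    """Estimate P(treated | X) by logistic regression fitted with Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        grad = Xd.T @ (t - p)
        hess = (Xd * w[:, None]).T @ Xd + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def greedy_match(ps, t, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    The caliper (maximum allowed score difference) is an assumed value.
    Treated participants with no control inside the caliper are dropped.
    """
    controls = list(np.flatnonzero(t == 0))
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not controls:
            break
        dists = np.abs(ps[controls] - ps[i])
        j = int(np.argmin(dists))
        if dists[j] <= caliper:
            pairs.append((i, controls.pop(j)))
    return pairs

def standardized_difference(x, t):
    """Difference in group means divided by the pooled standard deviation."""
    x1, x0 = x[t == 1], x[t == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return (x1.mean() - x0.mean()) / pooled_sd

# Simulated cohort (hypothetical): older and sicker patients are less
# likely to receive the treatment, so the raw groups are imbalanced.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(0.0, 1.0, n)        # standardized age
severity = rng.normal(0.0, 1.0, n)   # standardized disease severity
X = np.column_stack([age, severity])
logit = -0.3 - 0.8 * age - 0.8 * severity
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

ps = propensity_scores(X, t)
pairs = greedy_match(ps, t)
idx = np.array(pairs).ravel()        # indices of all matched participants

before = [standardized_difference(X[:, k], t) for k in range(2)]
after = [standardized_difference(X[idx, k], t[idx]) for k in range(2)]
print("matched pairs:", len(pairs), "of", int(t.sum()), "treated")
print("standardized differences before matching:", np.round(before, 2))
print("standardized differences after matching: ", np.round(after, 2))
```

The balance check covers only the covariates entered into the model. Nothing in the procedure can reveal whether a characteristic that was never recorded remains imbalanced between the matched groups.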
Compared with randomized, controlled trials (RCTs), the “gold standard” for assessing the effectiveness and safety of medications, propensity score analysis comprises many attractive features that account for its growing popularity. In recent decades, numerous administrative databases have been compiled for quality assurance, medicolegal compliance, and billing purposes on media that are easily accessed, often without triggering IRB protections, or without the need for specific informed consent of participants. Because the observational data are drawn from the real-world experiences of patients, clinicians, and institutions, the generalizability of these results may be greater than that of the results arising from the necessarily more controlled and artificial environment of RCT protocols. So long as all potential confounding variables are known and measured, propensity score analysis is tolerant of a large number of baseline characteristics and multiple outcomes, even if rare, in providing an efficient estimate of the average treatment effects.5 In turn, RCT protocols confront numerous constraints including high cost, long time to completion, enrollment limited by strict inclusion and exclusion criteria, selection bias of participants willing to be recruited, compliance of participants and caregivers, attrition, thorny ethical considerations, feasibility, and the relevance of RCT results to the population at large undergoing routine care in daily practice.
Despite these drawbacks, random assignment of a participant to either the experimental or control group in RCT protocols assures that treatment selection is unrelated to differences in measured and unmeasured baseline characteristics between groups, and that conclusions regarding a treatment effect are therefore unbiased. Conversely, to avoid bias in the propensity score analysis of observational data, all pertinent covariates potentially related to the treatment assignment and outcomes that are present before treatment selection must be known and included in the propensity model. If unobserved and unbalanced differences between the treatment and control groups are present, propensity score estimates incorporate hidden bias that is impossible to address using post hoc statistical techniques. Randomization balances both observed and unobserved confounders. Only investigations that use random assignment to treatment and control groups yield estimates of treatment effects that are unbiased with respect to unmeasured factors.
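The contrast drawn above between randomization and adjustment for measured covariates can be illustrated with a small simulation. In the sketch below, which uses invented probabilities rather than estimates from any data set, the treatment has no effect at all on mortality. One baseline characteristic `x` is recorded and one, `u`, is not. Because `x` is binary, the propensity score based on measured data takes only two values, so stratifying on `x` is equivalent to stratifying on that score.

```python
import numpy as np

def simulate(n, randomized, rng):
    """Return measured covariate x, treatment t, and death y for n patients."""
    x = rng.random(n) < 0.5          # measured baseline characteristic
    u = rng.random(n) < 0.5          # unmeasured baseline characteristic
    if randomized:
        t = rng.random(n) < 0.5      # coin-flip assignment, unrelated to x or u
    else:
        # Assumed clinical behaviour: sicker patients are treated less often.
        t = rng.random(n) < (0.7 - 0.2 * x - 0.3 * u)
    # Mortality depends on x and u but NOT on t: the true effect is zero.
    y = rng.random(n) < (0.01 + 0.01 * x + 0.02 * u)
    return x, t, y

def stratified_risk_difference(x, t, y):
    """Treated-minus-untreated mortality, averaged over strata of x."""
    diffs, weights = [], []
    for level in (False, True):
        s = x == level
        diffs.append(y[s & t].mean() - y[s & ~t].mean())
        weights.append(s.sum())
    return float(np.average(diffs, weights=weights))

rng = np.random.default_rng(1)
n = 1_000_000
obs = stratified_risk_difference(*simulate(n, randomized=False, rng=rng))
rct = stratified_risk_difference(*simulate(n, randomized=True, rng=rng))
print(f"observational, adjusted for x: {obs:+.4f}")
print(f"randomized:                    {rct:+.4f}")
```

Under these assumptions the observational estimate shows an apparent survival benefit of treatment even after adjustment for `x`, while the randomized estimate stays close to zero. Increasing `n` narrows the uncertainty around the observational estimate without moving it toward the truth.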
In view of the central importance of the breadth and depth of a chosen database to the success of propensity score analysis, it is unfortunate that the Cleveland Clinic Perioperative Health Documentation System (PHDS) “does not distinguish between a priori codes related to baseline health status and planned procedures from actual procedure codes and complications accumulated during hospitalization. The reason is the MEDPAR (i.e., the Medical Provider Analysis and Review database) and most of the PHDS data are derived from claims reports that do not indicate the diagnostic codes present on admission, which reflect baseline patient characteristics, or the principal planned or required procedures as opposed to diagnosis and procedure codes arising from complications during hospitalization.”6 Nevertheless, Turan et al. have been able to identify baseline conditions in the PHDS comprising ICD-9-CM categories that correspond to a subset of chronic conditions present before surgery. However, readers are unable to judge the possible effect of baseline characteristics that have been omitted, or to ascertain the process used by the authors to include or exclude potential baseline characteristics. For example, lacking the possibility of detecting an N2O dose effect, the authors are deprived of a key instrument in testing causation in a nonrandomized, observational investigation. The reported average FIO2 during surgery may serve as a crude surrogate, but the higher average FIO2 in nonnitrous patients after matching also points to a greater severity of cardiorespiratory disease in the nontreatment group (see below).
With regard to the reported baseline characteristics, the authors match data for year of surgery, and note decreased use of N2O “in recent years,” but they do not indicate whether N2O use has systematically increased, declined, or stayed the same during this interval at the reporting institution. The Charlson comorbidity score has been validated for long-term (i.e., 10-year) survival in patients with medical conditions, but to our knowledge, it has not been validated as a predictor of short-term (i.e., 30-day) survival in patients with surgical conditions.7 It is important to note that the authors provide no evidence that other scored variables (e.g., sex, age, body mass index, “race,” cancer, diabetes) enter into a caregiver’s decision whether or not to use N2O in contemporary practice. To the contrary, multiple baseline characteristics of potential relevance to treatment selection and outcomes were not scored, for example, folate and cobalamin nutritional deficiency present in 10% to 20% of the elderly population,8 inborn errors of single carbon metabolism (e.g., 10%–15% of patients undergoing surgery express homozygous mutations in the gene encoding 5,10-methylenetetrahydrofolate reductase (MTHFR) causing reduced enzyme activity and well-validated clinical phenotypes),9 antifolate chemotherapeutics, coexisting diseases with hyperhomocysteinemia, or intercurrent medications that elevate homocysteine (e.g., oral hypoglycemics, anticonvulsants, levodopa, cyclosporine). Indices of the magnitude of surgery, and destinations after surgery (i.e., ambulatory, first-day, and in-patient surgery), may factor into a practitioner’s choice to use N2O, but were not scored. In particular, Turan et al. provide no consideration of the variables weighed by anesthesia providers in selecting treatment. Consequently, not all relevant real-world factors may have been measured, scored, and entered into the analysis.
Also, the authors do not indicate or reference the source of the reported 30-day mortality data. Failure to provide a summary table of the actual and specific 30-day mortalities is a significant limitation. Many of the scored in-hospital morbidities, such as transfusion-related lung injury, tracheostomy complications, and urinary tract stoma, have no apparent relevance to complications arising from N2O use. Conversely, many morbidities of relevance to N2O use were not scored, including venous thrombosis, venous air embolus, and changes in postoperative cognitive status.
Table 2 in the study by Turan et al. indicates that many patients in the authors’ database could not be matched, that is, 37% of patients who received N2O, and 48% of patients who did not, were excluded from the analysis. The tabulated imbalance suggests that “sicker” patients at higher risk for mortality and for respiratory complications were generally not given N2O, and that the matched nonnitrous patients may also be at higher risk although baseline characteristics available for inclusion may not have properly captured the severity of a specific (i.e., pulmonary) illness. In addition, the higher rate of respiratory complications could have been due to unmatched patients undergoing laparoscopic surgery, major abdominal surgery, or those requiring postoperative care in an intensive care unit. The authors suggest that high inspired oxygen concentrations contribute to postoperative atelectasis, and hint at the possibility that inclusion of N2O is protective despite a modest (46% vs 55%) difference in the inspired oxygen concentrations in the 2 matched groups. Controversy regarding the direct pulmonary effects of N2O in inspired gas mixtures is ongoing. As a soluble gas, N2O promotes absorption atelectasis in the lung as effectively as 100% oxygen.10,11 Indeed, Joyce et al.10 report that an inspired gas mixture containing N2O with an oxygen concentration 30% or greater leads to more rapid lung collapse than 100% oxygen.
Although Turan et al. report surprisingly beneficial, rather than adverse, effects of N2O use across a broad range of noncardiac surgeries, 2 other recent reports gainsay these observations. In the ENIGMA trial, a large RCT comprising >2000 participants, N2O was associated with a higher rate of postoperative atelectasis and pneumonia when compared with N2O-free anesthesia.12 The rate of respiratory complications was substantially reduced in the N2O-free group, with an adjusted odds ratio of 0.54 (95% confidence interval [CI], 0.40–0.74; P < 0.001). A follow-up study (median, 3.5 years after surgery) found that N2O anesthesia had no significant association with late mortality (hazard ratio, 0.98; 95% CI, 0.80–1.20; P = 0.82), but suggested an increased rate of myocardial infarction (1.59; 95% CI, 1.01–2.51; P = 0.04).13 In view of the findings of the ENIGMA RCT, sources of confounding in the Turan et al. report warrant consideration, particularly since propensity scoring techniques are increasingly used to probe anesthesia data sets assembled to satisfy administrative obligations. Whereas large sample sizes in RCTs reduce confounding that arises from bias, in large observational investigations, inherent bias expands with sample size. Small random errors may lead to the detection of differences in risk that emerge from bias rather than from an effect of treatment in large observational studies.14 Of note, a substantial proportion (43% overall) of the original data set could not be used for matching. Many of the included patients may have had abbreviated ambulatory or other minor surgery, and many of the excluded patients may have been elderly and have undergone more extensive surgery, perhaps at greater risk for N2O-induced complications. Each of these differences is plausible in view of the recognized adverse effects of N2O.
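Because odds ratios carry much of the comparison between the two studies, a brief arithmetic check may be useful. The sketch below recomputes the crude odds ratio implied by the matched counts quoted at the outset (60 and 89 deaths among 10,746 patients per group). It is an unadjusted calculation that ignores the pairing of participants, so it is not the model fitted by Turan et al.; the function name is ours.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Crude odds ratio of group A versus group B from event counts."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b

# Deaths within 30 days: N2O group versus matched non-N2O group.
or_mortality = odds_ratio(60, 10_746, 89, 10_746)
print(f"crude odds ratio: {or_mortality:.2f}")                  # about 0.67
print(f"reduction in odds: {100 * (1 - or_mortality):.0f}%")    # about 33%
```

The result agrees with the 33% decrease in the odds of 30-day mortality reported by the authors. Because death is rare in both groups, the odds ratio is close to the ratio of the raw mortality rates.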
Turan et al. note: “But evidence that routine use of N2O causes clinically important toxicity remains elusive.” To the contrary, we are aware of numerous case reports of profound neurologic toxicity after anesthetic N2O use (i.e., not chronic substance abuse). See, for example, Renard et al.,15 Singer et al.,16 El Otamni et al.,17 and Somyreddy and Kothari.18 The authors also assert: “But in the meantime, our results do not suggest that N2O should be avoided for fear of cardiovascular complications, especially since interventions to reduce plasma homocysteine concentrations do not reduce cardiovascular events in nonsurgical settings.” Although references cited by Turan et al. and others19 confirm a lack of benefit of B vitamin supplementation in lowering chronic homocysteine levels for secondary prevention in patients with advanced atherosclerotic cardiovascular disease (i.e., survivors of antecedent ischemic myocardial events), the relation of acute homocysteine elevation and cardiovascular events in surgical settings is not addressed by these or other clinical trials. Moreover, evidence for significant reductions in cerebrovascular risk with interventions tailored to lower homocysteine is available from at least 2 independent RCTs.20,21
In evaluating the contribution of Turan et al., several caveats are in order. Anesthesia caregivers relying on Turan et al. for support in their ongoing use of N2O in anesthesia maintenance should ensure that they are familiar with the components of propensity score analysis including “greedy distance,” “distinct effects generalized estimating equation (GEE) models,” “caliper,” assumptions of “strong ignorability,” and so forth. In this context, Paul Rosenbaum’s recent “Design of Observational Studies,” and its consideration of power and sensitivity analysis in empiric nonrandomized investigations, is highly recommended.22 We note that retrospective, observational propensity score investigations may themselves carry inherent toxicity by providing a false sense of security for ongoing use of a risky intervention, or by promoting abandonment of a valuable modality, if their conclusions are later proven by RCTs to be in error.23
No amply powered investigations have previously concluded that N2O anesthesia reduces all-cause mortality or respiratory complications. The pulmonary N-methyl-D-aspartate receptor antagonist explanation for reduced lung injury with N2O proposed by Turan et al. is an assumption at present, in the absence of targeted experiments using N2O at the bench and in the clinic to test this hypothesis. Nor can we suggest another mechanism that could explain such “wonder-working” effects. Unexpected results in the face of biological uncertainty demand particularly close scrutiny.24 A more likely explanation is that the investigation of Turan et al. comprises hidden biases and confounding, but we cannot be sure. Unknown and unmeasured variables cannot be accounted for in a propensity score or other multivariable analysis.25
Name: Kirk Hogan, MD, JD.
Contribution: This author helped write the manuscript.
Attestation: Kirk Hogan approved the final manuscript.
Name: Paul S. Myles, MD, MPH.
Contribution: This author helped write the manuscript.
Attestation: Paul S. Myles approved the final manuscript.
This manuscript was handled by: Sorin J. Brull, MD, FCARCSI (Hon).
1. Southey R. The voice of the blood: anesthetics and literature. In: The Road of Excess: A History of Writers on Drugs. Boston, MA: Harvard University Press; 2005:90
2. Turan A, Mascha EJ, You J, Kurz A, Shiba A, Saager L, Sessler DI. The association between nitrous oxide and postoperative mortality and morbidity after noncardiac surgery. Anesth Analg. 2013;116:1026–33
3. Myles PS. What’s new in trial design: propensity scores, equivalence and non-inferiority. J Extracorporeal Tech. 2009;4:6–10
4. Gayat E, Pirracchio R, Resche-Rigon M, Mebazaa A, Mary JY, Porcher R. Propensity scores in intensive care and anaesthesiology literature: a systematic review. Intensive Care Med. 2010;36:1993–2003
5. Luo Z, Gardiner JC, Bradley CJ. Applying propensity score methods in medical research: pitfalls and prospects. Med Care Res Rev. 2010;67:528–54
6. Sessler DI, Sigl JC, Manberg PJ, Kelley SD, Schubert A, Chamoun NG. Broadly applicable risk stratification system for predicting duration of hospitalization and mortality. Anesthesiology. 2010;113:1026–37
7. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40:373–83
8. Clarke R, Grimley Evans J, Schneede J, Nexo E, Bates C, Fletcher A, Prentice A, Johnston C, Ueland PM, Refsum H, Sherliker P, Birks J, Whitlock G, Breeze E, Scott JM. Vitamin B12 and folate deficiency in later life. Age Ageing. 2004;33:34–41
9. Hogan KJ, Burmester JK, Caldwell MD, Hogan QH, Coursin DB, Green DN, Selzer RM, Broderick TP, Rusy DA, Poroli M, Lutz AL, Sanders AM, Oldenburg MC, Koelbl JA, de Arruda-Indig M, Halsey JL, Day SP, Domanico MJ. Perioperative genomic profiles using structure-specific oligonucleotide probes. Clin Med Res. 2009;7:69–84
10. Joyce CJ, Baker AB, Kennedy RR. Gas uptake from an unventilated area of lung: computer model of absorption atelectasis. J Appl Physiol. 1993;74:1107–16
11. Robinson GJ, Peyton PJ, Terry D, Malekzadeh S, Thompson B. Continuous measurement of gas uptake and elimination in anesthetized patients using an extractable marker gas. J Appl Physiol. 2004;97:960–6
12. Myles PS, Leslie K, Chan MT, Forbes A, Paech MJ, Peyton P, Silbert BS, Pascoe E; ENIGMA Trial Group. Avoidance of nitrous oxide for patients undergoing major surgery: a randomized controlled trial. Anesthesiology. 2007;107:221–31
13. Leslie K, Myles PS, Chan MT, Forbes A, Paech MJ, Peyton P, Silbert BS, Williamson E. Nitrous oxide and long-term morbidity and mortality in the ENIGMA trial. Anesth Analg. 2011;112:387–93
14. MacMahon S, Collins R. Reliable assessment of the effects of treatment on mortality and major morbidity, II: observational studies. Lancet. 2001;357:455–62
15. Renard D, Dutray A, Remy A, Castelnovo G, Labauge P. Subacute combined degeneration of the spinal cord caused by nitrous oxide anaesthesia. Neurol Sci. 2009;30:75–6
16. Singer MA, Lazaridis C, Nations SP, Wolfe GI. Reversible nitrous oxide-induced myeloneuropathy with pernicious anemia: case report and literature review. Muscle Nerve. 2008;37:125–9
17. El Otamni H, Moutawakil B, Moutawakil F, Gam I, Rafai MA, Slassi I. Post-operative dementia: toxicity of nitrous oxide. Encephale. 2007;33:95–7
18. Somyreddy K, Kothari M. Nitrous oxide induced sub-acute combined degeneration of spinal cord: a case report. Electromyogr Clin Neurophysiol. 2008;48:225–8
19. Armitage JM, Bowman L, Clarke RJ, Wallendszus K, Bulbulia R, Rahimi K, Haynes R, Parish S, Sleight P, Peto R, Collins R. Effects of homocysteine-lowering with folic acid plus vitamin B12 vs. placebo on mortality and major morbidity in myocardial infarction survivors: a randomized trial. JAMA. 2010;303:2486–94
20. Saposnik G, Ray JG, Sheridan P, McQueen M, Lonn E; Heart Outcomes Prevention Evaluation 2 Investigators. Homocysteine-lowering therapy and stroke risk, severity, and disability: additional findings from the HOPE 2 trial. Stroke. 2009;40:1365–72
21. Spence JD. Perspective on the efficacy analysis of the Vitamin Intervention for Stroke Prevention trial. Clin Chem Lab Med. 2007;45:1582–5
22. Rosenbaum P. Design of Observational Studies. New York: Springer; 2010
23. Hessel EA, Apostolidou I. Pulmonary artery catheter for coronary artery bypass graft: does it harm our patients? Primum non nocere. Anesth Analg. 2011;113:987–9
24. Shafer SL. Did our brains fall out? Anesth Analg. 2007;104:247–8
25. Datta M. You cannot exclude the explanation you have not considered. Lancet. 1993;342:345–7