# Commentary: Intuitions, Simulations, Theorems: The Role and Limits of Methodology

From the Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA.

Correspondence: Sander Greenland, Department of Epidemiology, University of California, Los Angeles, CA 90095-1772. E-mail: lesdomes@ucla.edu.

It has long been held that, absent other errors, nondifferential (“unbiased”) misclassification of exposure biases effect estimates toward the null (which was proven in the binary-exposure case^{1}) or beyond the null for errors worse than random (such as coding errors). Analogously, it was argued that nondifferential misclassification of a confounder biases an adjusted estimate toward the unadjusted exposure–disease association.^{2} By the 1990s, however, it was noted that exceptions to the exposure rule could occur if the exposure was not binary,^{3} and that exceptions to the confounder rule could occur if the confounder was not binary.^{4} Trends could even be reversed by nondifferential exposure misclassification alone.^{3} Although trend reversal would not occur under certain monotonicity conditions,^{5} any simple rule could be rendered inoperative by unfavorable types of error dependencies.^{6}^{–}^{9}
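The binary-exposure case can be illustrated with a minimal sketch that computes the expected risk ratio after nondifferential misclassification of exposure. All inputs below (exposure prevalence, risks, sensitivity, specificity) are hypothetical values chosen for illustration and come from none of the cited papers; the function name is likewise an assumption.

```python
def misclassified_risk_ratio(p_exposed, risk_exposed, risk_unexposed,
                             sensitivity, specificity):
    """Expected risk ratio when a binary exposure is misclassified
    independently of the outcome (nondifferential error)."""
    # Joint probabilities of true and classified exposure status.
    true_pos = sensitivity * p_exposed
    false_neg = (1 - sensitivity) * p_exposed
    false_pos = (1 - specificity) * (1 - p_exposed)
    true_neg = specificity * (1 - p_exposed)

    # Each classified group is a mixture of truly exposed and truly unexposed.
    risk_classified_exposed = (
        (true_pos * risk_exposed + false_pos * risk_unexposed)
        / (true_pos + false_pos)
    )
    risk_classified_unexposed = (
        (false_neg * risk_exposed + true_neg * risk_unexposed)
        / (false_neg + true_neg)
    )
    return risk_classified_exposed / risk_classified_unexposed


# Hypothetical scenario: true risk ratio 0.2 / 0.1 = 2.
true_rr = 0.2 / 0.1
observed_rr = misclassified_risk_ratio(
    p_exposed=0.3, risk_exposed=0.2, risk_unexposed=0.1,
    sensitivity=0.8, specificity=0.9,
)
print(true_rr, round(observed_rr, 3))
```

With these inputs the expected misclassified risk ratio falls between 1 and the true value of 2, as the binary-exposure rule predicts. The sketch covers only this one scenario, with sensitivity plus specificity above 1 and no other errors.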

Despite these cautionary findings, textbooks have continued to teach that nondifferential misclassification of a binary confounder would bias results in the direction of the confounding.^{10} p. 145 In this issue of the journal, Ogburn and VanderWeele^{11} correct the record by showing that this rule (which they call the “partial control result”) can be violated if both (1) the exposure-conditional confounder effect on the outcome reverses direction across exposure levels (qualitative interaction), and (2) the adjustment method involves weighting across confounder strata with weights between the extremes of weighting to the distribution of the confounder among the exposed (classic SMR weighting, which yields the effect of exposure on the exposed) and weighting to its distribution among the unexposed. They also provide reassurance by proving that the rule holds either when the confounder effect does not reverse direction (which they call monotonicity of the confounder effect) or when the weights are taken from the exposed or from the unexposed.
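One way to write the weighting schemes at issue, in notation that is my own assumption rather than that of Ogburn and VanderWeele,^{11} is as a standardized risk ratio with stratum weights $w_c$ for a binary exposure $A$, outcome $Y$, and confounder $C$:

$$
\mathrm{RR}_w = \frac{\sum_c w_c \Pr(Y = 1 \mid A = 1, C = c)}{\sum_c w_c \Pr(Y = 1 \mid A = 0, C = c)},
\qquad
w_c = \lambda \Pr(C = c \mid A = 1) + (1 - \lambda) \Pr(C = c \mid A = 0),
\quad 0 \le \lambda \le 1 .
$$

In this parameterization, $\lambda = 1$ gives SMR weighting (the exposed distribution), $\lambda = 0$ gives weighting to the unexposed, and $\lambda = \Pr(A = 1)$ gives weighting to the total sample, because $\Pr(C = c) = \Pr(A = 1)\Pr(C = c \mid A = 1) + \Pr(A = 0)\Pr(C = c \mid A = 0)$.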

### Is Effect Reversal of Practical Concern for Partial Control?

The weighting implicit in common multiplicative-modeling procedures approximates that of the total sample,^{12} in which case monotonicity of the confounder effect becomes the pivotal assumption for the partial-control rule. In most previous literature, the partial-control rule was studied under this monotonicity, so that the rule was corroborated repeatedly.^{4},^{13}^{–}^{16} This fact, along with new results,^{5},^{11} suggests that monotonicity and independence assumptions were implicit in the intuitions underlying the original rules regarding exposure and confounder misclassification. Nonetheless, the partial-control rule may sometimes continue to hold under effect reversal (nonmonotonicity), and so a key practical question is what sort or what degree of reversal is needed for violation of the rule.
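The behavior of the partial-control rule under a monotone confounder effect and total-sample weighting can be sketched by computing expected (large-sample) risk ratios directly. Every number below is a hypothetical input of my choosing, not a value from the cited literature, and the function names are illustrative assumptions. The confounder C raises risk at both exposure levels, and its misclassification is taken to be independent of exposure and outcome given the true C.

```python
# Hypothetical inputs (not taken from any cited paper).
p_c1 = 0.4                                # P(C = 1)
p_a1_given_c = {1: 0.7, 0: 0.3}           # P(A = 1 | C = c)
risk = {(1, 1): 0.30, (1, 0): 0.10,       # P(Y = 1 | A = a, C = c), keyed (a, c);
        (0, 1): 0.20, (0, 0): 0.05}       # C raises risk at both exposure levels
sensitivity, specificity = 0.8, 0.8       # nondifferential error in measuring C


def joint_ac(a, c):
    """P(A = a, C = c)."""
    p_c = p_c1 if c == 1 else 1 - p_c1
    p_a = p_a1_given_c[c] if a == 1 else 1 - p_a1_given_c[c]
    return p_c * p_a


def p_cstar_given_c(cstar, c):
    """P(C* = cstar | C = c); by assumption it depends on neither A nor Y."""
    if c == 1:
        return sensitivity if cstar == 1 else 1 - sensitivity
    return specificity if cstar == 0 else 1 - specificity


def cell(a, s, use_misclassified):
    """Probability mass and case mass for exposure a in stratum s."""
    if use_misclassified:
        parts = [(joint_ac(a, c) * p_cstar_given_c(s, c), risk[(a, c)])
                 for c in (0, 1)]
    else:
        parts = [(joint_ac(a, s), risk[(a, s)])]
    mass = sum(m for m, _ in parts)
    cases = sum(m * r for m, r in parts)
    return mass, cases


def crude_rr():
    """Risk ratio ignoring C entirely."""
    crude_risk = {}
    for a in (0, 1):
        mass = sum(joint_ac(a, c) for c in (0, 1))
        cases = sum(joint_ac(a, c) * risk[(a, c)] for c in (0, 1))
        crude_risk[a] = cases / mass
    return crude_risk[1] / crude_risk[0]


def standardized_rr(use_misclassified):
    """Risk ratio standardized to the total-sample stratum distribution."""
    std_risk = {0: 0.0, 1: 0.0}
    for s in (0, 1):
        weight = sum(cell(a, s, use_misclassified)[0] for a in (0, 1))
        for a in (0, 1):
            mass, cases = cell(a, s, use_misclassified)
            std_risk[a] += weight * cases / mass
    return std_risk[1] / std_risk[0]


rr_crude = crude_rr()
rr_true = standardized_rr(use_misclassified=False)
rr_misclassified = standardized_rr(use_misclassified=True)
print(round(rr_crude, 3), round(rr_true, 3), round(rr_misclassified, 3))
```

In this scenario the misclassified-adjusted risk ratio lies between the correctly adjusted and crude values, which is what the rule predicts under monotonicity. One hypothetical configuration corroborates nothing in general; the sketch only makes the quantities being compared concrete.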

Consider Ogburn and VanderWeele's^{11} hypothetical Table 1 showing violation of the rule. The unadjusted (crude), correctly adjusted (true), and misclassified adjusted risk ratios are 1.244, 1.246, and 1.203, respectively, assuming sensitivity for C = 1 of 100% and specificity for C = 0 of 75%. Even if the only other error present is purely random, this violation is well below ordinary epidemiologic detection levels. Further, as their Figure 2 shows, the violation becomes even smaller and eventually disappears as the sensitivity is reduced below 100%. Yet this small violation required a scenario in which the correct risk ratios relating the confounder to outcome given exposure (CY|A) are 4.92 for A = 1 and 0.0547 for A = 0, a 90-fold range across the strata; the misclassified risk ratios are 4.39 and 0.389, an 11-fold range. Similarly, the correct risk ratios relating the exposure to outcome given the confounder (AY|C) are 23.1 for C = 1 and 0.257 for C = 0, again a 90-fold range across the strata; the misclassified risk ratios are 2.90 and 0.257, again an 11-fold range, with a *P* value less than 0.001 for heterogeneity.
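The arithmetic behind these statements can be checked with a short sketch that uses only the risk ratios quoted above. The variable and function names are illustrative assumptions, and the check treats the partial-control rule as the claim that the misclassified-adjusted estimate lies between the crude and the correctly adjusted estimates.

```python
# Risk ratios quoted in the text for the hypothetical Table 1 example.
crude, true_adjusted, misclassified_adjusted = 1.244, 1.246, 1.203

# The partial-control rule places the misclassified-adjusted estimate
# between the crude and the correctly adjusted estimates.
lower, upper = sorted((crude, true_adjusted))
rule_violated = not (lower <= misclassified_adjusted <= upper)


def fold_range(rr_high, rr_low):
    """Ratio of the larger to the smaller stratum-specific risk ratio."""
    return rr_high / rr_low


fold_cy_true = fold_range(4.92, 0.0547)    # confounder-outcome, correct
fold_cy_mis = fold_range(4.39, 0.389)      # confounder-outcome, misclassified
fold_ay_true = fold_range(23.1, 0.257)     # exposure-outcome, correct
fold_ay_mis = fold_range(2.90, 0.257)      # exposure-outcome, misclassified

print(rule_violated)
print(round(fold_cy_true), round(fold_cy_mis),
      round(fold_ay_true), round(fold_ay_mis))
```

The misclassified-adjusted value of 1.203 lies below both 1.244 and 1.246, so the rule is violated, but by only about 0.04 on the risk-ratio scale, while the stratum-specific ratios span roughly 90-fold (correct) and 11-fold (misclassified) ranges.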

In practice, without a prior that strongly indicates monotonicity (and thus supports the partial-control rule), it would be a rather serious error for such enormous risk-ratio modification by C to go unremarked in favor of presenting only the summary risk ratio. Instead, concern about C as a modifier would hopefully lead to presentation of the stratified estimates, rendering the partial-control rule moot. More generally, I know of no epidemiologic example in the 90-fold range (not even the 1960 thiazide study^{17} cited by Ogburn and VanderWeele,^{11} which is a case report and a challenge trial of 40 patients, with no data indicating misclassification). Although less extreme reversal could still lead to violation of the partial-control rule,^{11} the confounding and violation that result may be even smaller than in this example. Thus, although real examples of effect reversal do occur, it remains unclear when or whether violations of the partial-control rule due to effect reversal could be of practical concern; further exploration would be welcome.

### How Reliable Are Conventional Rules and Methods in Practice?

There are many assumptions and simplifications underlying all rules, theorems, simulations, and methods. Although most assumptions and simplifications are left implicit, they need to be brought to the fore in real applications. For example, the partial-control rule becomes doubtful if one is unsure whether the covariate at issue is really a confounder. Rather than indicating a need to adjust for the covariate, the change in exposure effect estimate on adjustment may be an artifact of mismeasurement.^{18} Complicating the picture further, matching can lead to increased bias from misclassification,^{19} and apparent effect modification can be badly distorted by either exposure or covariate mismeasurement.^{2} Mismeasurement of covariates is often accompanied by mismeasurement of exposure (eg, throughout diet, nutritional, occupational, environmental, and social epidemiology), with potential combined effects that are well beyond intuition or simple results.^{9},^{18}

As a final blow to conventional rules and analysis methods, we can rarely be certain that all the conditions required for their application are satisfied. Such considerations reinforce the notion that (even if supporting theorems are available) great caution is needed in using a rule or method without extensive simulations to investigate the conditions under which it might be reasonable for practice. When biases comparable in size with the effect under study are a reasonable possibility, epidemiologic inference (as opposed to reports focusing on study and data description) may come to rely on computations and simulations tailored to the specifics of the study context, rather than rely solely on general results or methods.^{10} Ch. 19;^{20}

Then too, like intuitions, theorems and simulations can be in error. Some of the earliest asserted results on effects of misclassification and collapsing categories were found to be incorrect,^{21}^{–}^{23} as were some later simulations concerning misclassification correction.^{24} Misclassification seems especially tricky to study, given that even simple cases can lead to nonintuitive biases and to formulas more involved than analogous results for confounding and selection bias.^{8},^{9},^{10} Ch. 19;^{18} Ogburn and VanderWeele^{11} illustrate this fact well in their theorems, and they deserve special praise for finding the proofs; the labor required makes it all the more surprising that the original intuitive rule is for the most part correct.

## CONCLUSION

I expect many more interesting and useful results will arise from exploring the limits of past intuitions and heuristics, and so it is heartening to see a new generation of methodologists rise to the challenge. These explorations will help put epidemiologic methods on a more rigorous foundation, a process that has been underway since the pioneering work of Jerome Cornfield and colleagues in the 1950s and 1960s.^{25},^{26}

Despite staggering advances in theory and computation since then, causal inference in health and social science remains far too complex to rigorously model every important aspect of real situations in full detail, especially when the time comes to synthesize diverse evidence.^{25},^{26} Thus, in practice, even the most mathematically rigorous results will have to be tempered with respect for that complexity. In particular, real-world decisions will be forced to fall back on fallible and potentially biased heuristics and judgments, which will in turn require persistent methodological attention to detect their flaws and limitations. The open-ended learning cycle between theory and practice needs to be recognized more explicitly, especially in teaching. Otherwise, methodology will be mistaken for a static body of concepts and practices to be applied rigidly throughout diverse contexts—a fate that befell the teaching and practice of statistical testing, to the great detriment of the health sciences.^{10} Ch. 10;^{27},^{28}

In sum, all practical methodologies are collections of heuristics, including bias analysis and causal modeling (formal “causal-inference” methodology) as well as conventional statistical methods. To the extent any analysis or methodology appears rigorous (a mathematical proof, a simulation study, a formal statistical analysis), that rigor applies only within a framework of assumptions of which at least some will be questionable. I thus regard these methodologies as mathematically framed heuristics for which (like all heuristics) risk of overextension, overconfidence, and failure lurks in any application. Nonetheless, they do have advantages over informal analogs insofar as their assumptions and logical errors can be more easily recognized by those conversant in the necessary mathematical language.