A common challenge in the analysis of epidemiologic data is the problem of estimating independent effects of correlated exposures. One conventional approach is to assess each exposure separately, while adjusting for a subset of the other exposures, considered potential confounders. Results obtained with this approach may depend on the criteria used to identify confounders. The approach also raises the issue of multiple testing. Semi-Bayes modeling, a form of hierarchical regression, was developed to estimate the independent effects of many variables, while addressing some methodologic challenges that typify conventional analyses of large numbers of exposures: incorporating prior knowledge, avoiding inappropriate exclusion of real but unrecognized confounders, improving the efficiency of the estimation compared with a conventional analysis with the same number of covariates, and mitigating the problems of multiple testing.^{1–9} Despite theoretical arguments in its favor,^{10–12} and potential advantages demonstrated in simulations,^{3,13,14} hierarchical modeling has not displaced conventional methods in common epidemiologic practice, although its use is increasing.

The impetus for the current analysis was the analytic challenge presented by a large case-control study, the aim of which was to assess putative occupational risk factors for lung cancer.^{15} That study assessed each person's exposure to hundreds of the most common occupational exposures thought to be present in the Montreal-area working environment. For most of these agents, many of which are correlated, there was very little epidemiologic evidence of elevated cancer risk. The analytic challenge was to estimate the possible independent effect of each agent on lung cancer risk. Several statistical approaches were possible, including conventional logistic regression, with various alternative strategies for model-building, and semi-Bayes methods. Thus, it was of interest to establish whether these different approaches in fact lead to different sets of results.

The present paper is not intended to provide a substantive evaluation of the carcinogenic potential of each agent. Indeed, we use simple dichotomous exposure measures and ignore such issues as dose-dependency and latency. The focus is on the methodologic comparison of different analytic approaches. Clearly, such an empirical comparison, where the “truth” is unknown, cannot establish the optimal approach. However, it can assess the robustness of the results and possibly link potential differences in results to the assumptions underlying different approaches.

METHODS
Data Sources
The study subjects were male residents of Montreal, diagnosed between 1979 and 1985 with 1 of 19 types of histologically confirmed cancer.^{15,16} Altogether, 3730 cancer patients were interviewed. For the present analyses, 857 interviewed men with lung cancer comprised the case series, and the controls were selected from among remaining cancer patients. The use of cancer controls can lead to an estimate biased toward the null if the exposure under investigation is associated with cancers in the control group. To minimize any such bias, we randomly subsampled men with more frequent cancer types, to ensure that no cancer type represented more than 10% of the entire cancer control group. This resulted in 2172 controls.
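The subsampling rule can be pictured as a per-type cap on the number of controls. The sketch below is illustrative only: the function names, the search for the largest feasible cap, and the use of a simple random sample within each over-represented cancer type are assumptions, not the procedure actually used in the study.

```python
import random


def largest_cap(counts, max_share=0.10):
    """Largest per-type cap such that no type exceeds max_share of the capped total.

    counts: dict mapping cancer type -> number of available controls (hypothetical input).
    """
    for cap in range(max(counts.values()), 0, -1):
        total = sum(min(n, cap) for n in counts.values())
        if cap <= max_share * total:
            return cap
    return None  # no feasible cap (e.g., too few cancer types)


def subsample_controls(ids_by_type, max_share=0.10, seed=0):
    """Randomly subsample the more frequent cancer types down to the cap."""
    rng = random.Random(seed)
    counts = {t: len(ids) for t, ids in ids_by_type.items()}
    cap = largest_cap(counts, max_share)
    if cap is None:
        raise ValueError("no cap satisfies the maximum-share constraint")
    kept = {}
    for cancer_type, ids in ids_by_type.items():
        ids = list(ids)
        # Types at or below the cap are kept whole; larger ones are sampled.
        kept[cancer_type] = ids if len(ids) <= cap else rng.sample(ids, cap)
    return kept
```

Choosing the largest feasible cap keeps as many controls as possible while respecting the 10% limit; other rules that satisfy the same constraint are equally conceivable.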

Information was collected on a host of personal characteristics, including smoking history and a detailed lifetime job history. The latter was used by our team of industrial hygienists to ascribe exposures to each of 294 substances on a checklist, using an expert opinion-based approach pioneered by our team.^{17} (The entire checklist, with definitions, is shown in a report by Siemiatycki.^{16}) This checklist included well-defined chemical compounds, complex mixtures, and chemical groups. There was considerable overlap among some substances. For instance, exposure to lead chromate was recorded under lead chromate, lead compounds, chromium compounds, and chromium VI compounds.^{18} Among the 294 substances, 110 were excluded from the present analysis, because of exposure prevalence below 1%, or because in retrospect the industrial hygienists judged the reliability of the attribution of exposure to that agent to be questionable, or because pairs of substances were embedded one within the other, resulting in coefficients that would be impossible to interpret (eg, “chromium compounds” and “lead chromates”). eTable 1 (https://links.lww.com/EDE/A350) lists the excluded substances, by reason for exclusion. After these exclusions, 184 substances were retained for the present analyses.

Each exposure was represented by a dichotomous variable, indicating whether the subject had ever been exposed to the substance of interest, excluding the 5 years before diagnosis or interview. Thus, the present analyses did not account for duration or intensity of exposure.
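As a minimal sketch of this coding, assume each subject's exposure to a substance is available as a list of (start year, end year) periods. That data layout, the function name, and the treatment of the boundary year are assumptions made for illustration.

```python
def ever_exposed(periods, index_year, lag_years=5):
    """Return 1 if any exposure began on or before the lagged cutoff, else 0.

    periods: list of (start_year, end_year) tuples for one substance (hypothetical layout).
    index_year: year of diagnosis or interview.
    Exposure occurring only within the last `lag_years` before the index year is ignored.
    """
    cutoff = index_year - lag_years
    return int(any(start <= cutoff for start, _end in periods))


# Exposure starting in 1980 with diagnosis in 1982 falls inside the 5-year window.
recent_only = ever_exposed([(1980, 1982)], index_year=1982)   # 0
long_ago = ever_exposed([(1960, 1965)], index_year=1982)      # 1
```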

Regression Modeling Strategies
All analyses used case/control status as a binary dependent variable. In all models, 8 nonoccupational variables were included as potential confounders: age, ethnicity, income, education, history of alcohol consumption, respondent status (self or proxy), indicator of some physical activity in adulthood, and, most importantly, history of cigarette smoking, which was modeled with a 3-variable parameterization (ever/never smoking status, cumulative pack-years, and years since smoking cessation).^{19}

Six modeling strategies were implemented for the estimation of the effects of the 184 substances. In each strategy, the effects of all 184 substances were estimated, but the strategies differed with respect to which other substances were included as covariates. Strategies 1–4 were variations of conventional logistic regression, and strategies 5–6 relied on semi-Bayes models. Before outlining the strategies, we briefly recall the basic concepts of semi-Bayes modeling.

Semi-Bayes modeling is a form of hierarchical modeling wherein some of the parameters for the Bayesian prior are generated from the data at hand.^{20} When many exposure variables are collected, there may be some basis for grouping or categorizing them according to similarity of expected effects. For example, in their semi-Bayes analysis of an occupational cancer study, de Roos et al^{21} grouped exposures that have similar chemical characteristics, and assumed that carcinogenic effects within each category would be similar. Such categorization, referred to as “exchangeability of effects,” can be used in 2-level semi-Bayes modeling.^{22} The first-level model is the conventional “full” logistic regression model, which includes all exposure variables. Next, a second-level multivariable linear model regresses the first-level logistic coefficients for individual exposures on the binary indicators of the prespecified categories, providing an estimate of the “average effect” for each category. The final semi-Bayes estimate of the effect of each exposure is a weighted average of the corresponding first-level and second-level estimates, where the weights depend partly on the estimated inverse variance of the corresponding first-level estimate, and partly on an assumed residual variance of the respective second-level estimates. If a given substance belongs to several categories, then the first-level estimate is combined with the subset of second-level estimates for all relevant categories.^{7} Accordingly, the final semi-Bayes estimates for individual exposures are shrunk toward the average effect for the respective category, which usually pulls them toward the null value,^{23} with greater shrinkage for less precise estimates. Validity and efficiency of hierarchical analyses depend in part on the validity of external knowledge used to create the exchangeability categories.
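The weighted-average step can be sketched for the simplest case, in which all effects are treated as exchangeable around a single common mean. This is a simplified illustration under stated assumptions: it ignores the covariances among first-level estimates and the uncertainty in the estimated common mean, and the function and argument names are hypothetical. It does not reproduce the mixed-model fitting used for the analyses reported here.

```python
def semi_bayes_shrink(betas, variances, tau2):
    """Shrink first-level log ORs toward a common second-level mean.

    betas: first-level log OR estimates.
    variances: their estimated variances (squared standard errors).
    tau2: assumed (prespecified) second-level residual variance.
    """
    # Second level: precision-weighted mean of the first-level estimates.
    weights = [1.0 / (v + tau2) for v in variances]
    prior_mean = sum(w * b for w, b in zip(weights, betas)) / sum(weights)

    shrunk = []
    for b, v in zip(betas, variances):
        # Weight on the first-level estimate; it falls as v grows,
        # so less precise estimates are shrunk more.
        w_first = tau2 / (tau2 + v)
        shrunk.append(w_first * b + (1.0 - w_first) * prior_mean)
    return prior_mean, shrunk
```

With two estimates of 0.0 and 1.0, each with variance 0.1, and tau2 = 0.1, the common mean is 0.5 and each estimate moves halfway toward it (0.25 and 0.75).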

We implemented the following 6 strategies, all of which adjusted for the standard set of 8 nonoccupational variables:

Strategy 1. No occupational covariates. Each of the 184 substances was assessed in a separate single-exposure logistic regression model, with no adjustment for other substances.
Strategy 2. A priori selection. Separate logistic regression model for each substance, with adjustment for 8 other substances, considered potential confounders: asbestos, crystalline silica, chromium VI compounds, arsenic compounds, diesel engine exhaust, soot, wood dust, and benzo(a)pyrene. These 8 substances are all designated by the International Agency for Research on Cancer (IARC) as group 1 or 2A lung carcinogens,^{24} and each had exposure prevalence greater than 1% in the study population.
Strategy 3. A priori plus change-in-estimate selection. Separate model for each substance, including the same 8 recognized carcinogens as in strategy 2, but additional substances were included based on a change-in-estimate criterion, for the odds ratio (OR), of at least 10%.^{15,25} This was implemented by adding one occupational covariate at a time, and comparing the resulting adjusted OR for the “main exposure” with the corresponding strategy 2 estimate.
Strategy 4. Full model. A single logistic regression model included all 184 exposure variables. This model corresponded to the first-level model in semi-Bayes strategies 5 and 6.
Strategy 5. Semi-Bayes, all effects exchangeable. Effects of all chemicals were estimated simultaneously, in a single model, and all were considered exchangeable. The second-level design matrix reflected the extreme assumption that there was no basis for distinguishing the expected carcinogenic effects of the different substances. Thus, all 184 estimates from strategy 4 were shrunk towards a common mean. The second-level residual variance for the 184 effects (log ORs) was set to T^{2} = [ln(10)/(2 × 1.96)]^{2} = 0.345. This reflects the assumption that the effects of individual exposures are normally distributed and would fall with 95% certainty within a 10-fold range of each other.^{21}
Strategy 6. Semi-Bayes, exchangeable subsets based on chemical properties. For strategy 6, our team of occupational hygienists created 30 categories of substances based on chemical and physical similarities, in the expectation that potential carcinogenic effects of substances would be more similar within categories than between categories. The categories of exchangeability were neither exhaustive nor mutually exclusive; some of the 184 substances were not in any category and some were in multiple categories. Each category contained at least 2 agents. eTable 2 (https://links.lww.com/EDE/A350) lists all the categories and their constituents. Previous evidence on carcinogenicity of each substance was also incorporated, based on reviews and published meta-analyses (such as those by Steenland et al^{26} and Monson^{27}). Specifically, each of the 184 substances was given a prior value for its “expected” log OR. For most substances, there was no previous evidence of carcinogenicity and we attributed prior values of zero. For each substance with previous evidence, we attributed a value close to the respective published lower 95% confidence interval (CI) limit, derived from meta-analyses, if that value was above zero. Such conservative priors were adopted because ours was a population-based study, whereas most previous evidence came from cohort studies of work groups known to have experienced high levels of exposure. Thus, exchangeability information for strategy 6 was constructed as a 184 × 31 matrix, composed of 1s and 0s for inclusion or exclusion of a given substance in the 30 chemical categories, and a single column of continuous log ORs based on previous evidence. These are shown in eTable 2 (https://links.lww.com/EDE/A350). The exchangeability categories identified in strategy 6 are expected to be more homogeneous than the entire set of 184 estimates considered exchangeable in strategy 5. We therefore specified a lower prior residual variance, corresponding to a 7-fold range (instead of the 10-fold range for strategy 5), resulting in T^{2} = 0.246.
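The two second-level variances quoted for strategies 5 and 6 follow from the same formula, which converts an assumed 95% fold-range for the ORs into a variance on the log OR scale. The short check below reproduces both values; the function name is ours.

```python
import math


def prior_variance(fold_range, z=1.96):
    """Second-level variance implied by a 95% range of `fold_range` for the ORs.

    A normal distribution on the log scale whose central 95% interval spans
    ln(fold_range) has standard deviation ln(fold_range) / (2 * z).
    """
    return (math.log(fold_range) / (2 * z)) ** 2


print(round(prior_variance(10), 3))  # strategy 5, 10-fold range: 0.345
print(round(prior_variance(7), 3))   # strategy 6, 7-fold range: 0.246
```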
Strategies 1–4 were implemented with standard unconditional logistic regression procedures in SAS (SAS Institute, Cary, NC), and strategies 5 and 6 with a generalized linear mixed-models extension of logistic regression, using the GLIMMIX macro.^{28}

Comparing Results Between Strategies
The OR is both a quantitative indicator of the size of an association, and an index used, in conjunction with its standard error, to determine the statistical strength of the association. We used both the magnitude of the estimated OR and categories of P value, a qualitative measure of the estimate's “statistical significance,” as parameters to be compared across the various analytic strategies. We believe both are meaningful and complementary parameters.
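To make the link between an OR, its standard error, and the P value concrete, the sketch below recovers an approximate standard error from a reported 95% CI and computes a two-sided Wald P value. It assumes a CI that is symmetric on the log scale; the function name is illustrative, and rounding in published intervals makes the result approximate.

```python
import math


def wald_from_ci(or_estimate, ci_lower, ci_upper, z_crit=1.96):
    """Approximate log OR, standard error, z statistic, and two-sided P value."""
    log_or = math.log(or_estimate)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p_two_sided


# Rounded strategy 1 result reported for nickel fumes: OR 2.1 (95% CI = 1.4-3.2).
log_or, se, z, p = wald_from_ci(2.1, 1.4, 3.2)
```

For this interval the standard error of the log OR is roughly 0.21 and the P value falls well below 0.05.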

RESULTS
Although most pairwise correlations among the exposures in the study population were quite small, a small number of pairs of agents were highly correlated; this could lead to problems of collinearity. The numbers of occupational covariates included in the model for each agent differed according to strategy, with 0 and 8, respectively, for strategies 1 and 2, and 183 for strategies 4, 5, and 6. With the change-in-estimate approach in strategy 3, the number of occupational covariates selected into the final models ranged from 0 to 18 (median 2), depending on the main exposure.
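The change-in-estimate screening of strategy 3 can be sketched as a loop over candidate covariates. The sketch assumes a caller-supplied function, here called `fit_or`, that fits the logistic model and returns the adjusted OR for the main exposure. That callable, the other names, and the decision to measure the relative change on the OR scale are assumptions for illustration.

```python
def select_by_change_in_estimate(fit_or, exposure, forced, candidates, threshold=0.10):
    """Return candidate covariates whose addition changes the exposure OR by >= threshold.

    fit_or(exposure, covariates) -> adjusted OR for `exposure` (hypothetical callable).
    forced: covariates always in the model (e.g., the a priori carcinogens).
    candidates: other substances, each added one at a time to the forced set.
    """
    base_or = fit_or(exposure, list(forced))  # reference: a priori adjusted estimate
    selected = []
    for cand in candidates:
        if cand == exposure or cand in forced:
            continue
        new_or = fit_or(exposure, list(forced) + [cand])
        # Relative change in the OR compared with the reference model.
        if abs(new_or / base_or - 1.0) >= threshold:
            selected.append(cand)
    return selected
```

Each candidate is compared against the same reference model rather than against a growing model, which matches the one-at-a-time description of strategy 3; the final model would then include the forced covariates plus all selected candidates.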

eTable 3 (https://links.lww.com/EDE/A350) shows the adjusted ORs for each of 184 agents, for each strategy. Table 1 compares various descriptive statistics for the distributions of 184 log ORs obtained with the 6 modeling strategies. Aside from strategy 1, the central locations of these distributions were close to zero. The standard deviations of the 184 substance-specific point estimates were substantially higher in strategies 3 and 4 than in the other strategies, reflecting variance inflation due to adjustment for several correlated covariates. As expected, the semi-Bayes models produced the narrowest range of point estimates. In contrast, on average, the lowest standard errors were estimated by strategies 1 and 2, whereas the full model of strategy 4 yielded the highest standard errors. Strategies 3, 5, and 6 produced similar, moderate standard errors.

TABLE 1: Means, Medians, and Standard Deviations of the Beta Estimates for the 184 Substances in Each Strategy

The eFigure (https://links.lww.com/EDE/A350) shows scatter plots of the OR estimates of selected pairs of strategies. Table 2 shows the corresponding pairwise Pearson correlation coefficients between point estimates of log ORs from different strategies. Estimates from strategies 1 and 2 were highly correlated, although, as indicated in Table 1, the point estimates in strategy 2 were systematically closer to the null than those in strategy 1. Estimates from strategies 4, 5, and 6 were also highly correlated with each other, and only slightly less correlated with those from strategy 3. In contrast, the change-in-estimate strategy 3 showed only moderate correlations with simpler conventional strategies 1 and 2.

TABLE 2: Pearson Correlations Between Pairs of Strategies for the 184 Estimated Logistic Coefficients

Table 3 summarizes the results of each strategy through the distribution of the magnitude and statistical significance of the 184 ORs. Strategy 1 yielded 30 ORs above 1.0 with P < 0.05, by far the highest count of “significantly” elevated ORs. This strategy, which adjusted only for nonoccupational confounders, likely resulted in confounded estimates. Indeed, the adjustment for the 8 recognized occupational carcinogens in strategy 2 eliminated 18 of the 30 significantly elevated estimates from strategy 1. The full model of strategy 4 produced point estimates that were higher, and sometimes much higher, when compared with the other strategies. However, because these elevated point estimates were accompanied by considerably higher standard errors, the resulting P values were seldom “significant.” The semi-Bayes strategies yielded fewer significantly elevated ORs.

TABLE 3: Distribution of Estimates From the Modeling Strategies, According to Direction of Effect and Associated P Value

A total of 34 substances had a significantly elevated OR in at least one of the strategies. Table 4 presents results for all 20 substances that had a statistically significant (P < 0.05) OR above 1.0 in at least one of strategies 2–6. Only 5 substances overlapped among the 12 substances with significantly elevated ORs in strategy 2 and the 13 in strategy 3. Strategies 4, 5, and 6 produced similar sets of earmarked chemicals, and all of those were also statistically significant in strategy 3. While they are not shown in Table 4, the 30 substances with significant associations in strategy 1 include 16 of the 20 that were significant in any of strategies 2–6.

TABLE 4: Odds Ratio Estimates for 20 Substances Across the 6 Strategies, With Substances Selected Because They Had a Significantly Elevated OR in at Least One of Strategies 2 to 6

A comparison of results for chromium fumes and nickel fumes illustrates some properties of these modeling strategies. When examined in separate models with no occupational adjustment (strategy 1), OR estimates were 2.1 (95% CI = 1.4–3.2) for nickel fumes and 2.2 (95% CI = 1.4–3.4) for chromium fumes. However, almost everyone exposed to nickel fumes was also exposed to chromium fumes, leading to near-collinearity. When the change-in-estimate criterion was applied in strategy 3, each of the 2 fumes was selected as a confounder for the other (along with several other substances). In strategies 3 and 4 (full model), which adjusted chromium fumes for nickel fumes and vice versa, the near-collinearity induced highly unstable estimates for both substances, with imprecise point estimates driven far from the null and in opposite directions. In semi-Bayes strategy 5, while both fumes were still adjusted for each other, the assumed exchangeability of all substances resulted in much more stable point estimates, both pulled toward a common OR close to 1.0. In strategy 6, chromium fumes was included in exchangeability categories of metal oxide fumes, heavy metal compounds, and chromates, with a prior log OR set at 0.8. Nickel fumes was included in categories of metal oxide fumes, heavy metal compounds, and nickel compounds, with a weaker prior log OR of 0.33. Consequently, the final second-stage estimate for chromium fumes increased in strategy 6, relative to strategy 5, whereas that for nickel fumes decreased (Table 4). Still, even in strategy 6 both estimates had wide confidence intervals and neither was statistically significant. Although it is not known which of these estimates best represents the underlying truth, this example illustrates how the semi-Bayes strategy can use prior knowledge to influence estimation and inference.
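The second-level information for these 2 substances can be pictured as 2 rows of the strategy 6 exchangeability matrix, restricted to the 4 categories named above. The sketch below builds such rows from category memberships and prior log ORs; the function name and data layout are illustrative assumptions, and the full matrix has 30 category columns rather than 4.

```python
def build_design_rows(memberships, prior_log_or, categories):
    """Build second-level rows: one 0/1 indicator per category, then the prior log OR.

    memberships: dict substance -> set of category names it belongs to.
    prior_log_or: dict substance -> prior log OR (defaults to 0 when absent).
    categories: ordered list of category names defining the indicator columns.
    """
    rows = {}
    for substance, cats in memberships.items():
        unknown = set(cats) - set(categories)
        if unknown:
            raise ValueError(f"unknown categories for {substance}: {sorted(unknown)}")
        indicators = [1 if c in cats else 0 for c in categories]
        rows[substance] = indicators + [prior_log_or.get(substance, 0.0)]
    return rows


categories = ["metal oxide fumes", "heavy metal compounds", "chromates", "nickel compounds"]
memberships = {
    "chromium fumes": {"metal oxide fumes", "heavy metal compounds", "chromates"},
    "nickel fumes": {"metal oxide fumes", "heavy metal compounds", "nickel compounds"},
}
priors = {"chromium fumes": 0.8, "nickel fumes": 0.33}
rows = build_design_rows(memberships, priors, categories)
# rows["chromium fumes"] -> [1, 1, 1, 0, 0.8]
# rows["nickel fumes"]   -> [1, 1, 0, 1, 0.33]
```

The two rows differ only in the last indicator columns and in the prior log OR, which is what allows strategy 6 to pull the two estimates in different directions despite their near-collinearity.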

DISCUSSION
When assessing any exposure as a potential risk factor, one should adjust for all those covariates that are true confounders but, to avoid inefficiencies from over-adjustment, no others. This raises the question of how confounders should be identified. Several definitions of confounders and confounding have been proposed.^{29} Confounding is a context-specific phenomenon; whether a variable is a confounder depends both on its correlations with other variables and on what else is included in the statistical model.^{30–32} While known risk factors are certainly good candidates for confounders, they act as confounders only if, in a specific study sample, they are correlated with the exposure, conditional on other covariates. Furthermore, variables that are not yet known to be risk factors may in fact be confounders. Thus, confounders can be only imperfectly identified from previous knowledge and from the data at hand.^{33}

It is common epidemiologic practice to collect subject data on many variables. Conventional statistical methods have mainly focused on the problem of assessing the potential impact of one independent variable, while controlling for potential confounding of others, where the scientific question may be formulated in the following terms: “does this exposure cause this illness?”^{34,35} However, if the investigator is interested in estimating the effect of many covariates (“which of these exposures cause this illness?”), in what might be called a “screening analysis,” the conventional approach may not be optimal.^{36} In this article, we explored several methodologic approaches to addressing such questions.

While simulation studies have shown relative advantages of semi-Bayes approaches over conventional logistic regression approaches in some circumstances,^{3,13} it is unclear to what extent these results can be generalized. Such simulation results may depend on the particular implementation of semi-Bayes and conventional approaches used, as well as on the underlying assumptions regarding such design parameters as the numbers of exposure variables and subjects, correlations between exposures, correlated and uncorrelated measurement errors, and the extent and direction of any associations with the outcome. It is therefore of interest to explore whether alternative credible approaches lead to similar or different results in a real situation. Such empirical comparisons illustrate operational characteristics of alternative analytical approaches, and the practical implications of differences in underlying assumptions.^{37,38}

We empirically compared 6 commonly used or recommended approaches (both frequentist and semi-Bayesian) in a real-life multivariable analysis of a large set of intercorrelated exposures. There are several other possible approaches, including backward or forward automatic selection, and hierarchical models that impose more structure than the ones we chose. Further, even among the chosen strategies, decisions had to be taken for implementing each approach: eg, the selection of a priori occupational confounders for strategy 2, and the 2 second-level variance parameters for the semi-Bayes models.

Most of the agents on our list are not recognized carcinogens. It may be argued that this investigation should be restricted to known carcinogens. However, we believe that the comparison of the different strategies with a long list of agents (only some of which are true carcinogens) provides meaningful results, as well as being realistic and feasible.

Our analyses indicate that different modeling strategies can indeed lead to different results. Strategy 1 produced many more apparently significant results than other strategies. While strategy 1 provides a reasonable indication of potential risk factors among the variables examined, the estimates for individual substances are susceptible to mutual confounding. Strategy 2, with a few recognized lung carcinogens added to strategy 1 models, substantially reduced several ORs, compared with strategy 1, although the point estimates from strategies 1 and 2 were highly correlated. The validity of strategy 2 results depends on the assumption that there are no other important confounders apart from those selected. Strategy 3 protects against the possibility that other confounders lurk in the background by using data-based evidence of confounding to select additional adjustment variables, separately for each exposure. The inclusion of either these additional confounders (strategy 3) or all exposures considered (strategies 4–6), made quite a difference, manifested in moderate correlation coefficients and moderate overlap of significant findings between strategy 2 and the others. It is uncertain to what extent this reflects inadequate control for confounding in strategy 2,^{39,40} versus over-adjustment due to unnecessary inclusion of variables that are not true confounders for lung cancer in the other strategies.^{41}

Strategies 4, 5, and 6 were based on models with all 184 exposures, and the point estimates were highly correlated among these strategies. While strategy 4 adjusts for all available potential confounders, its results are susceptible to distortions related to small sample-size phenomena.^{42} It is doubtful that strategy 4 has any advantage over semi-Bayes strategies 5 and 6, apart from ease of application, whereas the latter strategies have the advantages of reducing the impact of unstable estimates and improving efficiency of estimation.^{10} Strategy 6 involved considerably more work and expense than strategy 5. Our attempt at creating meaningful categories for strategy 6 was based on a priori plausible yet still speculative ideas about shared carcinogenic potential among different subsets of the agents. Furthermore, strategy 6 incorporated the prior evidence about the possible associations between individual exposures and lung cancer risk, based on systematic reviews and meta-analyses. However, with few exceptions, the results were similar to those of strategy 5. It may be that the chemical and physical criteria used to create “exchangeable” categories were not in fact related to carcinogenic potential. This may reflect the difficulties in relying on presumed exchangeability criteria. If those criteria are not related to disease risk, the utility of the semi-Bayes approach is compromised. As understanding of carcinogenicity continues to improve, creating valid predictive categories will become more attainable. This indicates a need for continuing information feedback between results of basic and epidemiologic studies in cancer etiology. 
While the benefits of strategy 6 over strategy 5 were not obvious for most substances, the example of nickel versus chromium fumes illustrates how the 2 strategies may sometimes yield different results, especially in the case of substances that are correlated in the study sample, but are believed to have different carcinogenic potential based on previous evidence. Exchangeability categories other than those used in this paper could also be envisaged. For instance, chemicals could have been dichotomized by whether or not they are believed a priori to be lung carcinogens. It was not our aim to explore all possible exchangeability categories. However, the approach chosen had the advantage of building on information from previous investigations (using a quantitative measure of log RR from past studies) as well as the chemical/physical properties of the chemical agents.

As expected, strategy 1 produced the greatest number of “positive findings” and strategies 4–6 produced the fewest. In itself, this is not an argument in favor of one strategy or another. In any given investigation, there may be different types of costs associated with false-positives versus false-negatives. Depending on the trade-off in a given investigation, this may favor a strategy that produces more apparent “positives” (such as strategies 1–3) or fewer (eg, strategies 4–6).

Whenever possible, sensitivity analyses should be used to assess the impact of various underlying assumptions on the final results. We found it instructive to compare the results of different strategies, and it may be useful for investigators in general to use approaches as varied as strategies 1 or 2 and 5 or 6, with the choice between the latter 2 depending on the strength of the substantive knowledge needed to define “exchangeable” categories of exposures. If the results from such different approaches lead to similar inferences about any given exposure, this would provide confidence that the result is not an artifact of a specific analytic approach. Agents that manifest different results depending on the analytic strategy raise an inferential dilemma; in such cases, we recommend reporting the discrepant results and being circumspect about the interpretation.

The question of which is the best strategy to disentangle the effects of multiple intercorrelated exposures may not have a single answer to fit all circumstances. Further theoretical and simulation work could provide insight on the relative advantages of different approaches under a wider range of circumstances and assumptions.

REFERENCES
1. Thomas DC. The problem of multiple inference in identifying point-source environmental hazards. Environ Health Perspect. 1985;62:407–414.
2. Witte JS, Greenland S, Haile RW, Bird CL. Hierarchical regression analysis applied to a study of multiple dietary exposures and breast cancer. Epidemiology. 1994;5:612–621.
3. Greenland S. Methods for epidemiologic analyses of multiple exposures: a review and comparative study of maximum-likelihood, preliminary-testing, and empirical-Bayes regression. Stat Med. 1993;12:717–736.
4. Greenland S, Robins JM. Empirical-Bayes adjustments for multiple comparisons are sometimes useful. Epidemiology. 1991;2:244–251.
5. Steenland K, Bray I, Greenland S, Boffetta P. Empirical Bayes adjustments for multiple results in hypothesis-generating or surveillance studies. Cancer Epidemiol Biomarkers Prev. 2000;9:895–903.
6. MacLehose RF, Dunson DB, Herring AH, Hoppin JA. Bayesian methods for highly correlated exposure data. Epidemiology. 2007;18:199–207.
7. Greenland S. When should epidemiologic regressions use random coefficients? Biometrics. 2000;56:915–921.
8. Greenland S. Comment: cautions in the use of preliminary test estimators. Stat Med. 1989;8:669–673.
9. Greenland S. Multiple comparisons and association selection in general epidemiology. Int J Epidemiol. 2008;37:430–434.
10. Greenland S. Principles of multilevel modelling. Int J Epidemiol. 2000;29:158–167.
11. Morris CN. Parametric empirical Bayes: theory and application. J Am Stat Assoc. 1983;78:47–65.
12. Thomas DC, Siemiatycki J, Dewar RA, Robins J, Goldberg M, Armstrong BG. The problem of multiple inference in studies designed to generate hypotheses. Am J Epidemiol. 1985;122:1080–1095.
13. Witte JS, Greenland S. Simulation study of hierarchical regression. Stat Med. 1996;15:1161–1170.
14. Greenland S. Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models in epidemiologic analyses. Stat Med. 1997;16:515–526.
15. Siemiatycki J, Wacholder S, Richardson L, Dewar RA, Gerin M. Discovering carcinogens in the occupational environment. Methods of data collection and analysis of a large case-referent monitoring system. Scand J Work Environ Health. 1987;13:486–492.
16. Siemiatycki J. Discovering occupational carcinogens in population-based case-control studies: review of findings from an exposure-based approach and a methodologic comparison of alternative data collection strategies. Recent Results Cancer Res. 1990;120:25–38.
17. Gerin M, Siemiatycki J, Kemper H, Begin D. Obtaining occupational exposure histories in epidemiologic case-control studies. J Occup Med. 1985;27:420–426.
18. Siemiatycki J. Risk Factors for Cancer in the Workplace. Boston, MA: CRC Press; 1991.
19. Leffondré K, Abrahamowicz M, Siemiatycki J, Rachet B. Modeling smoking history: a comparison of different approaches. Am J Epidemiol. 2002;156:813–823.
20. Greenland S. A semi-Bayes approach to the analysis of correlated multiple associations, with an application to an occupational cancer-mortality study. Stat Med. 1992;11:219–230.
21. De Roos AJ, Poole C, Teschke K, Olshan AF. An application of hierarchical regression in the investigation of multiple paternal occupational exposures and neuroblastoma in offspring. Am J Ind Med. 2001;39:477–486.
22. Greenland S, Poole C. Empirical-Bayes and semi-Bayes approaches to occupational and environmental hazard surveillance. Arch Environ Health. 1994;49:9–16.
23. Greenland S. Invited commentary: variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol. 2008;167:523–529.
24. Siemiatycki J, Richardson L, Straif K, et al. Listing occupational carcinogens. Environ Health Perspect. 2004;112:1447–1459.
25. Mickey RM, Greenland S. The impact of confounder selection criteria on effect estimation. Am J Epidemiol. 1989;129:125–137.
26. Steenland K, Loomis D, Shy CM, Simonsen N. Review of occupational lung carcinogens. Am J Ind Med. 1996;29:474–490.
27. Monson RR. Occupation. In: Schottenfeld D, Fraumeni JF Jr, eds. Cancer Epidemiology and Prevention. New York: Oxford University Press; 1996:373–405.
28. Witte JS, Greenland S, Kim LL, Arab L. Multilevel modeling in epidemiology with GLIMMIX. Epidemiology. 2000;11:684–688.
29. Greenland S, Morgenstern H. Confounding in health research. Annu Rev Public Health. 2001;22:189–212.
30. Miettinen OS. Confounding and effect-modification. Am J Epidemiol. 1974;100:350–353.
31. Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12:313–320.
32. Robins JM, Greenland S. The role of model selection in causal inference from nonexperimental data. Am J Epidemiol. 1986;123:392–402.
33. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol. 1986;15:413–419.
34. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. Belmont, CA: Lifetime Learning Publications; 1982.
35. Rothman KJ. Modern Epidemiology. Toronto, ON: Little, Brown and Company; 1986.
36. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.
37. Quantin C, Abrahamowicz M, Moreau T, et al. Variation over time of the effects of prognostic factors in a population-based study of colon cancer: comparison of statistical models. Am J Epidemiol. 1999;150:1188–1200.
38. Dancourt V, Quantin C, Abrahamowicz M, Binquet C, Alioum A, Faivre J. Modeling recurrence in colorectal cancer. J Clin Epidemiol. 2004;57:243–251.
39. Thomas DC, Witte JS, Greenland S. Dissecting effects of complex mixtures: who's afraid of informative priors? Epidemiology. 2007;18:186–190.
40. Lash TL. Heuristic thinking and inference from observational epidemiology. Epidemiology. 2007;18:67–72.
41. Day NE, Byar DP, Green SB. Overadjustment in case-control studies. Am J Epidemiol. 1980;112:696–706.
42. Greenland S. Bayesian perspectives for epidemiological research. II. Regression analysis. Int J Epidemiol. 2007;36:195–202.