Upon fitting mixed model 3 to the example data and calculating the semi-Bayes estimates from model 4, estimates that had been unstable and nonsensical became more precise and reasonable, whereas those that were already relatively precise changed little. For example, the maximum-likelihood estimate of the OR for eating 100 additional grams of white cabbage was 2.5 (95% CI = 0.75–8.4), whereas the semi-Bayes OR estimate from GLIMMIX was 1.2 (95% CI = 0.59–2.6) (Table 1). In contrast, the OR for eating 100 additional grams of allium vegetables changed little upon applying the multilevel model (Table 1). Setting the elements of τ to larger values (for example, 0.53, which corresponds to 95% prior certainty that the residual relative risk lies within an eightfold range) gave less stable results; here, the semi-Bayes OR for eating 100 additional grams of white cabbage was 1.7 (95% CI = 0.64–4.4). Using the GLIMMIX default (that is, empirical-Bayes estimation) led to the common τ² being estimated as 0, which ignores any residual second-stage effects and yields overly precise estimates that are pulled too close together. 5
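The arithmetic behind these statements can be checked with a short sketch. It is written in Python purely for illustration and is not the SAS IML code used for the analysis; the function names are hypothetical, and the calculation assumes that the prior and the reported confidence limits are symmetric on the log scale (normal approximation).

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def prior_range_ratio(tau, z=Z95):
    """Ratio of the upper to the lower 95% prior limit for a relative risk
    whose log has prior standard deviation tau (hypothetical helper)."""
    return math.exp(2 * z * tau)


def log_se_from_ci(lower, upper, z=Z95):
    """Standard error on the log scale implied by a 95% CI, assuming the
    interval is symmetric about the log estimate (hypothetical helper)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# tau = 0.53 implies a prior range that is roughly eightfold.
print(round(prior_range_ratio(0.53), 2))

# Implied log-scale standard errors for the white cabbage OR,
# computed from the confidence limits quoted in the text.
print(round(log_se_from_ci(0.75, 8.4), 2))   # maximum likelihood
print(round(log_se_from_ci(0.59, 2.6), 2))   # semi-Bayes
print(round(log_se_from_ci(0.64, 4.4), 2))   # semi-Bayes with tau = 0.53
```

Under these assumptions, the implied standard error of the white cabbage log OR is smallest for the semi-Bayes estimate, larger when τ is set to 0.53, and largest for maximum likelihood, which is the ordering of precision described above.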
For comparison, we also fit the two-level model (model 1 plus model 2) to the diet and breast cancer data using our earlier weighted-least-squares method. 5–8,20 This method (third set of columns in Table 1) generally gave results almost identical to the penalized likelihood results from GLIMMIX. The similarity of these results, and their limited improvement over the conventional approach (Table 1), occurred because the mixed model 3 with δ = 0 fit the data reasonably well.
The Appendix provides the SAS IML code used to calculate ORs and 95% CIs from our GLIMMIX output. The complete program, a corresponding SAS macro, and instructions on how to obtain results using a multilevel model with GLIMMIX are available at http://darwin.cwru.edu/~witte/hm.html.
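For readers without SAS, the final step of such a calculation (turning a log-odds coefficient and its standard error into an OR and a 95% CI for a chosen exposure increment) can be sketched as follows. This Python fragment is an illustrative assumption, not a translation of the Appendix code: the function name and the input values are hypothetical, and it takes the coefficient and standard error as given rather than assembling them from the fixed and random effects in the GLIMMIX output.

```python
import math


def odds_ratio_ci(beta, se, units=1.0, z=1.959964):
    """Return (OR, lower, upper) for an increase of `units` in exposure.

    beta  : estimated log-odds coefficient per one unit of exposure
    se    : standard error of beta
    units : size of the exposure increment (for example, 100 grams)
    z     : standard normal quantile (default gives a 95% interval)
    """
    log_or = beta * units        # log OR for the chosen increment
    half_width = z * se * units  # half-width of the CI on the log scale
    return (
        math.exp(log_or),
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
    )


# Hypothetical inputs (not taken from Table 1): coefficient and SE per gram.
beta_per_gram, se_per_gram = 0.002, 0.004
or_, lower, upper = odds_ratio_ci(beta_per_gram, se_per_gram, units=100)
print(f"OR = {or_:.2f} (95% CI = {lower:.2f}-{upper:.2f})")
```

Rescaling both the coefficient and its standard error by the increment before exponentiating keeps the interval symmetric on the log scale, the same assumption that underlies Wald-type limits.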
With the explication and code given here, epidemiologists can use GLIMMIX to analyze their own data with a multilevel model. Strengths of GLIMMIX include allowing for more than two stages and providing diagnostic statistics. A recent report 26 provides further evaluation of GLIMMIX and comparison with variance component software packages specifically written for multilevel modeling. 27–29 As with SAS GLIMMIX, however, use of these packages for epidemiologic analysis will require either writing special code or altering output to give the relative risk estimates and corresponding confidence intervals. 20 Multilevel modeling can also be undertaken with procedures available in the SAS IML and GAUSS languages. 5,6,20,30 These procedures provide standard epidemiologic output and are available from http://darwin.cwru.edu/~witte/hm.html.
We thank the EURAMIC Study for the example diet and breast cancer data used here.
1. Malmstrom M, Sundquist J, Johansson SE. Neighborhood environment and self-reported health status: a multilevel analysis. Am J Public Health 1999; 89: 1181–1186.
2. Greenland S. Principles of multilevel modeling. Int J Epidemiol 2000; 29: 158–167.
3. Efron B, Morris C. Data analysis using Stein’s estimator and its generalizations. J Am Stat Assoc 1975; 70: 311–319.
4. Thomas DC, Siemiatycki J, Dewar R, Robins J, Goldberg M, Armstrong BG. The problem of multiple inference in studies designed to generate hypotheses. Am J Epidemiol 1985; 122: 1080–1095.
5. Greenland S. A semi-Bayes approach to the analysis of correlated multiple associations, with an application to an occupational cancer-mortality study. Stat Med 1992; 11: 219–230.
6. Greenland S. Methods for epidemiologic analyses of multiple exposures: a review and comparative study of maximum-likelihood, preliminary-testing, and empirical-Bayes regression. Stat Med 1993; 12: 717–736.
7. Witte JS, Greenland S, Bird CL, Haile RW. Hierarchical regression analysis applied to a study of multiple dietary exposures and breast cancer. Epidemiology 1994; 5: 612–621.
8. Witte JS, Greenland S. Simulation study of hierarchical regression. Stat Med 1996; 15: 1161–1170.
9. Greenland S. Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models in epidemiologic analyses. Stat Med 1997; 16: 515–526.
10. Witte JS. Genetic analysis with hierarchical models. Genet Epidemiol 1997; 14: 1137–1142.
11. Aragaki CC, Greenland S, Probst-Hensch N, Haile RW. Hierarchical modeling of gene-environment interactions: estimating NAT2* genotype-specific dietary effects on adenomatous polyps. Cancer Epidemiol Biomarkers Prev 1997; 6: 307–314.
12. Rothman K. No adjustments are needed for multiple comparisons. Epidemiology 1990; 1: 43–46.
13. Witte JS, Elston RC, Schork NJ. Re: Genetic dissection of complex traits. Nat Genet 1996; 12: 355–356.
14. Thompson JR. Invited commentary. Re: “Multiple comparisons and related issues in the interpretation of epidemiologic data.” Am J Epidemiol 1998; 147: 801–806.
15. Goodman SN. Multiple comparisons, explained. Am J Epidemiol 1998; 147: 807–812.
16. Savitz DA, Olshan AF. Describing data requires no adjustment for multiple comparisons: a reply from Savitz and Olshan. Am J Epidemiol 1998; 147: 813–814.
17. Thompson JR. Re: Describing data requires no adjustment for multiple comparisons. Am J Epidemiol 1998; 147: 815.
18. Thomas DC, Langholz B, Clayton D, Pitkaniemi J, Tuomilehto-Wolf E, Tuomilehto J. Empirical Bayes methods for testing associations with large numbers of candidate genes in the presence of environmental risk factors, with applications to HLA association in IDDM. Ann Med 1992; 24: 387–392.
19. Greenland S. When should epidemiologic regressions use random coefficients? Biometrics 2000 (in press).
20. Witte JS, Greenland S, Kim L-L. Software for hierarchical modeling of epidemiologic data. Epidemiology 1998; 9: 563–566.
21. Wolfinger R, O’Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J Stat Comput Simul 1993; 48: 223–243.
22. Simonsen NR, Fernández Crehuet Navajas J, Martín Moreno JM, Strain JJ, Huttunen JK, Martin BC, Thamm M, Kardinaal AF, van't Veer P, Kok FJ, Kohlmeier L. Tissue stores of individual monounsaturated fatty acids and breast cancer: the EURAMIC study. European Community Multicenter Study on Antioxidants, Myocardial Infarction, and Breast Cancer. Am J Clin Nutr 1998; 68: 134–141.
23. Jewell NP. On the bias of commonly used measures of association for 2 × 2 tables. Biometrics 1986; 42: 351–358.
24. Greenland S. Small-sample bias and corrections for conditional maximum-likelihood odds-ratio estimators. Biostatistics 2000; 1: 113–122.
25. Greenland S, Schwartzbaum JA, Finkle WD. Small-sample and sparse-data problems in conditional logistic regression analysis. Am J Epidemiol 2000; 151: 531–539.
26. Zhou XH, Perkins AJ, Hui SL. Comparisons of software packages for generalized linear multilevel models. Am Stat 1999; 53: 282–290.
27. HLM, Version 4.0. Chicago: Scientific Software Inc., 1996.
28. MLn, Version 1.0a. London: Multilevel Models Project, 1996.
29. VARCL. Groningen, The Netherlands: ProGAMMA, 1996.
30. The GAUSS System, Version 3.2. Maple Valley, WA: Aptech Systems, Inc., 1996.
31. Fenwick GR, Heaney RK, Mullin WJ. Glucosinolates and their breakdown products in food and food plants. Crit Rev Food Sci Nutr 1983; 18: 123–201.
32. Chug-Ahuja JK, Holden JM, Forman MR, Mangels AR, Beecher GR, Lanza E. The development and application of a carotenoid database for fruits, vegetables, and selected multicomponent foods. J Am Diet Assoc 1993; 93: 318–323.
33. Haussler A, Rehm J, Nass E, Kohlmeier L. Data bases for nutritional epidemiology: The food code of the German Federal Republic (BLS). In: Kohlmeier L, ed. The Diet History Method. London: Smith-Gordon, 1991; 103–108.