Sufficient-Cause Modeling with Matched Data Using SAS

Liao, Shu-Fen; Lee, Wen-Chung

doi: 10.1097/EDE.0b013e3182a705e6

Research Center for Genes, Environment and Human Health, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

The study is partly supported by grants from the National Science Council, Taiwan.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article. This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.


To the Editor:

Rothman’s sufficient-cause model is a useful construct for disease causation.1 It provides a synthesis of multiple interacting risk factors, jointly and collectively.1,2 It also helps to evaluate the impact of public-health interventions.1–5 Sufficient-cause modeling for unmatched case-control data2,5 and person-time data3,4 is simple and can be implemented through the generalized linear model (GENMOD) procedure in SAS (SAS Institute, Cary, NC); this approach has been applied successfully in cardiovascular2,5 and cancer epidemiology.3,4 For matched data, extra programming is needed. Here we present simple SAS code (eAppendix), illustrated with two examples: a matched case-control study6 and a survival dataset7 requiring a time-matched risk-set analysis.

The unit of analysis is the matching set, either confounder-matched (for matched case-control data) or time-matched (for survival data).5,8 For the former, the model is

[equation not recovered]

where the left-hand side is the disease odds for individuals in a given matching set who have a given risk-factor profile. For the latter, the model is

[equation not recovered]

where the left-hand side is the disease rate at a given time for individuals who have a given risk-factor profile. The “intercepts” of the models (the matching-set-specific term in the first model and the time-specific term in the second) are treated as nuisance variables and will be eliminated in the model-fitting process (conditional likelihood for matched case-control data; partial likelihood for survival data).
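As a hedged sketch only, one model form consistent with this description and with the general relative risk models of reference 8 is a linear relative-odds (or relative-rate) model with a multiplicative nuisance intercept. The symbols \alpha_i, \lambda_0(t), x_j, \beta_j, \beta_{jk}, and r(\cdot) are notation assumed here for illustration and may differ from the authors' own equations.

```latex
% Assumed notation, not the authors' published equations.
% Matched case-control data: disease odds in matching set i
\mathrm{Odds}_i(x_1,\dots,x_p)
  = \alpha_i \Bigl( 1 + \sum_{j} \beta_j x_j + \sum_{j<k} \beta_{jk}\, x_j x_k + \cdots \Bigr)

% Survival data: disease rate at time t
\mathrm{Rate}_t(x_1,\dots,x_p)
  = \lambda_0(t) \Bigl( 1 + \sum_{j} \beta_j x_j + \sum_{j<k} \beta_{jk}\, x_j x_k + \cdots \Bigr)

% Writing r(x) for the bracketed term, the contribution of one matching set
% (or one risk set) S with case c is free of the nuisance intercept:
L_S(\beta) = \frac{r(x_c)}{\sum_{m \in S} r(x_m)}
```

In this sketch the constant 1 inside the brackets plays the role of the all-unknown class, and the intercept cancels from the ratio, which is why it never has to be estimated.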

Under the assumptions of no confounding, monotonicity, and independent competing causes, the β-coefficients of the models correspond directly to the completion-potential indices for the various classes of sufficient causes (one completion-potential index for each class of sufficient causes; the completion-potential index for the all-unknown class is 1.0 by definition).5 A small-scale simulation study (eAppendix) shows that the completion-potential estimates are approximately unbiased. With additional algebra, other sufficient-cause–related indices (such as the individual-based and the population-wide causal-pie weights) and the attributable-fraction indices (such as the population attributable fraction and the attributable fraction among the exposed) can all be calculated from these completion-potential indices (see the eAppendix for the definitions of these indices).5 Confidence intervals (CIs) for all estimates are based on the bootstrap method.2,5,8
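The fitting and bootstrap steps can be illustrated with a minimal Python sketch. It is not the authors' SAS code. It assumes a single binary exposure, a linear relative-odds term of the form 1 + βx, a non-negative β, a golden-section search in place of a proper optimizer, and a percentile bootstrap that resamples whole matching sets (the letter says only that the bootstrap method is used). All function names and the toy data are hypothetical.

```python
import math
import random


def rel_odds(beta, x):
    """Assumed linear relative odds: 1 (all-unknown class) plus beta per unit of x."""
    return 1.0 + beta * x


def cond_loglik(beta, sets):
    """Conditional log-likelihood; each set is (case_x, [control_x, ...])."""
    total = 0.0
    for case_x, controls in sets:
        denom = rel_odds(beta, case_x) + sum(rel_odds(beta, x) for x in controls)
        total += math.log(rel_odds(beta, case_x) / denom)
    return total


def fit_beta(sets, lo=0.0, hi=100.0, tol=1e-6):
    """Golden-section search for the maximizing beta on [lo, hi].

    Assumes the log-likelihood is unimodal on the interval, which holds
    for the single-binary-exposure toy case used here.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if cond_loglik(c, sets) > cond_loglik(d, sets):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


def bootstrap_ci(sets, n_boot=200, level=0.95, seed=1):
    """Percentile bootstrap CI, resampling matching sets with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(sets) for _ in sets]
        estimates.append(fit_beta(resample))
    estimates.sort()
    k_lo = int((1.0 - level) / 2.0 * n_boot)
    k_hi = min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot) - 1)
    return estimates[k_lo], estimates[k_hi]


# Hypothetical 1:1 matched pairs: (case exposure, [control exposure]).
toy_sets = ([(1, [0])] * 30 + [(0, [1])] * 10
            + [(1, [1])] * 10 + [(0, [0])] * 10)

beta_hat = fit_beta(toy_sets)
ci_low, ci_high = bootstrap_ci(toy_sets)
print(beta_hat, ci_low, ci_high)
```

For 1:1 pairs with one binary exposure, the estimate can be checked by hand: the conditional odds ratio is the ratio of discordant pairs (30/10 = 3 in the toy data), so β should be close to 3 − 1 = 2. Resampling whole matching sets rather than individuals keeps the matching intact within each bootstrap replicate.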

The first example is the Leisure World Study of Endometrial Cancer,6 a 1:4 matched case-control study with 63 matching sets. After model fitting, the main effect of estrogen use has a β-coefficient (which is also the completion-potential value for the class of sufficient causes containing estrogen use) of 7.0 (95% CI = 2.7–18). This implies that this particular class of sufficient causes is seven times as likely to cause the disease as the all-unknown class.

The second example is the Bone Marrow Transplant Patients Study,7 which followed 137 subjects for adverse outcomes after transplant surgery (leukemia relapse or death). The mean follow-up duration is 782 days, with a total of 83 observed failures. The model shows a main effect of the French-American-British disease classification grade and an interactive effect of cytomegalovirus infection and methotrexate use. The β-coefficients (the completion-potential indices) are approximately the same for the French-American-British class (1.4 [95% CI = 0.6–3.0]) and for the interactive class between cytomegalovirus infection and methotrexate use (1.6 [95% CI = 0.6–3.8]). However, the population-wide causal-pie weights for these two classes are quite different (23% vs. 14%) (Figure). Other sufficient-cause–related indices and the attributable-fraction indices for these two examples are presented in the eAppendix.
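For survival data of this kind, the time-matched risk sets can be built as in the following Python sketch. It is a simplified illustration, not the authors' SAS code. It assumes no tied failure times, a single covariate x, and a linear relative-rate term of the form 1 + βx. The function names and the four toy records are hypothetical and are not taken from the bone marrow transplant dataset.

```python
import math


def build_risk_sets(records):
    """Turn (time, event, x) records into time-matched risk sets.

    event is 1 for a failure and 0 for censoring. Each failure yields one
    set: (x of the failing subject, [x of every other subject still at risk]).
    Assumes no tied failure times.
    """
    sets = []
    for i, (t_i, event_i, x_i) in enumerate(records):
        if event_i != 1:
            continue  # censored subjects never act as the "case" of a set
        others = [x_j for j, (t_j, _, x_j) in enumerate(records)
                  if j != i and t_j >= t_i]
        sets.append((x_i, others))
    return sets


def partial_loglik(beta, sets):
    """Partial log-likelihood under the assumed relative rate 1 + beta * x."""
    total = 0.0
    for case_x, others in sets:
        num = 1.0 + beta * case_x
        denom = num + sum(1.0 + beta * x for x in others)
        total += math.log(num / denom)
    return total


# Hypothetical records: (follow-up time in days, event indicator, exposure).
toy_records = [(5, 1, 1), (8, 0, 0), (12, 1, 0), (20, 0, 1)]
toy_risk_sets = build_risk_sets(toy_records)
print(toy_risk_sets)
print(partial_loglik(1.0, toy_risk_sets))
```

Each risk set has the same shape as a confounder-matched set in a case-control study (one case plus its comparison subjects), so the same likelihood machinery can be applied to both designs.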

These easy-to-use SAS codes for sufficient-cause modeling with matched case-control and survival data should facilitate the use of sufficient-cause modeling.

Shu-Fen Liao

Wen-Chung Lee

Research Center for Genes, Environment and Human Health, Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

References


1. Rothman KJ. Causes. Am J Epidemiol. 1976;104:587–592
2. Liao SF, Lee WC. Weighing the causal pies in case-control studies. Ann Epidemiol. 2010;20:568–573
3. Liao SF, Lee WC, Chen HC, Chuang LC, Pan MH, Chen CJ. Baseline human papillomavirus infection, high vaginal parity, and their interaction on cervical cancer risks after a follow-up of more than 10 years. Cancer Causes Control. 2012;23:703–708
4. Liao SF, Yang HI, Lee MH, Chen CJ, Lee WC. Fifteen-year population attributable fractions and causal pies of risk factors for newly developed hepatocellular carcinomas in 11,801 men in Taiwan. PLoS One. 2012;7:e34779
5. Lee WC. Completion potentials of sufficient component causes. Epidemiology. 2012;23:446–453
6. Mack TM, Pike MC, Henderson BE, et al. Estrogens and endometrial cancer in a retirement community. N Engl J Med. 1976;294:1262–1267
7. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer-Verlag; 1997
8. Langholz B, Richardson DB. Fitting general relative risk models for survival time and matched case-control analysis. Am J Epidemiol. 2010;171:377–383

© 2013 by Lippincott Williams & Wilkins, Inc