Commentary: Averaged or Stratified Measures of Risk Profile Discrimination: Horses for Courses

Lindström, Sara; Kraft, Peter

doi: 10.1097/EDE.0b013e3182320edc

From the Program in Molecular and Genetic Epidemiology, Harvard School of Public Health, Boston, MA.

Correspondence: Peter Kraft, Harvard School of Public Health, 677 Huntington Ave., Boston, MA 02115.

The list of biomarkers associated with disease risk is rapidly growing. Genome-wide association studies have identified hundreds of common genetic variants associated with common complex diseases. These associations give us clues about disease biology, but it may take years of painstaking basic research to fully understand the mechanisms behind them. In contrast, we can use these variants for personalized risk profiling right now. Similarly, it is unclear whether other novel “-omic” markers (proteomic, epigenomic, metagenomic, and so forth) are causal intermediates in the disease process or the consequences of latent disease, yet they could be used to predict disease risk.

This raises the question: should we use these novel risk markers? Do the benefits outweigh the costs? What value do these new markers add?

In this issue of Epidemiology, Kerr and Pepe1 discuss how to evaluate a new risk marker for disease when a set of predictors already exists (eg, the Gail model for breast cancer, or the Framingham risk score for cardiovascular disease). The authors focus on a particular measure of the discriminatory ability of a risk profile, the receiver operating characteristic (ROC) curve, and show that the ROC curve for a profile that jointly models the new marker and other covariates is distinct from the ROC curve for the new marker within strata of the other covariates. They also show that the ROC curve for the new marker can differ across strata even in the absence of interaction on the log-odds scale.

Kerr and Pepe's results highlight 3 important issues.

First, because the joint and stratified ROC curves are different, investigators should consider which is most appropriate for their research question. If the purpose is to develop a risk assessment tool for the general population, the joint ROC curve may be more appropriate. If the goal is to determine the incremental value of a new marker in particular subgroups, the stratified ROC curve may be better. For example, the change in the ROC curve from adding genetic information may be small across the whole population but large among young people—who may be a small but clinically important fraction of the population.

Second, investigators who are interested in the improvement in discrimination due to a new marker should examine how the ROC curve changes across strata, regardless of whether statistically significant interaction is found on the log-odds scale. For example, we have observed that genetic variants associated with breast cancer do not interact with age in a logistic regression model either individually or collectively, but nonetheless, a risk profile based on 17 SNPs has a higher area under the ROC curve for women under 62 years (0.64) than for women over 70 years (0.58).2 Similarly, although none of the known common breast cancer risk markers interacts with any of the risk factors in the Gail score,3–5 the area under the ROC curve for the genetic risk profile was higher among women in the lowest Gail score quartile (0.63) than among women in the highest quartile (0.57).
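
The distinction between the joint and the stratum-specific area under the ROC curve can be illustrated with a small simulation. The sketch below is not taken from Kerr and Pepe or from the breast cancer analyses cited above; it assumes a hypothetical binary covariate x, a continuous marker y whose spread differs between the two strata, and a logistic risk model with no x-by-y interaction term. All coefficients, sample sizes, and function names are illustrative assumptions.

```python
import math
import random


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    labelled = sorted(
        [(s, 1) for s in case_scores] + [(s, 0) for s in control_scores]
    )
    rank_sum, i, n = 0.0, 0, len(labelled)
    while i < n:
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum += mid_rank * sum(lab for _, lab in labelled[i:j])
        i = j
    n1, n0 = len(case_scores), len(control_scores)
    return (rank_sum - n1 * (n1 + 1) / 2.0) / (n1 * n0)


def simulate(n_per_stratum=20000, seed=1):
    """Simulate (x, y, linear predictor, disease) with no interaction term."""
    rng = random.Random(seed)
    b0, b_x, b_y = -2.5, 1.0, 0.5   # assumed log-odds coefficients
    sd_y = {0: 0.5, 1: 1.5}         # assumed marker spread in each stratum
    rows = []
    for x in (0, 1):
        for _ in range(n_per_stratum):
            y = rng.gauss(0.0, sd_y[x])
            lin = b0 + b_x * x + b_y * y
            diseased = rng.random() < 1.0 / (1.0 + math.exp(-lin))
            rows.append((x, y, lin, diseased))
    return rows


def summarize(rows):
    """Return the joint AUC and the marker's AUC within each stratum of x."""
    joint = auc([lin for _, _, lin, d in rows if d],
                [lin for _, _, lin, d in rows if not d])
    by_stratum = {}
    for x in (0, 1):
        by_stratum[x] = auc(
            [y for xx, y, _, d in rows if xx == x and d],
            [y for xx, y, _, d in rows if xx == x and not d],
        )
    return joint, by_stratum


if __name__ == "__main__":
    joint, by_stratum = summarize(simulate())
    print("joint AUC:", round(joint, 3))
    for x, a in by_stratum.items():
        print("marker AUC in stratum", x, ":", round(a, 3))
```

Under these assumptions the marker has the same log-odds coefficient in both strata, yet its stratum-specific AUC is higher where the marker varies more, and neither stratum-specific value equals the joint AUC. A different choice of marker distributions could narrow or remove the gap.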

Third, the paper raises the important but often overlooked point that case-control matching can greatly complicate the interpretation of the area under the ROC curve.6 This is particularly relevant when using epidemiologic studies originally designed to identify causal factors or to provide efficient estimates of relative risks, such as nested case-control studies.
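
One way matching can distort the area under the ROC curve is sketched below. This is an illustrative simulation under assumed parameters, not the analysis of Janes and Pepe: a hypothetical binary covariate x and continuous marker y determine risk through a logistic model, and the AUC of the combined score is estimated once with randomly sampled controls and once with controls frequency-matched to cases on x. Function names and coefficients are assumptions made for the example.

```python
import math
import random


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    labelled = sorted(
        [(s, 1) for s in case_scores] + [(s, 0) for s in control_scores]
    )
    rank_sum, i, n = 0.0, 0, len(labelled)
    while i < n:
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum += mid_rank * sum(lab for _, lab in labelled[i:j])
        i = j
    n1, n0 = len(case_scores), len(control_scores)
    return (rank_sum - n1 * (n1 + 1) / 2.0) / (n1 * n0)


def simulate_population(n=20000, seed=2):
    """Return lists of (x, score) for cases and for controls."""
    rng = random.Random(seed)
    b0, b_x, b_y = -3.0, 1.5, 0.5   # assumed log-odds coefficients
    cases, controls = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = rng.gauss(0.0, 1.0)
        score = b_x * x + b_y * y   # risk score combining covariate and marker
        diseased = rng.random() < 1.0 / (1.0 + math.exp(-(b0 + score)))
        (cases if diseased else controls).append((x, score))
    return cases, controls


def compare_designs(cases, controls, seed=3):
    """AUC of the score with unmatched controls vs controls matched on x."""
    rng = random.Random(seed)
    case_scores = [s for _, s in cases]
    # Unmatched design: controls drawn at random from all non-diseased subjects.
    unmatched = rng.sample(controls, len(cases))
    # Matched design: within each level of x, as many controls as cases.
    matched = []
    for x in (0, 1):
        pool = [c for c in controls if c[0] == x]
        n_needed = sum(1 for c in cases if c[0] == x)
        matched.extend(rng.sample(pool, n_needed))
    return (auc(case_scores, [s for _, s in unmatched]),
            auc(case_scores, [s for _, s in matched]))


if __name__ == "__main__":
    cases, controls = simulate_population()
    unmatched_auc, matched_auc = compare_designs(cases, controls)
    print("unmatched AUC:", round(unmatched_auc, 3))
    print("matched AUC:  ", round(matched_auc, 3))
```

Because matching forces x to have the same distribution in cases and controls, the covariate no longer separates the two groups, and the naive pooled AUC in the matched sample understates how well the score discriminates in the source population.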

The important task of “evaluating the incremental value of a risk marker”—genetic or not—sounds deceptively simple, but requires great care. “Incremental value” can be defined in many ways.7 Kerr and Pepe show how complicated the interpretation of just one summary measure can be. Add to that the many different ways to summarize the discriminatory or predictive ability of a risk marker (Brier score, Net Reclassification Index, integrated loss, number of individuals screened to identify a case)—each of which has relative strengths and weaknesses in different contexts—and the complexities multiply.

Moreover, a given question—for example, whether genetic information should be used to personalize screening recommendations—may be viewed differently from different perspectives (eg, patient, provider, payer). The particular perspective may affect whether the benefits of the additional information outweigh the costs. The costs are not only in dollars, but also in the effort required to educate physicians and patients to understand and make informed decisions regarding these tests, the potential social stigma of the tests, privacy concerns, et cetera.

No single measure can capture the incremental value of a new marker in all contexts. Researchers should carefully choose a measure (or measures) of the marker's utility that is relevant to the marker's intended application (or applications)—they should choose the right horse for the course.

SARA LINDSTRÖM is a Research Scientist in the Program in Molecular and Genetic Epidemiology at Harvard School of Public Health. She has published papers on gene-environment interactions as well as large-scale population-based case-control studies in breast and prostate cancer. PETER KRAFT is an Associate Professor of Epidemiology and Biostatistics and Deputy Director of the Program in Molecular and Genetic Epidemiology at the Harvard School of Public Health. He has published papers on the utility of genetic risk prediction and statistical methods for the study of gene-environment interaction in population-based studies.

1. Kerr K, Pepe M. Joint modeling, covariate adjustment, and interaction: contrasting notions in risk prediction models and risk prediction performance. Epidemiology. 2011;22:805–812.
2. Kraft P, Aschard H, Chen JB. Gene-gene and gene-environment interactions involving GWAS-identified loci unlikely to drastically improve breast cancer risk prediction. Genet Epidemiol. 2010;34:979. Meeting Abstract 223.
3. Campa D, Kaaks R, Le Marchand L, et al. Interactions between genetic variants and breast cancer risk factors in the Breast and Prostate Cancer Cohort Consortium. J Natl Cancer Inst. 2011;103:1252–1263.
4. Travis RC, Reeves GK, Green J, et al. Gene-environment interactions in 7610 women with breast cancer: prospective evidence from the Million Women Study. Lancet. 2010;375:2143–2151.
5. Milne RL, Gaudet MM, Spurdle AB, et al. Assessing interactions between the associations of common genetic susceptibility variants, reproductive history and body mass index with breast cancer risk in the breast cancer association consortium: a combined case-control study. Breast Cancer Res. 2010;12:R110.
6. Janes H, Pepe MS. Matching in studies of classification accuracy: implications for analysis, efficiency, and assessment of incremental value. Biometrics. 2008;64:1–9.
7. Janssens AC, Ioannidis JP, van Duijn CM, Little J, Khoury MJ; GRIPS Group. Strengthening the reporting of genetic risk prediction studies: the GRIPS statement. Eur J Epidemiol. 2011;26:255–259.
© 2011 Lippincott Williams & Wilkins, Inc.