Medical Care
Copyright © 2013 Wolters Kluwer Health, Inc. All rights reserved.
Volume 51(4), April 2013, p 304–306

Response: Reading Between the Lines of Cancer Screening Trials: Using Modeling to Understand the Evidence
[Point-Counterpoint]

Etzioni, Ruth PhD; Gulati, Roman MS
Fred Hutchinson Cancer Research Center, Seattle, WA

The authors declare no conflict of interest.
Reprints: Ruth Etzioni, PhD, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, M2-B230, Seattle, WA 98109-1024.

Abstract

In our article about limitations of basing screening policy on screening trials, we offered several examples of ways in which modeling, using data from large screening trials and population trends, provided insights that differed somewhat from those based only on empirical trial results. In this editorial, we take a step back and consider the general question of whether randomized screening trials provide the strongest evidence for clinical guidelines concerning population screening programs. We argue that randomized trials provide a process that is designed to protect against certain biases but that this process does not guarantee that inferences based on empirical results from screening trials will be unbiased. Appropriate quantitative methods are key to obtaining unbiased inferences from screening trials. We highlight several studies in the statistical literature demonstrating that conventional survival analyses of screening trials can be misleading and list a number of key questions concerning screening harms and benefits that cannot be answered without modeling.
Although we acknowledge the centrality of screening trials in the policy process, we maintain that modeling constitutes a powerful tool for screening trial interpretation and screening policy development.

The article by Melnikow et al1 brings into sharp focus the essence of the policy development process and the tug-of-war between randomized controlled trials (RCTs) and other sources of evidence, in this case, the use of models. Their opinions reflect the widespread sentiments of confidence in the ability of RCTs to eliminate bias and of distrust in modeling because of its complexity and frequent lack of transparency. These comments compel us to examine closely the issues of bias and complexity in screening studies and the roles of study design and analysis in achieving unbiased interpretations of the evidence.

There is no question that the RCT paradigm represents a gold standard for evidence. But why? Because it provides a process that enables the interventions of interest to be allocated to subjects in a random, nonselective fashion. Thus, the RCT, by design, avoids one of the greatest threats to valid inference, namely selection bias. Further characteristics of the RCT process (eg, blinding subjects and/or investigators and intention-to-treat methods) are designed to reinforce the freedom of resulting inferences from selection and related biases. However, the RCT paradigm does not actually specify how those inferences are to be made, and it does not provide a blueprint for the “correct” analytical model. Thus, the RCT paradigm only sets the stage for unbiased inferences; it does not guarantee them.

The case of cancer screening provides a perfect example of how conventional analysis of a well-conducted RCT can yield a biased inference.
The Health Insurance Plan breast cancer screening trial was a seminal RCT of mammography screening initiated in 1963.2 Beyond the extensive ramifications of this study for clinical practice, the trial stimulated a rich statistical methodological investigation regarding appropriate methods for analyzing cancer screening trials.3–5 A key outcome of this work was the finding that the standard Cox proportional hazards model typically used to model disease-specific survival outcomes among clinical trial participants is not valid in the screening trial setting because the hazards (or risks) of death in the 2 groups are not proportional. Thus, the hazard ratio (or the often-cited mortality rate ratio) is a biased estimate of the relative reduction in the risk of disease-specific death associated with screening. As Hanley6 explains, there is invariably a delay from the start of the trial until the attainment of screening-induced mortality reductions. Analyses that merge the deaths in this early “no-reduction window” with later deaths attenuate estimates of screening benefit. He illustrates his point by examining how the mortality rate ratio in the European Randomized Study of Screening for Prostate Cancer (ERSPC) has changed with time since the beginning of the trial. Results indicate that after a delay of approximately 7 years, the prostate cancer mortality reductions are considerably greater than the 20% reduction reported by ERSPC investigators, reaching 67% (80% confidence interval, 30%–89%) by 12 years of follow-up.

This simple example demonstrates the complexity of quantifying the benefits of a cancer screening test. The statistical literature has clearly shown that, even in the case of a well-designed screening trial, the standard analyses that are established in the treatment trial setting must be modified to achieve valid inferences about the relative mortality reduction induced by screening.
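Hanley's dilution argument can be illustrated with a toy calculation. All numbers below are invented for illustration (they are not ERSPC data): the point is only that merging deaths from the early "no-reduction window" with later deaths pulls the mortality rate ratio toward 1 and so understates the benefit.

```python
# Hypothetical sketch of the "no-reduction window" dilution effect.
# Suppose screening cannot affect deaths in years 0-7 (those cancers were
# already beyond cure at the first screen) but halves deaths thereafter.
early_deaths_control = 100   # deaths in years 0-7, control arm
early_deaths_screen = 100    # same in screened arm: no effect yet
late_deaths_control = 100    # deaths after year 7, control arm
late_deaths_screen = 50      # a true 50% reduction once the benefit kicks in

# Rate ratio over the full follow-up (equal person-years per arm assumed)
rr_overall = (early_deaths_screen + late_deaths_screen) / (
    early_deaths_control + late_deaths_control)

# Rate ratio restricted to the window in which screening can act
rr_late = late_deaths_screen / late_deaths_control

print(f"overall rate ratio: {rr_overall:.2f}")    # 0.75, i.e., "25% reduction"
print(f"post-delay rate ratio: {rr_late:.2f}")    # 0.50, the true 50% reduction
```

A conventional whole-trial analysis of these invented counts would report a 25% mortality reduction even though the intervention, where it can act at all, reduces mortality by 50%.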
Moreover, inferences about absolute mortality reductions are even more suspect because of their clear dependence on the time horizon used to estimate them. Indeed, even if the relative mortality reduction is constant over time (ie, the proportional hazards assumption is met), the absolute mortality reduction (lives saved by screening) will continue to grow.7 Thus, screening trials conducted over a limited time horizon cannot provide unbiased information about absolute benefit in terms of the lives that will be saved by screening over a lifetime. Similarly, attempting to use observed incidence data from trials to estimate harms such as overdiagnosis invariably produces an inflated result.7 This is because the excess incidence in the screened group relative to the control group, typically used as a proxy for overdiagnosis, consists of a mixture of overdiagnosed cases and true early detections, but we cannot distinguish these on the basis of the observed data. Indeed, we cannot think of a setting where empirical data can be used to provide an unbiased estimate of overdiagnosis.

It is not our purpose to question the importance of screening trials and their necessary place in the policy development process. Our point is that making valid inferences from screening trials and correctly interpreting the evidence generally requires going beyond the observed trial results. It is possible, and indeed even likely, that using models to do this will produce estimates of harms and benefits that differ substantially from the results observed in screening trials.
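The excess-incidence point above can be made concrete with an invented example: at trial cutoff, the surplus of diagnoses in the screened arm mixes truly overdiagnosed cases with early detections that the control arm simply has not yet caught up with. All counts here are hypothetical.

```python
# Hypothetical illustration: why excess incidence at limited follow-up
# overstates overdiagnosis. None of these counts come from a real trial.
screen_arm_cases = 120    # cancers diagnosed by trial cutoff, screened arm
control_arm_cases = 80    # cancers diagnosed by trial cutoff, control arm
true_overdiagnosed = 10   # cases that would never have surfaced clinically

# The usual proxy: excess incidence in the screened arm
excess = screen_arm_cases - control_arm_cases

# But the excess also contains early detections whose control-arm
# counterparts simply have not been diagnosed yet (lead time):
early_detections_pending = excess - true_overdiagnosed

print(f"apparent overdiagnosis (excess incidence): {excess}")
print(f"true overdiagnosis: {true_overdiagnosed}")
print(f"early detections awaiting catch-up: {early_detections_pending}")
```

In this sketch the excess-incidence proxy reports 40 overdiagnosed cases when only 10 are truly overdiagnosed; the other 30 would eventually appear in the control arm with longer follow-up. Observed data alone cannot split the 40 into its two components.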
For example, using both a simple back-of-the-envelope approach8 and a considerably more complex model,9 we projected that the ERSPC relative mortality reduction should translate into 6 lives saved per 1000 men screened in a population screening program beginning at an age of 50 or 55, rather than the 1 life saved based on the observed results.10 Similarly, we have estimated, using 2 different models of overdiagnosis, that roughly 25% of screen-detected cases are overdiagnosed in the US11,12 instead of the >50% based on excess incidence in the ERSPC trial after 9 years of follow-up.13 However, it is precisely because we have used models to go beyond the trials that our results are different. When we restrict the models to the trial protocol and follow-up, our projections closely match the observed relative and absolute mortality reductions after 11 years of follow-up.9 This step of validating a model against published findings is necessary to test its reliability.

The issue of model reliability is critical, as no model can perfectly represent the biological complexity of the disease’s natural history and its interaction with a screening intervention. Moreover, there is no question that there are inadequate and biased models in the literature. Some modeling studies make indefensible assumptions, do not provide sufficient information about the assumptions made, or do not adequately validate their findings against published data. However, there is also a growing cadre of models that represent thoughtful abstractions of the biology, conduct mathematically coherent calibration to observed data, and provide careful and detailed documentation of clinically plausible assumptions.

The assumptions made by models are considered to be their Achilles’ heel by Melnikow et al.1 However, even simple analyses that would not be considered “models” make assumptions.
For example, using a mortality rate ratio to summarize the empirical reduction in the risk of prostate cancer death under screening effectively assumes that the relative risk is constant in the screening and control groups and approximates the hazard ratio in a Cox proportional hazards analysis of disease-specific survival.14 Further, statements that may appear intuitive often make implicit assumptions that are simply not acknowledged. For example, Melnikow and colleagues state that “competing causes of mortality in older men makes it progressively less likely that longer follow-up will demonstrate a large absolute reduction in disease-specific mortality.” Although it is true that competing deaths increase as men age, so do the number of fatal prostate cancers and the potential lives that could be saved by early detection.9 This statement therefore implicitly assumes that the growth in the number of cancer deaths that could be prevented by screening is outweighed by the competing risk of other causes of death among men in their seventies. As another example, the statement that the “effect of crossover [in the PLCO] is to reduce the impact of the intervention but it cannot eliminate a benefit that is truly present” implicitly makes 2 assumptions. The first assumption is that if screening is beneficial when compared with no screening, then more screening will save more lives than less screening. In the case of the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial, “more screening” corresponded to screening every year and “less screening” corresponded to screening approximately every other year.15,16 However, several studies have indicated that any additional benefit of screening for prostate cancer annually versus every other year is likely to be marginal.9,17 The second assumption is that the trial itself was precise enough to detect a difference between more versus less screening.
In fact, there were far fewer deaths than expected in the PLCO cancer screening trial. Using modeling, we were able to show that the mortality results were noisy enough that a null or reversed result (excess mortality in the intervention group) was possible even if screening is beneficial.18 There is no question that making assumptions can be dangerous, but we believe that explicit, documented assumptions are far preferable to implicit, undocumented ones.

In recent years, modeling has become more widely accepted as a part of the policy development process. The USPSTF has been perhaps the most influential of US policy panels in the movement toward greater acknowledgment of the utility of modeling in this setting. Indeed, the USPSTF has used models in developing its most recent recommendations for both breast and colorectal cancer screening.19,20 These models are not dissimilar to the ones we have developed and advocated in the case of the prostate screening trials. The critical question is how models should contribute to the evidence that will ultimately drive the policy decision.

We agree that RCTs provide the best opportunity to obtain the strongest evidence for clinical guidelines. The problem with randomized screening trials is that this evidence is rarely packaged in a way that permits proper interpretation. With screening trials, there is almost always a further step required to actually unlock the evidence that they are poised to provide. Modeling by itself cannot create evidence, but modeling has the potential to unlock the rich repository of evidence in screening trial data. Thus, using models, we are able to conclude that a reversed result was indeed possible in the PLCO cancer screening trial, even in the presence of a clinically significant screening benefit.
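The noise argument can be sketched with a simple simulation. All numbers below are invented for illustration (they are not PLCO estimates): even assuming screening truly reduces mortality by 20%, random variation in a trial with relatively few deaths leaves a nontrivial chance that the observed comparison is null or reversed.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's algorithm for sampling a Poisson random variable
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(0)
expected_control = 100   # hypothetical expected cancer deaths, control arm
expected_screen = 80     # assume a true 20% mortality reduction

# Simulate many trials; count those with a null or reversed death comparison
n_sims = 5000
null_or_reversed = sum(
    poisson(expected_screen, rng) >= poisson(expected_control, rng)
    for _ in range(n_sims)
)
print(f"simulated chance of null/reversed result: {null_or_reversed / n_sims:.1%}")
```

With these invented death counts, a few percent of simulated trials show no benefit or excess mortality in the screened arm despite a genuine 20% reduction; with fewer deaths than expected, as occurred in PLCO, that fraction grows.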
Using models, we can interrogate screening trial incidence patterns to learn about the lead time (the time by which screening advances diagnosis) and estimate overdiagnosis, an inherently unobservable quantity. Moreover, using models, we can project the absolute lives saved implied by trial mortality results over a time horizon that matches the policy perspective. If we insist on taking randomized screening trials at face value, then we run the risks of, at best, inadequately using a prime resource for information about screening outcomes and, at worst, making incorrect inferences about screening harm and benefit. Well-developed and documented models can complement, rather than contravene, empirical screening trial results and provide a more complete assessment of the net benefits of cancer screening.

REFERENCES

1. Melnikow J, LeFevre ML, Wilt TJ, et al. Randomized trials provide the strongest evidence for clinical guidelines: the US Preventive Services Task Force and prostate cancer screening. Med Care. 2013;51:301–303.
2. Shapiro S. Evidence on screening for breast cancer from a randomized trial. Cancer. 1977;39:2772–2782.
3. Aron JL, Prorok PC. An analysis of the mortality effect in a breast cancer screening study. Int J Epidemiol. 1986;15:36–43.
4. Zucker DM, Lakatos E. Weighted log rank type statistics for comparing survival curves when there is a time lag in the effectiveness of treatment. Biometrika. 1990;77:853–864.
5. Self SG, Etzioni R. A likelihood ratio test for cancer screening trials. Biometrics. 1995;51:44–50.
6. Hanley JA. Measuring mortality reductions in cancer screening trials. Epidemiol Rev. 2011;33:36–45.
7. Gulati R, Mariotto AB, Chen S, et al. Long-term projections of the harm-benefit trade-off in prostate cancer screening are more favorable than previous short-term estimates. J Clin Epidemiol. 2011;64:1412–1417.
8. Etzioni R, Gulati R, Cooperberg MR, et al. Limitations of basing screening policies on screening trials: the US Preventive Services Task Force and prostate cancer screening. Med Care. 2013;51:295–300.
9. Gulati R, Gore JL, Etzioni R. Comparative effectiveness of alternative PSA-based screening strategies. Ann Intern Med. 2013;158:145–153.
10. Moyer VA; on behalf of the USPSTF. Screening for prostate cancer: U.S. Preventive Services Task Force Recommendation Statement. Ann Intern Med. 2012;157:120–134.
11. Gulati R, Wever EM, Tsodikov A, et al. What if I don’t treat my PSA-detected prostate cancer? Answers from three natural history models. Cancer Epidemiol Biomarkers Prev. 2011;20:740–750.
12. Telesca D, Etzioni R, Gulati R. Estimating lead time and overdiagnosis associated with PSA screening from prostate cancer incidence trends. Biometrics. 2008;64:10–19.
13. Schröder FH, Hugosson J, Roobol MJ, et al. Screening and prostate-cancer mortality in a randomized European study. N Engl J Med. 2009;360:1320–1328.
14. Whitehead J. Fitting Cox’s regression model to survival data using GLIM. J R Stat Soc Ser C Appl Stat. 1980;29:268–275.
15. Pinsky PF, Black A, Kramer BS, et al. Assessing contamination and compliance in the prostate component of the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial. Clin Trials. 2010;7:303–311.
16. Berg CD. The Prostate, Lung, Colorectal and Ovarian cancer screening trial: the prostate cancer screening results in context. Acta Oncol. 2011;50(suppl 1):12–17.
17. Ross KS, Carter HB, Pearson JD, et al. Comparative efficiency of prostate-specific antigen screening strategies for prostate cancer detection. JAMA. 2000;284:1399–1405.
18. Gulati R, Tsodikov A, Wever EM, et al. The impact of PLCO control arm contamination on perceived PSA screening efficacy. Cancer Causes Control. 2012;23:827–835.
19. Mandelblatt JS, Cronin KA, Bailey S, et al. Effects of mammography screening under different screening schedules: model estimates of potential benefits and harms. Ann Intern Med. 2009;151:738–747.
20. Zauber AG, Lansdorp-Vogelaar I, Knudsen AB, et al. Evaluating test strategies for colorectal cancer screening: a decision analysis for the U.S. Preventive Services Task Force. Ann Intern Med. 2008;149:659–669.

Key Words: mass screening; randomized controlled trials; policy development; simulation