Genetic aspects of the etiology and prognosis of complex diseases are increasingly addressed in epidemiologic and clinical studies. Epidemiologic studies on direct genetic associations target polymorphisms that are putative causal variants. It has become possible to conduct surveys of the genome by screening large numbers of single nucleotide polymorphisms (SNPs), thus providing investigators new opportunities to look for associations in whole-genome data with onset of complex diseases.

A common feature of modern studies incorporating genetic data is the vast number of statistical analyses. The problems related to multiple comparisons are widely known.^{1} Several methods for handling multiple comparisons are available. A simple method is the Bonferroni adjustment, where the number of comparisons is used to adjust the significance level.^{1–3} The Bonferroni method can also be applied in the context of effect estimation by adjusting the conventionally calculated 95% confidence intervals (CIs) around the estimates of an effect measure, such as the odds ratio (OR) or the hazard ratio (HR). When more than a few comparisons are made, the Bonferroni adjustment leads to much wider CIs, and the risk of false-negative findings increases dramatically.^{3}
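
As a minimal sketch of how a Bonferroni adjustment widens a confidence interval, the following Python snippet divides the two-sided error rate by the number of comparisons before computing the interval on the log scale. The function name `bonferroni_ci` and the example numbers (HR = 2.0, standard error 0.30) are hypothetical and are not taken from the study.

```python
from math import exp, log
from statistics import NormalDist

def bonferroni_ci(estimate, se_log, n_comparisons, level=0.95):
    """CI for a ratio measure (OR or HR), widened for n_comparisons analyses."""
    alpha = (1.0 - level) / n_comparisons        # Bonferroni-adjusted error rate
    z = NormalDist().inv_cdf(1.0 - alpha / 2)    # two-sided critical value
    center = log(estimate)
    return exp(center - z * se_log), exp(center + z * se_log)

# Hypothetical marker: HR = 2.0 with standard error 0.30 on the log scale.
print(bonferroni_ci(2.0, 0.30, 1))     # conventional 95% CI
print(bonferroni_ci(2.0, 0.30, 215))   # adjusted for 215 comparisons
```

With 215 comparisons the adjusted interval is considerably wider than the conventional one and, in this illustration, includes HR = 1 although the conventional interval does not.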

Arguments advocating estimation of effect size rather than hypothesis testing are convincing in many epidemiologic and clinical study settings.^{4,5} The *P* value used in hypothesis-testing situations quantifies the discrepancy between observed data and the null hypothesis of no effect (*H0*), as the probability of results being as discrepant or more so, given *H0*. The *P* value is inherently confounded information: a mix of information about the effect size and the effective sample size.^{6} Nevertheless, the hypothesis-testing approach is commonly used when carrying out a large number of statistical analyses for evaluating the effects of each candidate genetic marker. To help investigators protect themselves from over-interpreting statistically significant findings that are not likely to signify a true effect (a problem connected to multiple comparisons), Wacholder et al^{7} have proposed consideration of the false-positive report probability (FPRP), ie, the probability of no true effect (of a specified size) given a statistically significant finding. The FPRP depends not only on the statistical significance level but also on both the prior probability that the effect is real and the statistical power of the test. The approach of Wacholder and colleagues simplifies the analysis by assuming a simple binary choice between the null hypothesis of no effect (*H0*) and the alternative hypothesis of an effect of a specified size (*H1*). The alternative effect size in FPRP should be fixed a priori regardless of the sample size, rather than being varied to achieve better power. Consideration of a false-negative report probability (FNRP), ie, the probability of a true effect given a nonsignificant finding, could also be part of the strategy.^{1} Figure 1 illustrates the idea behind the false-positive and false-negative report probabilities. Naturally, as the prior probability that the effect is real increases, the FPRP decreases while the FNRP increases.
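
The dependence of the two report probabilities on the prior can be illustrated with a short Python sketch that applies Bayes' theorem to the binary choice between *H0* and *H1*. The function names and the input values (α = 0.05, power = 0.80) are illustrative assumptions, not figures from the study.

```python
def fprp(alpha, power, prior):
    """P(H0 | significant): false positives among all significant findings."""
    false_pos = alpha * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)

def fnrp(alpha, power, prior):
    """P(H1 | not significant): false negatives among all nonsignificant findings."""
    false_neg = (1.0 - power) * prior
    true_neg = (1.0 - alpha) * (1.0 - prior)
    return false_neg / (false_neg + true_neg)

# Hypothetical design: alpha = 0.05, power = 0.80, over a range of priors.
for prior in (0.01, 0.05, 0.25, 0.50):
    print(prior, round(fprp(0.05, 0.80, prior), 3), round(fnrp(0.05, 0.80, prior), 3))
```

Running the loop shows the FPRP falling and the FNRP rising as the prior probability of a real effect grows.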

Wacholder et al advocate the use of FPRP for a priori determination of “noteworthy” reporting thresholds; they carefully note that this value is the minimum preset level at which someone would call the observed result noteworthy. Wacholder et al describe how to perform FPRP calculations given data from a study.^{7} Such poststudy calculations are performed by substituting the observed *P* value in place of the significance level, α, and calculating the statistical power based on the precision of the obtained effect estimate. Poststudy FPRPs have been estimated, for example, in a study of polymorphisms and lung cancer risk.^{8} Wakefield^{9} thoroughly discusses the FPRP approach in a statistical context and notes that the FPRP-value based on the observed *P* value ignores information by conditioning on the observed (test-based) tail area.
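
A poststudy FPRP calculation of the kind described above might be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of Wacholder et al: the observed *P* value replaces α, and the power against the alternative is computed from a normal approximation using the standard error of the log effect estimate. The function name, the default alternative (ratio 2.0), and the default prior (0.05) are choices made for this example.

```python
from math import log
from statistics import NormalDist

def poststudy_fprp(p_value, se_log, alt_ratio=2.0, prior=0.05):
    """FPRP with the observed two-sided P value used in place of alpha."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - p_value / 2)      # critical value at alpha = p
    shift = abs(log(alt_ratio)) / se_log         # expected z statistic under H1
    # Two-sided power: probability that |Z| exceeds z_crit when Z ~ N(shift, 1).
    power = (1.0 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)
    false_pos = p_value * (1.0 - prior)
    return false_pos / (false_pos + power * prior)

# Two hypothetical markers with the same P value but different precision.
print(round(poststudy_fprp(0.01, se_log=0.30), 3))
print(round(poststudy_fprp(0.01, se_log=0.60), 3))
```

For a fixed *P* value, the marker with the smaller standard error has higher poststudy power and therefore a lower FPRP.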

We propose an estimation-based approach for selecting the most influential genetic markers, given data from a study with a large number of candidate markers. We consider situations where the task is to carry out selection of promising genetic markers for further study. The effect size is an important criterion to be considered when screening for potential false-positives; it gives an idea of the quantitative importance of the association (Bradford Hill's “strength-of-association” criterion).

The estimation-based approach relies on the assumption that the varying true effects of the candidate markers cannot be graded a priori (ie, the exchangeability assumption). First, the effect of each candidate marker, together with its 95% CI, is estimated using a conventional statistical model incorporating a single candidate marker and, possibly, some other influential variables. Second, we use a semi-Bayes method to adjust the conventional effect estimates. Such adjustments pull outlying effect estimates towards an overall average and lead to narrower 95% CIs than the conventional estimation method. Third, we calculate, for each candidate marker, a probability of marked effect, ie, the probability that the effect lies above or below given effect sizes considered important enough to warrant priority. In Bayesian analyses, in contrast to frequentist analyses, the concept of a probability of marked effect is accepted.^{10} Those probabilities can be used to select the most influential genetic markers, since a high probability indicates an influential marker.

Investigators aim at selecting a limited number of promising candidate markers—the top candidates. The proposed estimation-based approach can yield a notably different list of top candidates than the poststudy FPRP approach, as we demonstrate by using data from a clinical study of the potential prognostic impact of cytogenetic markers (chromosomal aberrations) in soft tissue sarcomas.

We discuss the potential of our estimation-based approach for genome-wide association studies.

### Demonstration of Estimation-Based Approach

#### A Study of Soft Tissue Sarcoma

The study, which is described in detail elsewhere,^{11} included 151 patients with high-grade (grade 2 or 3) sarcomas. The present data analyses included 132 of those patients; 19 patients were excluded because metastases were observed at diagnosis or information on tumor size was missing. During follow-up after diagnosis, 62 patients developed metastases (median follow-up time was 12 months; range 2–80 months). For each patient, the chromosomal regions affected by breaks, gains, and losses, as defined by the International System for Human Cytogenetic Nomenclature,^{12} were recorded. Hence, 258 cytogenetic markers (all binary) were recorded for each patient. There was no prior knowledge regarding the prognostic effects of each marker.

Cox regression modeling was applied to estimate the effect (HR) of each cytogenetic marker on metastasis development; the known prognostic variables malignancy grade (2 or 3) and tumor size (<5 cm or ≥5 cm) were also included in the models. We were not able to estimate the HR for 14 cytogenetic markers due to low frequency. Moreover, 29 markers were combined with other markers because they always occurred together. Hence, 215 candidate cytogenetic markers were considered. The conventional HR estimates for each candidate marker (Fig. 2) indicated 17 statistically significant markers (conventional *P* < 0.05; Wald's test^{10}). The results for 14 of those markers have been presented previously (Mertens et al,^{11} Table 3). Here, we also considered markers that were involved in fewer than 5 patients with metastasis and, therefore, 3 additional statistically significant markers were obtained (breakpoints in 18p1, 18q1, and Xp1).
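
The later steps work with each marker's log HR and its standard error. As a hedged illustration, the sketch below recovers both quantities, together with the two-sided Wald *P* value, from a reported 95% CI. It assumes the interval is symmetric on the log scale, as Wald intervals are; the function name and the example interval are hypothetical and do not come from the study.

```python
from math import log
from statistics import NormalDist

def wald_from_ci(lower, upper, level=0.95):
    """Return (log HR, standard error, two-sided Wald P) from CI limits."""
    std = NormalDist()
    z_level = std.inv_cdf(0.5 + level / 2)           # about 1.96 for a 95% CI
    log_hr = (log(lower) + log(upper)) / 2           # midpoint on the log scale
    se = (log(upper) - log(lower)) / (2 * z_level)   # half-width divided by z
    p_value = 2 * (1 - std.cdf(abs(log_hr / se)))    # two-sided Wald test
    return log_hr, se, p_value

# Hypothetical marker with a 95% CI of 1.2 to 4.8.
log_hr, se, p = wald_from_ci(1.2, 4.8)
print(round(log_hr, 3), round(se, 3), round(p, 4))
```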

#### Semi-Bayes Adjustments of Conventional Effect Estimates

In the Bayesian approach that we consider here, the basic idea is that an over- or underestimation of the effect of a candidate marker is suspected when the conventional point estimate of the log HR is far from the other estimates and unstable (ie, with a large standard error, yielding a wide 95% CI). Consequently, the variance of the distribution of log HR estimates across the 215 candidate markers (V) is likely to overestimate the variance of the true individual log HRs (VT). In an empirical-Bayes approach, the goal is to estimate VT by “correcting” the overestimation in the total variance (V), which is approximately done by subtracting the mean of the variances of the individual log HR estimates (VM) from V, that is, VT ≅ V − VM.^{13,14} The 2 quantities V and VM need to be estimated by an iterative computational procedure^{13,15} (presented in the Technical Appendix, available with the online version of this article). However, when the proportion of true (marked) effects is low, the empirical-Bayes approach can produce an estimate of VT equal to zero, implying extreme stabilization of the individual effect estimates around an overall average effect, which is not a useful result. This was seen with the present data set. We therefore used a semi-Bayes approach by assuming VT > 0. A specified value of VT can be viewed as a lower limit of the variability of the log HRs and should not exceed the estimate of V. Given VT, we can compute the semi-Bayes adjusted HR estimates, with 95% CIs, for the candidate cytogenetic markers (see Technical Appendix).
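
The relation VT ≅ V − VM can be illustrated with a one-pass, unweighted moment calculation. This is a deliberate simplification and not the iterative procedure of the Technical Appendix; the function name and the input values are hypothetical. The sketch also shows how the estimate is truncated at zero when the observed spread is no larger than the average sampling variance.

```python
from statistics import mean, variance

def moment_vt(log_estimates, variances):
    """Crude empirical-Bayes estimate of VT, truncated at zero."""
    v_total = variance(log_estimates)   # V: spread of the observed log estimates
    v_mean = mean(variances)            # VM: mean sampling variance
    return max(0.0, v_total - v_mean), v_total, v_mean

# Hypothetical log HR estimates and their squared standard errors.
noisy = moment_vt([0.0, 0.2, 0.4], [0.10, 0.10, 0.10])
spread = moment_vt([-1.0, 0.0, 1.0], [0.25, 0.25, 0.25])
print(noisy)    # V is smaller than VM, so VT is truncated to 0
print(spread)   # V exceeds VM, leaving a positive VT
```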

Semi-Bayes adjustments of the conventional HR estimates for each candidate cytogenetic marker were performed by setting VT to 0.10 and 0.15, respectively (see Discussion). As expected, the semi-Bayes adjustments pulled outlying HR estimates towards an overall average of the effect estimates and led to narrower 95% CIs (Fig. 3; cf. Fig. 2). Estimates for candidate markers with relatively wide unadjusted 95% CIs were pulled more towards the average effect than estimates for markers with relatively narrow CIs. The semi-Bayes method with VT = 0.10 produced an estimate of V = 0.18 and yielded pronounced stabilization of the HRs around the overall average effect, log HR = 0.22 (HR = 1.25). The overall average effect is a weighted average of the log HR estimates with weights equal to the inverse variances of the individual adjusted estimates (see Technical Appendix). With VT = 0.15 instead of VT = 0.10, we obtained an estimate of V = 0.19 and somewhat less marked stabilization around the overall average effect (HR = 1.24).
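
The shrinkage behavior described above can be sketched in Python with a standard normal-normal approximation. This is an illustration under assumptions and may differ in detail from the Technical Appendix: the prior mean is taken as a weighted average with weights 1/(v + VT), and the posterior variance ignores the uncertainty in that average. The function name and the two example markers are hypothetical.

```python
def semi_bayes_adjust(log_estimates, variances, vt):
    """Shrink log effect estimates towards a weighted mean, given prior variance vt."""
    # Assumed weighting: inverse of sampling variance plus prior variance.
    weights = [1.0 / (v + vt) for v in variances]
    prior_mean = sum(w * b for w, b in zip(weights, log_estimates)) / sum(weights)
    adjusted = []
    for b, v in zip(log_estimates, variances):
        keep = vt / (vt + v)                        # share kept on the marker's own estimate
        post_mean = prior_mean + keep * (b - prior_mean)
        post_var = keep * v                         # equals v * vt / (v + vt)
        adjusted.append((post_mean, post_var))
    return prior_mean, adjusted

# Hypothetical markers: one precise estimate and one imprecise outlier.
prior_mean, adjusted = semi_bayes_adjust([0.0, 2.0], [0.1, 1.0], vt=0.10)
print(round(prior_mean, 3), [(round(m, 3), round(v, 3)) for m, v in adjusted])
```

The imprecise outlier (variance 1.0) is moved much further towards the average than the precise estimate, and both posterior variances are smaller than the original sampling variances.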

#### Probabilities of Marked Effect

The clinical variables grade (2 or 3) and tumor size (≥5 or <5 cm) are established prognostic factors in soft tissue sarcomas, expected to have a prognostic effect corresponding to a 2- to 3-fold increase of the hazard of metastasis development.^{16} The study was aimed at evaluating possible additional prognostic impact of each candidate cytogenetic marker. Hence, HR >2.0 as well as HR <0.5 were considered as clinically relevant effects; here, referred to as marked effects (a common terminology for clinical and epidemiologic studies). The calculated probabilities of marked effect for each candidate marker (see Technical Appendix) became lower after semi-Bayes adjustments with VT = 0.10, compared with VT = 0.15, since adjustments with VT = 0.10 gave more pronounced stabilization of the effect estimates (Fig. 4). The rankings of the probabilities of marked effect across the 215 candidate cytogenetic markers based on VT = 0.10 and 0.15, respectively, were highly correlated (Spearman's correlation coefficient = 0.99).
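
Given an adjusted log HR and its variance, a probability of marked effect can be sketched as the normal tail mass beyond the two thresholds. The snippet assumes an approximately normal posterior for the log HR; the function name and the input values are hypothetical, and the exact computation used in the study is the one given in the Technical Appendix.

```python
from math import log, sqrt
from statistics import NormalDist

def prob_marked_effect(post_mean, post_var, threshold=2.0):
    """P(HR > threshold or HR < 1/threshold) under a normal posterior for log HR."""
    dist = NormalDist(post_mean, sqrt(post_var))
    upper = 1.0 - dist.cdf(log(threshold))    # mass above log(threshold)
    lower = dist.cdf(-log(threshold))         # mass below log(1/threshold)
    return upper + lower

# Hypothetical adjusted estimate: log HR = 0.55 with posterior variance 0.06.
print(round(prob_marked_effect(0.55, 0.06), 3))                  # HR > 2.0 or < 0.5
print(round(prob_marked_effect(0.55, 0.06, threshold=1.5), 3))   # HR > 1.5 or < 0.67
```

Lowering the threshold from 2.0 to 1.5 increases the probability, since a larger part of the posterior distribution then counts as a marked effect.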

#### Comparison of Estimation-Based and Poststudy FPRP Approaches

The conventional Cox regression analyses of each candidate marker yielded 29 markers with *P* < 0.10. Among those markers, the probabilities of marked effect varied between 0.09 and 0.43 (VT = 0.10) and 0.13 and 0.59 (VT = 0.15) (Fig. 5). Depending on whether the effect estimates were semi-Bayes adjusted with VT = 0.10 or 0.15, the rank correlation coefficient (Spearman) between the probabilities of marked effect and the *P*-values differed little (−0.86 for VT = 0.10; −0.83 for VT = 0.15; Fig. 5). For the other candidate markers with *P* ≥ 0.10, we obtained low probabilities of marked effect (maximum 0.12 for VT = 0.10; 0.19 for VT = 0.15).

The poststudy FPRP approach was also considered. For each candidate cytogenetic marker, we assumed a prior probability of 0.05 for the alternative hypothesis *H1*: HR = 2.0. The poststudy FPRPs and the *P*-values across the 17 statistically significant (*P* < 0.05) cytogenetic markers are shown in Figure 6; Spearman's rank correlation coefficient equals 0.68. Changing the prior probability π for all hypotheses does not change the ranking of a set of FPRPs. The poststudy FPRP values and the probabilities of marked effect rank-correlate relatively weakly across the 17 significant markers (−0.35 for VT = 0.10 and −0.32 for VT = 0.15). We point out that, for 2 candidate markers with equal conventional *P*-values, the poststudy FPRPs and the probabilities of marked effect can imply different support: the poststudy FPRP approach (given equal prior probability π) gives more support to the candidate with the smaller standard error (higher power), whereas the estimation-based approach may yield less support for that candidate if its point estimate of the effect size is lower (Fig. 6; the candidate markers with *P* = 0.004 as well as *P* = 0.017).

Investigators aim at selecting a limited number of promising candidate markers (the top candidates). The estimation-based and poststudy FPRP approaches can yield notably different lists of top candidates. The estimation-based approach may even include a nonsignificant candidate marker, with a *P* value between 0.05 and 0.10, depending on the number of top candidates selected (Fig. 5). On the other hand, with the poststudy FPRP approach, the top candidates are commonly selected among the statistically significant candidate markers.

If the standard errors of the individual estimates were more similar, then (i) the orderings of the conventional and the semi-Bayes adjusted estimates would be more similar as well, because the relative adjustment of a single estimate depends on its precision, and (ii) the probabilities of marked effect across the candidate markers would be more closely rank-correlated with the conventional *P*-values, because the *P*-values would be less confounded by the standard errors and, therefore, predominantly influenced by the effect sizes. Also, the FPRP values would then be more closely rank-correlated with the conventional *P*-values, because the poststudy power values (1 − β) would be more similar.

## DISCUSSION

Many attempts are being made to relate whole-genome data to disease or clinical outcome. With the advent and refinement of array-based high-throughput molecular genetic approaches, the number of potential genetic variables that can be studied has increased dramatically. Screening of hundreds of thousands of SNPs may accelerate our ability to identify hereditary factors with impact on complex traits and diseases.^{17} However, reliable statistical tools are needed to handle the problem of multiple comparisons. Our estimation-based approach is applicable in principle to data from genome-wide association studies. We address relevant aspects and further developments for such applications.

When using the FPRP strategy for selecting the “noteworthy” genetic markers, the prior probability of a real effect is crucial for the interpretation of the results. The prior probability of, for example, a SNP being associated with a certain disorder could be estimated from the results of previous studies or from biochemical and molecular knowledge that supports a functional role of the SNP (eg, whether or not it changes enzyme activity or might regulate the expression of the protein). Still, no such information is available for the majority of SNPs. In particular, there is limited knowledge about sequences that regulate gene expression and mRNA stability or that are involved in epigenetic mechanisms (eg, methylation of cytosines). These limited functional data lead to uncertainty in assigning a prior probability of association to a SNP or other type of genetic marker.^{18}

In genome-wide association studies, ORs of 1.3 and above are expected to be accessible effect sizes for susceptibility loci.^{17} A multiplicative risk model for estimating the effects of each allele can be justified.^{19} A potentially sensitive issue in the estimation-based approach proposed here is the choice of VT, the variance between the individual candidates' true log ORs. The first option is to use an empirical-Bayes approach, where VT is estimated from the observed data.^{13–15,20} However, when analyzing whole-genome data with a vast number of candidate genetic variants, the empirical-Bayes approach tends to degenerate when the fraction of true positive associations is low and the uncertainty in the individual effect estimates is relatively large: VT is then estimated as zero, and no variability in the individual effect estimates remains after the empirical-Bayes adjustment of the conventional effect estimates. A semi-Bayes approach that imposes a lower limit on the true effect variability across the candidate variants can then be considered.^{13} A specified value of VT should not exceed the estimate of the variance of the distribution of log OR estimates across the candidate genetic markers (V). Investigators should reflect upon whether their choices of VT are reasonable. An underlying assumption is that the individual true log ORs are roughly normally distributed with variance VT around an overall average effect (see Technical Appendix). It is straightforward to calculate the proportion of candidate markers that can be expected (a priori) to have an X-fold or more pronounced effect on outcome (harmful or preventive), provided a null overall average effect, for given values of X and VT (Table). For genome-wide association studies, VTs much lower than 0.10 are realistic.
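
The a priori proportion mentioned above follows directly from the normality assumption: with true log ORs distributed as N(0, VT), the proportion with an X-fold or stronger effect in either direction is 2[1 − Φ(ln X / √VT)]. The sketch below evaluates this expression; the function name and the grid of X and VT values are chosen for illustration and are not claimed to reproduce the article's Table.

```python
from math import log, sqrt
from statistics import NormalDist

def prior_fraction_marked(x_fold, vt):
    """Expected proportion of markers with OR >= x_fold or OR <= 1/x_fold,
    assuming true log ORs follow N(0, vt)."""
    z = log(x_fold) / sqrt(vt)
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Illustrative grid of thresholds and prior variances.
for vt in (0.01, 0.05, 0.10, 0.15):
    print(vt, [round(prior_fraction_marked(x, vt), 4) for x in (1.3, 1.5, 2.0)])
```

A check of this kind can help investigators judge whether a chosen VT implies a plausible share of markers with marked effects.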

Another possibility is to use a full Bayesian approach, allowing for the uncertainty in VT; however, more complex computations are then required. Analogously, a full Bayesian FPRP approach could be considered by allowing for the uncertainty in the prior probability π.

One key advantage of the FPRP approach is that it allows for different decision criteria for hypotheses that have different priors; this makes the exchangeability assumption unnecessary, in contrast to our approach. The exchangeability assumption is reasonable, considering the study used for demonstration, as there was no prior knowledge regarding the prognostic effects of each cytogenetic marker. It may also be a reasonable assumption in the early stage of this era of genome-wide association studies. As data accumulate, the exchangeability assumption would need to be modified: Which reasonable a priori assumptions can be used to meet better prior knowledge about effects of candidate SNPs (including combinations of SNPs)? This issue is of central importance for further developments of the estimation approach. Lönnstedt and Britton^{21} apply Bayesian models for analyzing microarray (gene expression) data. They consider a prior model where a small proportion *p* of the genetic variants is assumed to be important, and with effects (harmful or preventive) varying around a normal distribution with mean zero and a relatively large variance. The other variants (proportion 1 − *p*) vary around a normal distribution with mean zero and a relatively small variance. Modifications of the prior exchangeability assumption used in our approach, in line with their prior models, will be of interest for future applications of our approach in genome-wide association studies. A problem with applying our approach using the exchangeability assumption—without taking into account the violation of that assumption according to the modification above—is that rare variants having true important effects might be adjusted markedly towards the null; this leads to over-adjustment if the conventional effect estimate of an important variant is relatively uncertain. This will produce false-negative results.

The underlying assumption of no systematic biases in the conventional effect estimates (see Technical Appendix) might raise concern. Genome-wide association studies are unlikely to be completely unbiased owing to biased sampling, technical errors, etc. Of course, the bias problem also affects test-based approaches such as the FPRP.

Another concern is the definition of marked effect. We believe that per-allele OR >1.1 up to 1.3 (or, correspondingly, OR <0.91 down to 0.77) would be appropriate “marked effects” in genome-wide association studies. Using our empirical data, we examined the influence of the choice of marked effect. Figure 7 shows the probabilities of HR above 1.5 or below 0.67 as well as above 2.0 or below 0.50 (previously presented in Fig. 5) derived from the semi-Bayes adjusted effect estimates with VT = 0.10 (results for the 29 candidate markers with conventional *P* < 0.10). When the limit for marked effect was lowered, the rank correlation with the *P*-values increased. We point out that investigators who want to explore alternative effect sizes by using the FPRP approach (where the prior probability [π] relates to an effect of specified size) need to reconsider π as well as power.

As already mentioned, our proposed approach is suited for performing an initial selection of promising genetic markers among a large number of candidates with limited or no prior functional information. Further analyses and studies of the selected genetic markers are, of course, required. More advanced empirical- and semi-Bayes approaches, taking into account possible correlation between the effect estimates, can be developed for further multivariate analyses.^{14} Only a limited number of markers can then be considered, due to the limitations of statistical power. For that purpose, we believe that the ranking of the probabilities of marked effect will be of primary concern when selecting promising markers. The initially estimated effects for selected genetic markers can be weakened or strengthened by further multivariate modeling, where gene-gene interactions may be addressed. On the other hand, when selecting candidate markers for additional studies, investigators may be more interested in the actual probabilities of marked effect.

Fully Bayesian approaches to mining genetic associations that scale to the size of a genome-wide data set have been proposed.^{22} However, such analyses are very complex, and it is difficult to incorporate prior knowledge to reduce the formidable number of possible association structures. We advocate an initial selection of promising markers as a way to address more complex association structures.

## ACKNOWLEDGMENTS

This work was partly conducted within the EU Network of Excellence ECNIS (www.ECNIS.org). We thank Maurizio Manuguerra for valuable comments.

## REFERENCES

1. Thomas DC, Clayton DG. Betting odds and genetic associations. *J Natl Cancer Inst*. 2004;96:421–423.
2. Altman DG. *Practical Statistics for Medical Research*. London: Chapman and Hall; 1991.
3. Greenland S, Rothman KJ. Fundamentals of epidemiologic data analysis. In: Rothman KJ, Greenland S, eds. *Modern Epidemiology*. 2nd ed. Philadelphia: Lippincott-Raven; 1998:201–229.
4. Rothman KJ. Significance questing. *Ann Intern Med*. 1986;105:445–447.
5. Walter SD. Methods for reporting statistical results from medical research studies. *Am J Epidemiol*. 1995;141:896–906.
6. Lang JM, Rothman KJ, Cann CI. That confounded *P*-value. *Epidemiology*. 1998;9:7–8.
7. Wacholder S, Chanock S, Garcia-Closas M, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. *J Natl Cancer Inst*. 2004;96:434–442.
8. Hung RJ, Brennan P, Canzian F, et al. Large-scale investigation of base excision repair genetic polymorphisms and lung cancer risk in a multicenter study. *J Natl Cancer Inst*. 2005;97:567–576.
9. Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. *Am J Hum Genet*. 2007;81:208–227.
10. Clayton D, Hills M. *Statistical Models in Epidemiology*. Oxford: Oxford University Press; 1993.
11. Mertens F, Strömberg U, Mandahl N, et al. Prognostically important chromosomal aberrations in soft tissue sarcomas: a report of the Chromosomes and Morphology (CHAMP) study group. *Cancer Res*. 2002;62:3980–3984.
12. Mitelman F. *An International System for Human Cytogenetic Nomenclature*. Basel: Karger; 1995.
13. Greenland S, Poole C. Empirical-Bayes and semi-Bayes approaches to occupational and environmental hazard surveillance. *Arch Environ Health*. 1994;49:9–16.
14. Hung RJ, Brennan P, Malaveille C, et al. Using hierarchical modeling in genetic association studies with multiple markers: application to a case-control study of bladder cancer. *Cancer Epidemiol Biomarkers Prev*. 2004;13:1013–1021.
15. Steenland K, Bray I, Greenland S, et al. Empirical Bayes adjustments for multiple results in hypothesis-generating or surveillance studies. *Cancer Epidemiol Biomarkers Prev*. 2000;9:895–903.
16. Borden EC, Baker LH, Bell RS, et al. Soft tissue sarcomas of adults: state of the translational science. *Clin Cancer Res*. 2003;9:1941–1956.
17. Wang WYS, Barratt BJ, Clayton DG, et al. Genome-wide association studies: theoretical and practical concerns. *Nat Rev Genet*. 2005;6:109–118.
18. Matullo G, Berwick M, Vineis P. Gene-environment interactions: how many false positives? *J Natl Cancer Inst*. 2005;97:550–551.
19. Cordell HJ, Clayton DG. Genetic association studies. *Lancet*. 2005;366:1121–1131.
20. Greenland S. Principles of multilevel modelling. *Int J Epidemiol*. 2000;29:158–167.
21. Lönnstedt I, Britton T. Hierarchical Bayes models for cDNA microarray gene expression. *Biostatistics*. 2005;6:279–291.
22. Verzilli CJ, Stallard N, Whittaker JC. Bayesian graphical models for genomewide association studies. *Am J Hum Genet*. 2006;79:100–112.