Meta-analysis is used in a wide range of disciplines—notably in epidemiology and evidence-based medicine, where results of some meta-analyses have led to changes in clinical practice and health care policies. Meta-analyses use statistical methods to combine the results of several studies that address similar research hypotheses. The basic premise is that the combined results from a group of studies can allow a more precise and balanced estimate of an effect than the individual studies.
One problem with meta-analyses of clinical trials is that differences among trials are not addressed appropriately by current meta-analysis models.1 There are several reasons for these interstudy differences, including chance, different definitions of treatment effects, design-related heterogeneity (quality), and finally unexplainable and real differences, all of which may introduce bias.2 The most important of the explainable differences is quality, which refers to the likelihood that the trial design has generated unbiased results with sufficient precision to allow clinical application.3 The quality of individual studies will also affect the quality of the combined estimates, as well as the magnitude of the results. If the quality of the primary material is inadequate, the conclusions of the review may be invalid regardless of the use of a random-effects model. Such inadequacies may occur in the randomization process, in the masking to the allocated treatment, in the random generation of number sequences, or in the analysis. There is therefore a need to assess the quality of studies in a more explicit way than simply to insert a random term based on heterogeneity,4 as is done with the random-effects model.
One way of dealing with this problem is to include only the most methodologically sound studies in the meta-analysis, a practice often termed “best-evidence meta-analysis.”5 Another approach is to include weaker studies, and add a study-level predictor variable that reflects the methodologic quality of the studies. This allows an assessment of the effect of study quality on the effect size.6 Yet another possibility is to incorporate the quality scores as weights.7 However, the latter has been done intuitively, without a clear methodologic basis for optimizing the estimate. Our objective here is to present a new method of weighting studies in a meta-analysis using a quality score (akin in some ways to weighting using the homogeneity statistic). We compare the results obtained from applying the old and new methods in a summary of clinical trials that were included in a systematic review of the efficacy of radioactive iodine for the ablation of thyroid remnants after surgery for thyroid cancer.8
Because the results from different studies investigating different independent variables are measured on different scales, the dependent variable in a meta-analysis is some standardized measure of effect size. When the outcome of the experiments is dichotomous (success vs. failure), one of the commonly used effect measures in clinical trials is a relative risk (RR). The approach frequently used is the “inverse-variance method” based on Woolf.9 The average effect size across all studies is computed, with the weights equal to the inverse variance of each study's effect estimator. Larger studies and studies with less random variation are given greater weight than smaller studies. In the case of studies reporting a RR, the log RR has a standard error (se) given by
$$se_i = \sqrt{\frac{1}{P_{iT}\,n_{iT}} - \frac{1}{n_{iT}} + \frac{1}{P_{iC}\,n_{iC}} - \frac{1}{n_{iC}}}$$
where $P_{iT}$ and $P_{iC}$ are the risks of the outcome in the treatment group and control groups, respectively, of the ith study and $n$ is the number of patients in the respective groups. The weights ($w$) allocated to each of the studies are inversely proportional to the square of the standard error; thus,
$$w_i = \frac{1}{se_i^2}$$
which gives greater weight to those studies with smaller standard errors. The combined effect size is computed by the weighted average as
$$\overline{ES} = \frac{\sum w_i\,ES_i}{\sum w_i}$$
where $ES_i$ is the effect size (here the log RR) of the ith study and $\overline{ES}$ is the overall effect size measure; it has a standard error given by
$$se_{\overline{ES}} = \sqrt{\frac{1}{\sum w_i}}$$
Assuming these estimates are distributed normally, the 95% confidence intervals (CIs) are easily obtained as
$$\overline{ES} \pm 1.96\,se_{\overline{ES}}$$
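The inverse-variance calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `log_rr_and_se` and `pool_fixed` and the study counts are invented for the example, and studies with zero events (which would need a continuity correction) are simply rejected.

```python
import math


def log_rr_and_se(events_t, n_t, events_c, n_c):
    """Return (ln RR, se of ln RR) for one two-arm study.

    Uses se^2 = 1/(P_T n_T) - 1/n_T + 1/(P_C n_C) - 1/n_C,
    where P * n is simply the number of events in that arm.
    """
    if events_t <= 0 or events_c <= 0:
        raise ValueError("zero-event arms need a continuity correction")
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def pool_fixed(effects, ses, z=1.96):
    """Inverse-variance (fixed-effects) pooled estimate, its se, and CI."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    se_pooled = math.sqrt(1 / total)
    return pooled, se_pooled, (pooled - z * se_pooled, pooled + z * se_pooled)


if __name__ == "__main__":
    # Invented counts (events_t, n_t, events_c, n_c); not data from the review.
    studies = [(10, 100, 20, 100), (15, 120, 25, 110), (8, 60, 12, 65)]
    effects, ses = zip(*(log_rr_and_se(*s) for s in studies))
    pooled, se_pooled, (lo, hi) = pool_fixed(effects, ses)
    # Pooling is done on the log scale; exponentiate to report an RR.
    print(f"RR = {math.exp(pooled):.2f} "
          f"(95% CI = {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```

Pooling on the log scale and exponentiating only at the end keeps the normal approximation for the CI on the scale where it is most reasonable.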
As can be seen above, the current approach for combining effects is to incorporate variability in studies by adjustment based on the variance of the estimates in each individual study. Thus, the lower the variance of a RR estimate the greater its weight in the final combined estimate. This approach, although statistically very appealing, does not take into account the further variability that exists among studies arising from differences in the study protocols and how well they were executed. This limitation gave rise to the random-effects model approach.10 However, because of the limitations of the random-effects model,10 a statistical adjustment for heterogeneity will still produce invalid estimates when used in a meta-analysis of badly designed studies. Furthermore, adjustments based on an artificially inflated variance lead to a widened CI, supposedly to reflect heterogeneity. This added imprecision does not have much clinical relevance.4
We propose a new approach to adjustment for interstudy variability by incorporating a relevant component (quality) that differs among studies, in addition to the weight based on the intrastudy variance used in fixed-effects meta-analysis. We do this by introducing a correction, called $\hat{\tau}_i$, for the quality-adjusted weight of the ith study. This is a composite based on the quality of studies other than the study under consideration, and redistributes quality-adjusted weights based on the quality-adjusted weights of other studies. For example, if study i is of good quality and other studies are of poor quality, a proportion of quality-adjusted weights from the other studies is mathematically redistributed to study i, giving it more weight in the overall effect size. As studies increase in quality, redistribution becomes progressively less and ceases when all studies are of perfect quality. To accomplish this, we first have to adjust weights for quality. One way to incorporate quality scores into such an analysis is as follows11–14:
$$\overline{ES} = \frac{\sum Q_i\,w_i\,ES_i}{\sum Q_i\,w_i}$$
where $Q_i$ is the judgment of the probability (0–1) that study i is credible, based on the study methodology. The variance of this weighted average is then11:
$$v_{\overline{ES}} = \frac{\sum Q_i^2\,w_i}{\left(\sum Q_i\,w_i\right)^2}$$
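However, this probabilistic viewpoint on quality-adjusted weights has limitations, and we expand on this system of incorporating quality by adjusting the weight as well as redistributing weights based on quality. This is done as follows. Given that $\hat{\tau}_i$ is our quality adjustor for the ith study and N is the number of studies in the analysis, then the quality-effects modified quality score $\hat{Q}_i$ is given by:
$$\hat{Q}_i = Q_i + \frac{\hat{\tau}_i}{w_i}, \qquad \hat{\tau}_i = \left(\sum_{j=1}^{N} \tau_j\right) - \tau_i, \qquad \tau_j = \frac{w_j - w_j\,Q_j}{N - 1}$$
The final summary estimate is then given by:
$$\overline{ES}_{QE} = \frac{\sum \hat{Q}_i\,w_i\,ES_i}{\sum \hat{Q}_i\,w_i}$$
while the variance of this weighted average is then
$$v_{QE} = \frac{\sum \hat{Q}_i^2\,w_i}{\left(\sum \hat{Q}_i\,w_i\right)^2}$$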
Although it may seem that $\hat{Q}_i$ is a function of $w_i$, given that $\hat{Q}_i = Q_i + \hat{\tau}_i / w_i$, multiplying $\hat{Q}_i$ by $w_i$ means that we are actually adjusting the product of quality and weight by $\hat{\tau}_i$, that is, $\hat{Q}_i\,w_i = Q_i\,w_i + \hat{\tau}_i$. By our definition in the text, the latter is a function of the quality and weights of other studies excluding this ith study.
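The redistribution step is easier to follow with a small numerical sketch. The Python below is illustrative only, not the authors' spreadsheet. As an assumption of this sketch, it takes $\tau_i = w_i(1 - Q_i)/(N - 1)$ and $\hat{\tau}_i = \sum_{j \ne i} \tau_j$ as the redistribution rule, so that the weight each study loses to imperfect quality is shared equally among the other studies. The function name `quality_effects_pool` and the inputs are hypothetical.

```python
import math


def quality_effects_pool(effects, ses, quality, z=1.96):
    """Pooled estimate with quality-adjusted, redistributed weights.

    effects : per-study effect sizes (e.g. log RR)
    ses     : per-study standard errors
    quality : per-study quality scores Q_i in [0, 1]
    Returns (pooled, se, (ci_low, ci_high), adjusted_weights).
    """
    n = len(effects)
    if n < 2:
        raise ValueError("redistribution needs at least two studies")
    if any(not 0.0 <= q <= 1.0 for q in quality):
        raise ValueError("quality scores must lie in [0, 1]")

    w = [1 / se ** 2 for se in ses]  # inverse-variance weights

    # ASSUMED rule: weight lost to imperfect quality, split over N - 1 studies.
    tau = [wi * (1 - qi) / (n - 1) for wi, qi in zip(w, quality)]
    tau_sum = sum(tau)
    # tau_hat_i collects the shares released by every study except study i.
    tau_hat = [tau_sum - t for t in tau]

    # Q_hat_i = Q_i + tau_hat_i / w_i, so Q_hat_i * w_i = Q_i * w_i + tau_hat_i.
    q_hat = [qi + th / wi for qi, th, wi in zip(quality, tau_hat, w)]
    adj = [qh * wi for qh, wi in zip(q_hat, w)]

    total = sum(adj)
    pooled = sum(a * e for a, e in zip(adj, effects)) / total
    variance = sum(qh ** 2 * wi for qh, wi in zip(q_hat, w)) / total ** 2
    se = math.sqrt(variance)
    return pooled, se, (pooled - z * se, pooled + z * se), adj


if __name__ == "__main__":
    # Invented inputs: the second study is precise but of low quality.
    pooled, se, ci, adj = quality_effects_pool(
        effects=[-0.40, -0.10, -0.35],
        ses=[0.30, 0.10, 0.25],
        quality=[0.9, 0.3, 0.8],
    )
    shares = [round(100 * a / sum(adj), 1) for a in adj]
    print("adjusted weight shares (%):", shares)
```

Under this rule the adjusted weights sum to the same total as the inverse-variance weights, and when every $Q_i = 1$ all $\tau_i$ vanish and the result coincides with the fixed-effects estimate.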
Our suggested adjustment has a parallel to the random-effects model, where a constant is generated from the homogeneity statistic $Q$ (not to be confused with the quality score $Q_i$)
$$Q = \sum w_i \left(ES_i - \overline{ES}\right)^2$$
Using this and other study parameters, a constant ($\tau^2$) is generated as
$$\tau^2 = \max\left(0,\; \frac{Q - (N - 1)}{\sum w_i - \dfrac{\sum w_i^2}{\sum w_i}}\right)$$
The inverse of the sum of the sampling variance and this constant, which represents the variability across the population effects, is then used as the weight
$$w_i^{*} = \frac{1}{se_i^2 + \tau^2}$$
In effect, as $\tau^2$ gets bigger, $se_{\overline{ES}}$ increases, thus widening the CI; the weights, however, become progressively more equal. In essence, this is the basis for the random-effects model: a form of redistribution of the weights so that outlier studies do not unduly influence the pooled effect size. This is what our method does as well; the differences are that we use a method based on quality rather than statistical heterogeneity, and that $se_{\overline{ES}}$ is not as artificially inflated as in the random-effects model. The random-effects model adds a single constant to the weights of all studies in the meta-analysis based on the statistical heterogeneity of the trials. Our method redistributes the quality-adjusted weights of each trial based on the measured quality of the other trials in the meta-analysis. Because $se_{\overline{ES}} = \sqrt{1/\sum w_i}$ depends on the sum of the weights, the addition of an external constant, which reduces every weight, will inflate the variance much more than a redistribution of the weights (assuming the studies demonstrate varying effects). Obviously, if a random variable is inserted to inflate the variance based on heterogeneity, it is not clear what aspect of between-trial differences is being assessed. Senn4 has provided an analytic demonstration of this.
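For comparison, the random-effects weighting discussed above can be sketched as follows. This is a minimal illustration of the standard DerSimonian and Laird moment estimator, with the between-study constant truncated at zero. The function name `dersimonian_laird` and the inputs are invented for the example.

```python
import math


def dersimonian_laird(effects, ses):
    """Random-effects pooling with a moment-based between-study constant.

    Returns (pooled, se_pooled, tau2, random_effects_weights).
    """
    n = len(effects)
    if n < 2:
        raise ValueError("heterogeneity needs at least two studies")

    w = [1 / se ** 2 for se in ses]  # fixed-effects weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw

    # Homogeneity statistic: weighted squared deviations from the pooled value.
    q_stat = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    scale = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q_stat - (n - 1)) / scale)

    # The same constant is added to every study's sampling variance.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    sw_star = sum(w_star)
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sw_star
    return pooled, math.sqrt(1 / sw_star), tau2, w_star


if __name__ == "__main__":
    # Invented, deliberately discordant effects with unequal precision.
    effects, ses = [-0.60, 0.10, -0.30], [0.30, 0.10, 0.25]
    pooled, se, tau2, w_star = dersimonian_laird(effects, ses)
    fixed_w = [1 / s ** 2 for s in ses]
    print("fixed shares (%): ",
          [round(100 * x / sum(fixed_w), 1) for x in fixed_w])
    print("random shares (%):",
          [round(100 * x / sum(w_star), 1) for x in w_star])
```

Printing the two sets of weight shares side by side shows both effects described in the text: a positive constant lowers every weight, which raises the pooled standard error, and it lowers the largest weights most in relative terms, which makes the shares more equal.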
The computations for the quality effects model were placed into an Excel spreadsheet where the user can substitute trial and quality information to automatically generate a pooled effect size under this model. The spreadsheet is available with the online version of this paper.
We have previously published a meta-analysis of radioactive iodine dosage for the ablation of thyroid remnants.8 We included 22 studies in this meta-analysis; 6 were randomized controlled trials (RCTs) with mixed surgical status (group 2),15–20 4 were cohorts with near-total thyroidectomy (group 1),21–24 and 12 were cohorts with mixed surgical status (group 3).23,25–35 We did a quality assessment based on our suggested method in Table 1, and study characteristics are given in Table 2. We calculated the fixed-effects RR for all 22 studies as 0.79 (95% CI = 0.72–0.88) while the random-effects RR for the 22 studies was 0.73 (0.62–0.85). After applying the quality effects model, however, we got a pooled RR for all 22 studies (Fig. 1) of 0.69 (0.58–0.82). The weights for each meta-analysis method are shown in Table 3.
The difference between the random-effects and quality-effects weights is that the latter includes assessment of differences in study quality. Within the mixed surgical status cohort (Table 3), the study by Angelini et al26 accounted for 65% weight using the fixed-effects model and also the random-effects model, as there were no large discordances in results among this group of studies. The quality-effects weight for the Angelini study was, however, down to 18%, reflecting the fact that it was a 10-year-old abstract for which the whole study was never published. The new weights, depicted in Table 3, suggest that the quality-effects model works in a fashion similar to the random-effects model, except that it includes a nonrandom adjustment of weights driven by a numerical assessment of methodologic quality, which has greater clinical relevance. Another benefit of this model is that it avoids the artificial inflation in variance seen with the random-effects model. With a perfect quality score for all studies, the quality-effects model defaults to the fixed-effects model, just as the random-effects model does for homogeneous trials. If we look at the subgroup with large heterogeneity (RCTs), the CI for the random-effects pooled effect size is substantially wider than for the quality-effects model. However, weights are redistributed approximately equally in both models. Again, despite the redistribution, the abstract by Sirisalipoch et al20 gets the lowest weight in the quality-effects model, suggesting that quality differences are being used rather than a random statistical factor. Indeed, Hackshaw et al36 have also reported a meta-analysis of the association we examined here. Those authors concluded that it was not possible to draw a conclusion from the combined studies for several reasons: the trials were heterogeneous and therefore could not be combined into 1 stratum; studies with different designs were also not combined; results were based on subgroup analyses.
The random-effects model would have been unable to provide an answer due to the problem of inflated variance. However, the quality-effects model suggests benefit for a higher dose even in the most heterogeneous subgroup (the 6 RCTs), in keeping with the trend with the rest of the studies (Table 3).
In this report we suggest a method whereby the between-study variability is adjusted based on an assessment of the varying quality of the studies rather than depending on the distribution of the effect estimates from different studies to adjust for heterogeneity. The choice of model includes a judgment of how much weight should be accorded to a trial by virtue of its numbers and effect size alone. In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. However, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or low study-quality bias,37,38 then a random-effects meta-analysis will exacerbate the effects of the bias. A fixed-effect analysis will be affected less, although it would be inappropriate because it gives a higher weight to a very large trial simply because of its higher precision. However, sample size by itself does not make the estimate more valid or more generalizable. Increased size may be accompanied by simplification of recruitment and data collection in a way that increases the risks of protocol deviation, poor data quality, misclassification, and nontrial use of trial treatments—all of which tend to create a bias towards the null. We demonstrated an alternative approach with a real life application in which neither a fixed-effects nor a random-effects model of classic meta-analysis adequately described the results.
It has been reported that there is no correlation between quality scores and variation in treatment difference in RCTs.39 However, the quality effects model does not relate the quality score directly to the effect size; such correction is possible only if we know the effects of study imperfections on the outcome measures. Unfortunately, for many biases the precise effects will not be known and hence cannot be corrected for.40 What can be done (and what is done in our quality effects model) is to redistribute study weights by quality, such that the effect sizes more likely to be accurate get a relatively greater weight redistribution when weighting by precision.
The decision to use a random-effects model is often based on a simple test of homogeneity41 of the studies involved. It has even been suggested that a random-effects model should be routinely adopted because of the demonstration that, in the presence of even slight between-study heterogeneity, the fixed-effects model results in inferences that substantially underestimate the variation in the data and in the parameter estimates.42 Nevertheless, despite this widespread perception, it is now understood that the choice of fixed-effects or random-effects meta-analysis should not be made on the basis of perceived heterogeneity but on the basis of purpose.4 It is always valuable to perform a fixed-effects meta-analysis because this tests the null hypothesis that treatments were identical in all trials.4 If this null is rejected, then the alternative hypothesis may be asserted, ie, that there is at least 1 trial in which the treatments differed. In other words, the random-effects analysis works as a check on the robustness of conclusions from a fixed-effects model to failure in the assumption of homogeneity43; to go beyond this causal “finding” requires strong assumptions.4 If a random variable is inserted to inflate the variance based on heterogeneity, it is not clear what aspect of between-trial differences is being assessed. This approach fails to take into account quality differences among the individual studies. The strength of our quality-effects meta-analysis is that it allows available methodologic evidence to influence subjective random probability.
For a quality-effects meta-analysis, a reproducible and effective scheme of quality assessment is required. The scheme we used in our illustration was developed in part by the Delphi method,44 in which 206 items associated with study quality were reduced to 9 by means of the Delphi consensus technique. The final set of items assesses 3 dimensions of the quality of studies (internal validity, external validity, and statistical analysis), and focuses on clinical trials. Compared with assessment of randomized clinical trials, the tools for quality assessment of observational designs in systematic reviews are far less well developed.45 The feasibility of creating 1 quality checklist to apply to various study designs has been explored.46 Research has gone into developing an instrument to measure the methodologic quality of observational studies,47 and a scale to assess the quality of observational studies in meta-analyses.48 Nevertheless, there is no consensus on how to synthesize information about quality from a range of study designs within a systematic review.
This paper focuses on a statistical model for incorporation of quality information into a meta-analysis, rather than development of a quality score. There are many quality scores available; Moher et al49 reported at least 25 quality assessment scores by 1995, and more have been proposed since. In theory, any quality score can be used with our quality-effects meta-analytic approach. Our statistical method makes use of $Q_i$, and any score can be converted to $Q_i$ simply by dividing the measured score by its maximum score. The scheme we suggest in Table 1 is based on a combination of the Newcastle-Ottawa quality assessment scale for observational studies and the Delphi model for experimental studies. Adoption of the quality-effects meta-analysis should encourage further development of such scoring systems.
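The conversion of a checklist score to a quality index on the 0 to 1 scale is a single division. A minimal sketch follows, in which the helper name `to_quality_index` and the example scores are hypothetical.

```python
def to_quality_index(score, max_score):
    """Rescale a checklist score to a quality index between 0 and 1."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return score / max_score


if __name__ == "__main__":
    # Hypothetical example: three studies scored on a 12-point checklist.
    print([to_quality_index(s, 12) for s in (12, 9, 3)])
```

Because only the ratio enters the model, scales with different maxima become comparable once rescaled, although the resulting index still reflects whatever the chosen checklist measures.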
We thank Leon Bax of Kitasato University for agreeing to make this method available in the next update of the MIX program, which is a comprehensive free software for meta-analysis of causal research data available from the web at http://www.mix-for-meta-analysis.info.
1. Eysenck HJ. Meta-analysis and its problems. BMJ
2. Bailey KR. Inter-study differences: how should they influence the interpretation and analysis of results. Stat Med
3. Verhagen AP, de Vet HC, de Bie RA, et al. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol
4. Senn S. Trying to be precise about vagueness. Stat Med
5. Eysenck HJ. Meta-analysis of best-evidence synthesis. J Eval Clin Pract
6. Detsky AS, Naylor CD, O'Rourke K, et al. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol
7. Berard A, Bravo G. Combining studies using effect sizes and quality scores: application to bone loss in postmenopausal women. J Clin Epidemiol
8. Doi SA, Woodhouse NJ, Thalib L, et al. Ablation of the thyroid remnant and I-131 dose in differentiated thyroid cancer: a meta-analysis revisited. Clin Med Res
9. Woolf B. On estimating the relation between blood group and disease. Ann Hum Genet
10. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials
11. Tritchler D. Modelling study quality in meta-analysis. Stat Med
12. Klein S, Simes J, Blackburn GL. Total parenteral nutrition and cancer clinical trials. Cancer
13. Fleiss JL, Gross AJ. Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: a critique. J Clin Epidemiol
14. Smith SJ, Caudill SP, Steinberg KK, et al. On combining dose-response data from epidemiological studies by meta-analysis. Stat Med
15. Creutzig H. High or low dose radioiodine ablation of thyroid remnants. Eur J Nucl Med
16. Johansen K, Woodhouse NJ, Odugbesan O. Comparison of 1073 MBq and 3700 MBq iodine-131 in postoperative ablation of residual thyroid tissue in patients with differentiated thyroid cancer. J Nucl Med
17. Bal C, Padhy AK, Jana S, et al. Prospective randomized clinical trial to evaluate the optimal dose of 131 I for remnant ablation in patients with differentiated thyroid carcinoma. Cancer
18. Gawkowska-Suwinska M, Turska M, Roskosz J, et al. [Early evaluation of treatment effectiveness using 131I iodine radiotherapy in patients with differentiated thyroid cancer]. Wiad Lek. 2001;54(suppl 1):278–288.
19. Bal CS, Kumar A, Pant GS. Radioiodine dose for remnant ablation in differentiated thyroid carcinoma: a randomized clinical trial in 509 patients. J Clin Endocrinol Metab
20. Sirisalipoch S, Buachum V, Pasawang P, et al. Prospective randomised trial for the evaluation of the efficacy of low vs high dose I-131 for post-operative remnant ablation in differentiated thyroid cancer [Abstract]. World J Nuclear Med
21. Rosario PW, Reis JS, Barroso AL, et al. Efficacy of low and high 131I doses for thyroid remnant ablation in patients with differentiated thyroid carcinoma based on post-operative cervical uptake. Nucl Med Commun
22. Ramanna L, Waxman AD, Brachman MB, et al. Evaluation of low-dose radioiodine ablation therapy in postsurgical thyroid cancer patients. Clin Nucl Med
23. Doi SA, Woodhouse NJ. Ablation of the thyroid remnant and 131I dose in differentiated thyroid cancer. Clin Endocrinol (Oxf)
24. Verkooijen RB, Stokkel MP, Smit JW, et al. Radioiodine-131 in differentiated thyroid cancer: a retrospective analysis of an uptake-related ablation strategy. Eur J Nucl Med Mol Imaging
25. Hodgson DC, Brierley JD, Tsang RW, et al. Prescribing 131Iodine based on neck uptake produces effective thyroid ablation and reduced hospital stay. Radiother Oncol
26. Angelini F, Capezzone M, Cecarelli C, et al. Comparison among different 131I activities for ablation of post-surgical thyroid residues [Abstract]. In: Report of the 24th Annual Meeting of the European Thyroid Association. Darmstadt, Germany: Merck KgaA, No. 6, Thyroid International; 1997.
27. Zidan J, Hefer E, Iosilevski G, et al. Efficacy of I131 ablation therapy using different doses as determined by postoperative thyroid scan uptake in patients with differentiated thyroid cancer. Int J Radiat Oncol Biol Phys
28. Maxon HR III, Englaro EE, Thomas SR, et al. Radioiodine-131 therapy for well-differentiated thyroid cancer—a quantitative radiation dosimetric approach: outcome and validation in 85 patients. J Nucl Med
29. Logue JP, Tsang RW, Brierley JD, et al. Radioiodine ablation of residual tissue in thyroid cancer: relationship between administered activity, neck uptake and outcome. Br J Radiol
30. Liu RT, Huang MJ, Huang HS, et al. [Comparison of 30mCi and higher doses of iodine-131 for postoperative thyroid remnant ablation]. Taiwan Yi Xue Hui Za Zhi
31. Lin JD, Kao PF, Chao TC. The effects of radioactive iodine in thyroid remnant ablation and treatment of well differentiated thyroid carcinoma. Br J Radiol
32. Ramacciotti C, Pretorius HT, Line BR, et al. Ablation of nonmalignant thyroid remnants with low doses of radioactive iodine: concise communication. J Nucl Med
33. McCowen KD, Adler RA, Ghaed N, et al. Low dose radioiodide thyroid ablation in postsurgical patients with thyroid cancer. Am J Med
34. Lin JD, Chao TC, Huang MJ, et al. Use of radioactive iodine for thyroid remnant ablation in well-differentiated thyroid carcinoma to replace thyroid reoperation. Am J Clin Oncol
35. Degroot LJ, Reilly M. Comparison of 30- and 50-mCi doses of iodine-131 for thyroid ablation. Ann Intern Med
36. Hackshaw A, Harmer C, Mallick U, et al. 131I activity for remnant ablation in patients with differentiated thyroid cancer: a systematic review. J Clin Endocrinol Metab
37. Poole C, Greenland S. Random-effects meta-analyses are not always conservative. Am J Epidemiol
38. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med
39. Emerson JD, Burdick E, Hoaglin DC, et al. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Control Clin Trials
40. Glasziou PP, Sanders SL. Investigating causes of heterogeneity in systematic reviews. Stat Med
41. Cochran WG. Problems arising in the analysis of a series of similar experiments. J R Stat Soc
42. Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Stat Med
43. Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Stat Med
44. Verhagen AP, de Vet HC, de Bie RA, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol
45. Deeks JJ, Dinnes J, D'Amico R, et al. Evaluating non-randomised intervention studies. Health Technol Assess
46. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health
47. Slim K, Nini E, Forestier D, et al. Methodological index for non-randomized studies (minors): development and validation of a new instrument. ANZ J Surg
48. Wells G, Shea B, O'Connell D, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Available at: http://www.ohri.ca/programs/clinical_epidemiology/oxford.htm. Accessed June 15, 2007.
49. Moher D, Jadad AR, Nichol G, et al. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials.