Economics, Education, and Policy: Research Reports

Improvement in the Quality of Randomized Controlled Trials Among General Anesthesiology Journals 2000 to 2006: A 6-Year Follow-Up

Greenfield, Mary Lou V. H. MPH, MS; Mhyre, Jill M. MD; Mashour, George A. MD, PhD; Blum, James M. MD; Yen, Eugene C. BS; Rosenberg, Andrew L. MD

doi: 10.1213/ane.0b013e31819fe6d7


The randomized controlled trial (RCT) is considered the most valid method of comparing treatments and making inferences regarding cause and effect.1,2 An RCT is a planned experiment in which human subjects are randomized to a test treatment or to a control treatment; subjects are enrolled, treated, and followed over the same time period.3 The results of an RCT can affect patient care more directly and immediately than those of any other type of study design.4 In 2004, Moher et al.5 reported “… if the conduct or reporting of RCTs is poor, treatments may be introduced that are less effective than was thought or that might even be ineffective.” Examining the published reports of RCTs is the only way we have to determine the quality of their conduct and analysis.

The medical literature offers numerous tools for meta-analyses and systematic reviews that synthesize evidence from RCTs to guide best clinical practice.6–9 In the mid-1990s, the Consolidated Standards of Reporting Trials (CONSORT) checklist for reporting RCTs was published.* Since then, these standards have been further developed and expanded, with the expectation that the quality of RCTs will improve as editors and authors implement them for their journals and clinical trials.10 Anesthesiology journals frequently publish RCTs, systematic reviews, and meta-analyses. Even so, only three English-language studies have assessed the quality of these types of studies in the anesthesiology literature since 2000.11–13

In 2005, we published the results of a quality assessment review of 279 RCTs published in anesthesiology journals (Anesthesiology, Anesthesia & Analgesia, Anaesthesia, and Canadian Journal of Anesthesia) from January through December 2000.11 The purpose of the previous review was to delineate specific areas for improvement in the conduct, implementation, analysis, and reporting of RCTs in the anesthesiology literature. The goal of the current study was twofold: to compare articles published in four anesthesiology journals in 2006 with those published in the same journals in 2000 to determine areas of improvement over the past 6 years; and to identify areas for future improvement. We hypothesized that there would be significant improvements in the quality of reporting of protocol designs and analyses.


All human RCTs published between January 2006 and December 2006 in four anesthesiology journals (Anesthesiology, Anesthesia & Analgesia, Anaesthesia, and Canadian Journal of Anesthesia) were retrieved using the same methodology as described previously11 (Fig. 1). Articles were obtained in portable document format and all identifiers (e.g., journal name, authors) were removed by an investigator (EY) who was not involved in further evaluation. Before beginning the analyses, the reviewers used the scoring instrument (Table 1) and scoring guide to independently assess 10 randomly selected articles not included in the main study. Reviewers met for two training sessions during which the training articles and the scores assigned by each reviewer were discussed and clarified for reliability purposes. Remaining study articles were randomly divided into four groups using a computer-generated randomization scheme, such that each reviewer (AR, JB, GM, JM) evaluated 50 articles. A fifth reviewer (MLG) evaluated all 200 articles. Scoring discrepancies were resolved by consensus.
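The allocation of articles to reviewers can be illustrated with a seeded shuffle that deals the 200 articles into four equal groups. This is a minimal sketch and not the authors' scheme, because the software they used to generate the randomization is not stated. The integer article IDs, the seed, and the `assign_reviewers` helper are all hypothetical.

```python
import random


def assign_reviewers(article_ids, reviewers, seed):
    """Randomly split article IDs into equal-sized groups, one per reviewer.

    Hypothetical helper: a seeded shuffle makes the allocation reproducible,
    then consecutive slices of the shuffled list go to each reviewer.
    """
    ids = list(article_ids)
    if len(ids) % len(reviewers) != 0:
        raise ValueError("articles must divide evenly among reviewers")
    random.Random(seed).shuffle(ids)
    size = len(ids) // len(reviewers)
    return {r: sorted(ids[i * size:(i + 1) * size]) for i, r in enumerate(reviewers)}


# 200 study articles (IDs 1-200 assumed) split among the four primary
# reviewers; the fifth reviewer (MLG) scores all 200, so needs no allocation.
groups = assign_reviewers(range(1, 201), ["AR", "JB", "GM", "JM"], seed=2006)
for reviewer, articles in groups.items():
    print(reviewer, len(articles))  # each reviewer receives 50 articles
```

Fixing the seed means the same allocation can be regenerated and audited later.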

Figure 1.
Table 1. Comparison of Quality Scores in 2000 with Scores in 2006

To evaluate each article, we used a modified version of the Chalmers quality assessment tool6,14. This is the same tool used in the 2000 quality review11 (Table 1). The scale includes 14 quality domains, each of which has precise requirements for what must be reported to achieve a given weighted score.15 Each article's score was expressed as a percentage (total score divided by total possible score). We used percentages because some items were not applicable to an individual RCT, and those items were excluded from the denominator of possible points. This method of scoring has been used in numerous studies of clinical research.16,17
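The percentage scoring, in which not-applicable items drop out of the denominator, can be sketched as follows. The item names and point values are hypothetical placeholders, not the actual Chalmers item weights. The `percentage_score` helper is likewise illustrative.

```python
def percentage_score(items):
    """Return total score / total possible score, as a percentage.

    `items` maps an item name to (points_awarded, max_points), or to None
    when the item is not applicable; N/A items are dropped from both the
    numerator and the denominator.
    """
    applicable = [v for v in items.values() if v is not None]
    possible = sum(max_pts for _, max_pts in applicable)
    if possible == 0:
        raise ValueError("no applicable items")
    earned = sum(pts for pts, _ in applicable)
    return 100.0 * earned / possible


# Hypothetical scores for one trial; weights are illustrative, not Chalmers'.
trial = {
    "randomization_blinding": (0, 10),
    "patient_blinding": (8, 10),
    "sample_size_estimate": (5, 5),
    "post_beta_estimates": None,  # positive trial: item not applicable
}
print(round(percentage_score(trial), 1))  # 52.0
```

Scoring the post-β item as 0 of 5 instead of marking it not applicable would lower this trial to about 43%. That would penalize a positive trial for an item that does not apply to it.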

For each item on the Chalmers scale, frequencies in all articles published in 2000 were compared with those in all articles published in 2006, using Pearson's χ² test or Fisher's exact test as appropriate. Total quality scores were averaged across all articles published in each journal. Analysis of variance was used to compare the mean total quality scores of the four journals between 2000 and 2006. Statistical analyses were conducted using the Statistical Analysis System (SAS 9.1; SAS Institute, Cary, NC).
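The per-item comparison can be sketched for a single 2 × 2 table, with rows for the two years and columns for whether the item was reported. The article says only that Fisher's exact test was used as appropriate. This sketch therefore makes several assumptions:

- It switches to Fisher's exact test whenever any expected cell count is below 5, which is a common rule but not one the article states.
- It uses Pearson's χ² without a continuity correction.
- It is written in plain Python rather than SAS.

The helper names and example counts are hypothetical.

```python
import math


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) and P, df = 1."""
    exp = expected_counts(table)
    chi2 = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact(table):
    """Two-sided Fisher exact P: total probability of all tables with the
    same margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell is x
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def compare_item(table):
    """Fisher's exact test if any expected count is < 5, else Pearson chi2."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact(table)
    return "chi2", pearson_chi2(table)[1]


# Hypothetical counts: rows = 2000, 2006; columns = item reported, not reported.
print(compare_item([[100, 179], [120, 80]]))
print(compare_item([[3, 1], [1, 3]]))  # ('fisher', 0.4857...), i.e. 34/70
```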


In 2006, the mean weighted quality scores (± standard deviation [sd]) for the four individual journals were 63% ± 16%, 53% ± 15%, 57% ± 16%, and 59% ± 14%. Figure 2 compares the 2006 quality scores with the 2000 scores. The overall mean quality score for the 200 studies from 2006 was 58% (95% confidence interval [CI] = 55%–60%). The corresponding score for the 279 studies from 2000 was 44% (95% CI = 42%–46%). This represents an absolute increase in quality score from 2000 to 2006 of 15 ± 3 percentage points (P = 0.0015).
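A 95% confidence interval for a mean score is commonly computed with the normal approximation, mean ± 1.96 × sd/√n. The article does not state how its intervals were obtained, and it does not give the pooled 2006 sd. The sketch below therefore assumes the normal approximation and an sd of 15 percentage points, which lies within the 14% to 16% range reported for the individual journals. The `mean_ci` helper is hypothetical.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean (95% by default)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# 2006 sample: n = 200 trials, mean 58%; sd = 15 points is an assumption.
low, high = mean_ci(58.0, 15.0, 200)
print(f"{low:.1f}% to {high:.1f}%")  # 55.9% to 60.1%
```

With these assumed inputs, the interval is close to the published 55% to 60%, though not necessarily identical to it.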

Figure 2.

A comparison of quality score items for the four journals in 2000 and 2006 is presented in Table 1. Six score items (#3, 4, 6, 8, 9, and 14) improved between 2000 and 2006, with scores above 50%. One score item (#1) remained essentially unchanged but above 80%. Another six items (#2, 5, and 10–13) showed at least a trend toward improvement, but their scores remained below 50%.


Between 2000 and 2006, quality scores for RCTs reported in the four anesthesiology journals improved by a mean of 15 percentage points (sd = 3). The 2006 mean quality score for anesthesiology RCTs was similar to scores from evaluations of clinical RCTs in other medical fields that used the Chalmers scoring system (56%16 and 58%18). Despite this improvement, the current average quality score of 58% remains below 80%. That threshold is considered sufficient to warrant acceptance of a study's conclusions.6

The items that contributed to the significantly higher 2006 quality scores (Table 1) were increases in the number of articles reporting the following:

- patient blinding to treatment;
- observer blinding to treatment;
- sample size estimates;
- the results of randomization on important pretreatment variables;
- major study outcomes;
- the types of statistical tests used and their significance levels;
- side effects.

Although almost all score items showed at least a trend toward improvement in 2006, several items remain deficient: they received lower scores, and they improved less between 2000 and 2006 than the other items. Three score items present the clearest opportunities for improving the reporting of RCTs:

- #2, randomization blinding;
- #5, observer blinding to continuing study results;
- #10, discussion of post-β estimates, including the probability of Type II errors in negative studies.

First, randomization blinding (item #2) requires two descriptions: 1) how the randomized allocation was generated; and 2) how the allocation was concealed from study personnel before each patient's assignment. Appropriate randomization procedures must produce truly random assignments, for example by using random number tables or computer-generated randomization schemes. In many studies, the method of generating the randomization was either not described at all or was unacceptable (e.g., subjects were randomized by hospital number or by shuffling envelopes containing assignments). In the studies we reviewed, concealment of the allocation from study personnel was frequently not described. When it was described, the detail was often insufficient to show that concealment was tamper-proof (e.g., the assignment was in a closed envelope). Examples of acceptable descriptions of allocation concealment included: 1) sequentially numbered, sealed, opaque envelopes containing group assignments that were opened at the time of patient allocation; or 2) a blinded study drug prepared by a hospital pharmacist who maintained the randomization allocation.
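One example of a computer-generated scheme is a permuted-block allocation list. Such a list could be held by a pharmacist or sealed in sequentially numbered opaque envelopes. The article does not prescribe a specific method, so the permuted-block approach, the block size of 4, the arm labels, and the `permuted_block_list` helper are assumptions chosen for illustration.

```python
import random


def permuted_block_list(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Return a sequentially numbered allocation list built from permuted blocks.

    Every block contains each arm equally often in random order, so at any
    point in the list the arm totals differ by at most block_size // len(arms).
    """
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    # Number the assignments 1, 2, 3, ... to match sequentially numbered envelopes.
    return list(enumerate(allocations[:n_patients], start=1))


for number, arm in permuted_block_list(8, seed=42):
    print(f"Envelope {number:02d}: group {arm}")
```

The code covers only the generation of the sequence. Concealment still depends on procedure: the list must be held by someone not involved in enrolling patients, such as the hospital pharmacist described in the text. Each assignment should be revealed only at the time of allocation.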

Second, investigator blinding to ongoing study results (item #5) was described in only 6% of the studies. Ideally, investigators should remain blind to the accumulating results of the study. Knowledge of earlier patients' results may influence decisions to recruit particular patients, to encourage recruitment, or to continue or stop the study prematurely. Planned unblinded interim analyses should be specified in the study protocol before study initiation (i.e., before any unblinding of investigators). They should account for the potential for bias and the effect on Type I error. They should also be performed according to written study protocols, analysis plans, unblinding strategies, and data monitoring committee charters. Accumulating data can be reviewed for quality without revealing the randomization code. If interim analyses with knowledge of randomized assignments are required to ensure patient safety, they should be performed by an independent data safety monitoring board.
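A small simulation shows why unplanned interim looks matter for Type I error. The model is a deliberate simplification assumed for illustration:

- The null hypothesis is true, and each patient contributes a standardized outcome with mean 0 and sd 1.
- A two-sided z test at the nominal 5% level is repeated, without adjustment, at equally spaced interim looks.
- The trial stops at the first significant result.

The function name and all parameters are hypothetical.

```python
import math
import random


def type_i_error_rate(n_looks, n_per_look=20, n_trials=2000, z_crit=1.96, seed=1):
    """Simulated probability of a false-positive result when an unadjusted
    two-sided z test is repeated at each of n_looks interim analyses."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Accrue the next batch of patients; the null is true (mean 0).
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            if abs(total / math.sqrt(n)) > z_crit:  # z = sum / (sd * sqrt(n))
                false_positives += 1
                break  # trial "stops early" on a spurious finding
    return false_positives / n_trials


print(type_i_error_rate(n_looks=1))  # near the nominal 0.05
print(type_i_error_rate(n_looks=5))  # well above 0.05
```

For five equally spaced unadjusted looks, the theoretical overall false-positive rate is about 0.14. This is why the protocol must account for the impact of interim analyses on Type I error.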

Third, 76 studies in 2006 failed to reject the primary null hypothesis. Of these, only 29% (22 of 76) addressed the possibility that this finding could have been due to a Type II error (item #10) or small sample size. As in the 2000 review, authors continued to interpret the lack of a statistically significant difference in outcomes as showing that the treatments were equally effective, or that the new treatment was an acceptable alternative to the control. When the objective of a trial is to demonstrate equivalence, the null hypothesis is constructed as one of a difference between a standard and a new treatment. The research hypothesis is then constructed as one of equivalence between the two treatments, and appropriate statistical methodology (e.g., two one-sided tests [TOST] for equivalence) should be applied.19 When the objective of a trial is to demonstrate a difference but the study yields a negative result, the investigators should discuss the probability of a Type II error to satisfy item #10. Post-β estimates of a clinically interesting (but statistically nonsignificant) difference in negative trials, together with 95% confidence intervals around this difference, help guide sample size estimates for future studies.
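The quantities discussed for negative and equivalence trials can be sketched with normal-approximation formulas. The sketch assumes two equal-sized groups with a known common sd; small trials would usually use t-based methods instead. The function names and example values are hypothetical: an observed 5-point difference, sd 20, 30 patients per group, and a clinically interesting difference and equivalence margin of 10 points.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def se_diff(sd, n_per_group):
    """Standard error of a difference in means, two groups of equal size."""
    return sd * math.sqrt(2.0 / n_per_group)


def diff_ci(diff, sd, n_per_group, level=0.95):
    """Confidence interval around an observed difference in means."""
    z = Z.inv_cdf(1 - (1 - level) / 2)
    half = z * se_diff(sd, n_per_group)
    return diff - half, diff + half


def type_ii_error(delta, sd, n_per_group, alpha=0.05):
    """Post-beta estimate: chance a two-sided test misses a true difference delta.
    (Ignores the negligible probability of rejecting in the wrong direction.)"""
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(z_a - delta / se_diff(sd, n_per_group))


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Patients per group needed to detect delta with the requested power."""
    z_a, z_b = Z.inv_cdf(1 - alpha / 2), Z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sd / delta) ** 2)


def tost_equivalent(diff, sd, n_per_group, margin, alpha=0.05):
    """Two one-sided tests: equivalence is concluded only if both
    H0: true diff <= -margin and H0: true diff >= +margin are rejected."""
    se = se_diff(sd, n_per_group)
    p_lower = 1 - Z.cdf((diff + margin) / se)
    p_upper = Z.cdf((diff - margin) / se)
    return max(p_lower, p_upper) < alpha


# Hypothetical negative trial: 5-point difference, sd 20, 30 patients per group.
print(diff_ci(5, 20, 30))                     # about (-5.1, 15.1): includes 0
print(type_ii_error(10, 20, 30))              # about 0.51 for a 10-point difference
print(sample_size_per_group(10, 20))          # 63 per group for 80% power
print(tost_equivalent(5, 20, 30, margin=10))  # False: equivalence not shown
```

In this hypothetical underpowered trial, the nonsignificant difference does not rule out a clinically interesting effect (β ≈ 0.5). Nor does it establish equivalence.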

Of interest, 53% of the RCTs evaluated in our previous review of 2000 were negative trials; in 2006, 38% of studies had negative results. The smaller proportion of negative studies published in 2006 raises the possibility of publication bias among the four anesthesiology journals. In the surgical literature, Hasenboehler et al.20 found that 17% of articles presented negative results. They concluded that a bias toward publishing positive data disregards important information derived from unpublished negative studies.

There are limitations to our approach to measuring and comparing the quality of reporting of RCTs in 2000 and 2006. First, a score item that is not described in an article may still have been performed in the trial itself. However, it is generally accepted that the methodologic rigor of a study is likely reflected in the quality of its reporting.21 Second, this 2006 review was limited to the four anesthesiology journals studied in 2000. Other important anesthesiology journals may have higher or lower quality scores than those measured in this four-journal review. Third, a tool other than the Chalmers system could have produced different results. There is no clear “gold standard” for evaluating the reporting or quality of RCTs, as the variety of other available tools shows.22,23 Lastly, we cannot exclude the possibility that the format of the articles allowed reviewers to recognize the journal in which each was published.

The CONSORT checklist has been adopted by three of the four journals evaluated in this study. Anesthesiology and the Canadian Journal of Anesthesia have recommended the use of CONSORT since 2003, and Anesthesia & Analgesia has required it since 2006. The improvements in quality scores between 2000 and 2006 may reflect increasing use of the CONSORT checklist and a growing emphasis on the quality, consistency, and transparency of reporting.

In conclusion, the results of this study indicate that the quality of RCTs in four leading anesthesia journals has improved since 2000. The gains are particularly clear in the reporting of sample size estimation, the description of the test statistics used and critical P values, and the discussion of side effects. However, further improvement is warranted. Future efforts to enhance the quality of RCTs should be directed toward:

1) reporting how the randomization allocation is generated and concealed;
2) blinding investigators to ongoing study results;
3) reporting complete analyses of negative outcomes to assess the probability of Type II error.


1. Rosenberg AL, Wei JT. Clinical study designs in the urologic literature: a review for the practicing urologist. Urology 2000;55:468–76
2. Altman DG. Better reporting of randomised controlled trials: the CONSORT statement. BMJ 1996;313:570–1
3. Meinert C. Clinical trials: design, conduct, and analysis. New York: Oxford University Press, 1986
4. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–9
5. Moher D, Altman DG, Schulz KF, Elbourne DR. Opportunities and challenges for improving the quality of reporting clinical research: CONSORT and beyond. CMAJ 2004;171:349–50
6. Chalmers TC, Smith H Jr, Blackburn B, Silverman B, Schroeder B, Reitman D, Ambroz A. A method for assessing the quality of a randomized control trial. Control Clin Trials 1981;2:31–49
7. DerSimonian R, Charette LJ, McPeek B, Mosteller F. Reporting on methods in clinical trials. N Engl J Med 1982;306:1332–7
8. Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbe KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol 1992;45:255–65
9. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1–12
10. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001;357:1191–4
11. Greenfield ML, Rosenberg AL, O'Reilly M, Shanks AM, Sliwinski MJ, Nauss MD. The quality of randomized controlled trials in major anesthesiology journals. Anesth Analg 2005;100:1759–64
12. Pua HL, Lerman J, Crawford MW, Wright JG. An evaluation of the quality of clinical trials in anesthesia. Anesthesiology 2001;95:1068–73
13. Halpern SH, Darani R, Douglas MJ, Wight W, Yee J. Compliance with the CONSORT checklist in obstetric anaesthesia randomised controlled trials. Int J Obstet Anesth 2004;13:207–14
14. Rochon PA, Gurwitz JH, Simms RW, Fortin PR, Felson DT, Minaker KL, Chalmers TC. A study of manufacturer-supported trials of nonsteroidal anti-inflammatory drugs in the treatment of arthritis. Arch Intern Med 1994;154:157–63
15. Rochon PA, Gurwitz JH, Cheung CM, Hayes JA, Chalmers TC. Evaluating the quality of articles published in journal supplements compared with the quality of those published in the parent journal. JAMA 1994;272:108–13
16. Berghmans T, Paesmans M, Meert AP, Mascaux C, Lothaire P, Lafitte JJ, Sculier JP. Survival improvement in resectable non-small cell lung cancer with (neo) adjuvant chemotherapy: results of a meta-analysis of the literature. Lung Cancer 2005;49:13–23
17. Manzoli L, Schioppa F, Boccia A, Villari P. The efficacy of influenza vaccine for healthy children: a meta-analysis evaluating potential sources of variation in efficacy estimates including study quality. Pediatr Infect Dis J 2007;26:97–106
18. Le Quintrec JL, Bussy C, Golmard JL, Herve C, Baulon A, Piette F. Randomized controlled drug trials on very elderly subjects: descriptive and methodological analysis of trials published between 1990 and 2002 and comparison with trials on adults. J Gerontol A Biol Sci Med Sci 2005;60:340–4
19. Phillips KF. Power of the two one-sided tests procedure in bioequivalence. J Pharmacokinet Biopharm 1990;18:137–44
20. Hasenboehler EA, Choudhry IK, Newman JT, Smith WR, Ziran BH, Stahel PF. Bias towards publishing positive results in orthopedic and general surgery: a patient safety issue? Patient Saf Surg 2007;1:4
21. Altman DG, Dore CJ. Randomisation and baseline comparisons in clinical trials. Lancet 1990;335:149–53
22. Olivo SA, Macedo LG, Gadotti IC, Fuentes J, Stanton T, Magee DJ. Scales to assess the quality of randomized controlled trials: a systematic review. Phys Ther 2008;88:156–75
23. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 1995;16:62–73

*An international group of clinical trialists, statisticians, and medical journal editors proposed the CONSORT initiative, which has since gained wide acceptance among medical journals and editorial groups worldwide. CONSORT is a 22-item checklist and flow diagram intended to help authors adequately describe the key aspects of study methodology. Readers need these details to evaluate the validity, limitations, and generalizability of a clinical trial, including its enrollment, interventions, allocation, follow-up, and statistical analysis.

© 2009 International Anesthesia Research Society