Methodologic Issues in Randomized Controlled Trials of Surgical Interventions

Devereaux, P. J.*; McKee, Michael D.; Yusuf, Salim*

Clinical Orthopaedics and Related Research: August 2003 - Volume 413 - p 25-32
doi: 10.1097/01.blo.0000080539.81794.54
SECTION I SYMPOSIUM: Issues in the Design, Analysis, and Critical Appraisal of Orthopaedic Clinical Research: Part I: Methodologic Issues in the Design of Orthopaedic Studies

A physician’s ability to draw conclusions about the relative efficacy of interventions from clinical experience alone is frequently limited. Therefore, surgeons commonly use research evidence to guide their clinical practice. The randomized controlled trial is the strongest study design. However, randomization in itself does not guarantee that the trial results are valid (free from bias). Understanding the potential impact of the various methodologic features of a randomized controlled trial allows a clinician to judge the validity of a trial. We present a guide for evaluating the validity of randomized controlled trials, giving special consideration to issues confronted in surgical trials.

From the *Department of Medicine and Population Health Research Institute, McMaster University, Hamilton, Ontario, Canada; and the Department of Surgery, St. Michael’s Hospital and the University of Toronto, Toronto, Ontario, Canada.

Dr. P.J. Devereaux is supported by a Heart and Stroke Foundation of Canada / Canadian Institutes of Health Research Fellowship Award. Dr. Salim Yusuf holds an endowed Chair of the Heart and Stroke Foundation of Ontario and is a Senior Scientist of the Canadian Institutes of Health Research.

Reprint requests to P. J. Devereaux, MD, McMaster University, Faculty of Health Sciences, Clinical Epidemiology & Biostatistics, Room 2C12, 1200 Main Street West, Hamilton, ON, L8N 3Z5, Canada. Phone: 905-525-9140 ext. 22900; Fax: 905-524-3841; E-mail:

Why Should Randomized Controlled Trials Be Done and Used to Guide Surgical Practice?

Surgeons want to know whether their procedures are effective. Although clinical observations can provide important insights, they may be limited by lack of objectivity. This results from difficulties in integrating observations (taking into account variations in the natural history of a disorder, placebo effect, subtle but important effects of patient selection for one procedure versus another, a patient’s desire to please, and an expectation that more aggressive interventions should be better), and drawing inferences from them. 29 As a result of these limitations, surgeons commonly rely on research evidence from a range of studies, including randomized controlled trials, to guide their clinical practice. 20

Clinical research is either observational (cohort study, case-control study, case series, case report) or experimental (randomized controlled trial). 15 If there is a very large treatment effect (e.g., insertion of a pacemaker for complete heart block), it is likely to be identified reliably in an observational study. Although the observational study may have moderate biases (systematic deviations from the truth), as long as these biases result in errors substantially smaller than the demonstrated very large treatment effect, an appropriate conclusion about treatment benefit will be obtained. However, most interventions and procedures have moderate treatment effects, and although moderate biases may not matter much when extremely large treatment effects exist, they matter considerably when the treatment effect is moderate or small. 40

Observational studies dominate the surgical literature. 7,30,31 However, when available, a randomized controlled trial has many advantages over an observational study. 35 Randomization eliminates biases in the choice of treatment, is the only means to control for unknown prognostic factors, and facilitates blinding. 35 Although there are cases where observational studies have been shown to have effect estimates similar to those of randomized controlled trials, 4 there are many cases where observational studies have been completely misleading. 12

Many clinicians and patients recently were surprised by the results of the Women’s Health Initiative study. 39 This large primary prevention trial randomized 16,608 women to estrogen plus progestin or matching placebo and observed them for 5 years. The finding that risks (higher rates of coronary artery disease, breast cancer, stroke, and pulmonary emboli) outweighed benefits (lower rates of colorectal cancer and hip fractures) with hormone replacement therapy will affect clinical practice globally, including for patients with musculoskeletal disorders. 39 One of the reasons these results were so surprising is that a prior meta-analysis of 16 cohort studies and three cross-sectional angiographic studies showed a lower risk of coronary artery disease among women taking estrogen (relative risk of 0.5, 95% confidence interval, 0.44–0.57). 37

Contradictory examples of observational and randomized controlled trial results are not restricted to medical interventions. 12 An observational study of extracranial to intracranial bypass surgery suggested a “dramatic improvement in the symptomatology of virtually all patients” having the procedure. 32 However, a subsequent large randomized controlled trial showed a 14% relative increase in the risk of fatal and nonfatal stroke in patients having this procedure compared with the best medical treatment (antiplatelet therapy). 38

Because of the known advantages of randomized controlled trials over observational studies, the fact that most interventions and procedures have moderate as opposed to large treatment effects, and the examples of misleading information from observational studies, surgeons need to do randomized controlled trials and, when randomized controlled trials are available, use them as a guide to clinical practice.

Why Should Clinicians Evaluate the Methodology of Randomized Controlled Trials?

Although the randomized controlled trial is the strongest study design for evaluating clinical interventions, randomization in itself does not guarantee the results are valid (likelihood that the trial results are unbiased). Understanding the potential impact of various methodologic features of a randomized controlled trial allows a clinician to determine the validity of a trial. We will present a guide for evaluating the validity of randomized controlled trials giving special consideration to issues confronted in surgical trials.

Is the Randomized Controlled Trial Valid?

Table 1 presents a guide for evaluating the validity of a randomized controlled trial. This guide can be used to determine whether the results of a randomized controlled trial represent an unbiased estimate of the true treatment effect. Two of the methodologic issues that can affect the validity of a randomized controlled trial occur before the assignment of the therapeutic intervention and five occur after treatment assignment.

Were Patients Properly Randomized?

A randomized controlled trial is a trial where patients are assigned to treatments based on a random process. This process must ensure the unpredictability of treatment assignments. 35 The flipping of a coin, a computer-generated random allocation sequence, and a random number table all can generate true randomization sequences.
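The computer-generated option can be illustrated with a short sketch (Python is used purely for illustration; the function name and seed are hypothetical, and the seed is fixed only so the sketch is reproducible):

```python
import random

def allocation_sequence(n_patients, seed=2003):
    """Computer-generated simple randomization: each patient is
    independently assigned to treatment or control, so every
    upcoming assignment remains unpredictable to enrollers."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility of the sketch only
    return [rng.choice(["treatment", "control"]) for _ in range(n_patients)]

print(allocation_sequence(8))
```

Coin flips and random number tables produce the same kind of unpredictable sequence; the computer-generated version simply makes the process auditable.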

Sometimes authors report a study as being a randomized controlled trial but describe a process for determining treatment allocation that is not random but systematic. 33 Patient assignment based on admission date, hospital number, date of birth, or alternation does not produce a random sequence. 35 The limitation of these nonrandom methods of treatment allocation relates to the next guide, concealment of randomization.

Was There Concealment of Randomization?

Concealment of randomization means that the individuals enrolling patients into a randomized controlled trial are unaware of whether the next patient will be randomized to treatment or control. Without concealment of randomization investigators can systematically influence which patients receive the experimental and control interventions. This can destroy the underlying premise of a randomized controlled trial (patient assignment will not be based on a random process but rather an investigator’s choice).

The risk of using the nonrandom methods of treatment allocation, discussed previously, is the ease with which concealment of randomization can be compromised. Other factors can compromise concealment of randomization. Investigators have admitted to breaking concealment of randomization by holding translucent envelopes containing the treatment assignment up to the light and opening unnumbered envelopes containing the treatment assignment until they found the desired treatment. 17,36

Because of the biases that investigators could introduce when there is inadequate concealment of randomization, it is not surprising that empiric studies have shown that such trials, when compared with adequately concealed randomized controlled trials, can overestimate the treatment effect by a substantial degree. 8,23,26,34 Indeed, such biases may be larger than the effect the study was designed to detect.

Readers of randomized controlled trials should check that investigators took care to maintain concealment of randomization. The most reliable methods are a central phone-in randomization process or pharmacy-administered blinded medication bottles. Other methods, such as sealed, opaque, sequentially numbered envelopes, can maintain concealment of randomization, and readers’ confidence increases when the investigators report that audit checks were done and revealed no tampering.
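The logic of a central randomization service can be sketched as follows (a hypothetical illustration; the class and method names are invented). The key property is that an assignment is generated and revealed only after the patient is registered, so enrollers can never learn the next allocation in advance:

```python
import random

class CentralRandomizationService:
    """Hypothetical sketch of concealed allocation: the sequence is
    held centrally and an assignment is released only AFTER a patient
    is irrevocably registered, mimicking a central phone-in system."""

    def __init__(self, seed=2003):
        self._rng = random.Random(seed)  # sequence never visible to enrollers
        self.registry = []               # audit trail of (patient_id, arm)

    def enroll(self, patient_id):
        # Registration is recorded first; only then is the arm generated
        # and revealed, so there is no envelope to hold up to the light.
        arm = self._rng.choice(["experimental", "control"])
        self.registry.append((patient_id, arm))
        return arm

service = CentralRandomizationService()
print(service.enroll("pt-001"))  # the arm is revealed only at this point
```

The audit trail plays the role of the tamper checks mentioned above: every revealed assignment is logged against the patient who triggered it.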

Were Patients in the Various Treatment Groups Similar for Known Prognostic Factors at the Start of the Study?

Imagine a randomized controlled trial enrolling patients with open and closed tibial shaft fractures comparing the effect of reamed versus nonreamed intramedullary nailing techniques with a primary outcome of fracture healing. Patients with open fractures have a worse prognosis for fracture healing than patients with closed fractures. 6 If this trial is very small and only enrolls 10 patients, by chance alone one would not be overly surprised if four of the five patients randomized to the nonreamed group had an open fracture, and none of the five patients randomized to the reamed group had an open fracture. Despite randomization, this study is biased seriously in favor of the reamed group.
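This chance imbalance is not far-fetched. Under the scenario described (4 open fractures among 10 patients, allocated 5 and 5), its exact probability follows from simple counting:

```python
from math import comb

# 10 patients, 4 with open fractures, randomized 5 vs 5.
# All 4 open fractures land in one named arm (e.g., nonreamed) when the
# 5 patients chosen for that arm include all 4 opens plus 1 of the 6 closed.
p_all_in_nonreamed = comb(4, 4) * comb(6, 1) / comb(10, 5)
p_either_arm = 2 * p_all_in_nonreamed  # that extreme imbalance in either arm

print(round(p_all_in_nonreamed, 3), round(p_either_arm, 3))  # 0.024 0.048
```

Roughly 1 trial in 20 of this size would show the extreme split described, purely by chance.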

The purpose of randomization is to create groups of patients at the start of the study with similar prognoses. 17 When groups of patients start a randomized controlled trial with a similar prognosis, and the methodology of the trial is strong, then the trial results can be attributed to the interventions being evaluated. Sometimes through bad luck and especially in small trials, randomization may not produce groups with similar prognoses. Although one never can know about the balance of unknown prognostic factors “we are reassured when the known prognostic factors are reasonably well balanced.”17

When discussing differences in prognostic factors between treatment groups, it is essential to realize this refers to clinically important differences, not statistically significant differences. Frequently, probability values are presented as if they indicate whether the observed differences in prognostic factors are attributable to chance. However, one already knows that all the differences that exist are attributable to chance, because patients were assigned randomly to the treatment groups. Instead of focusing on whether there are statistically significant differences, one should focus on whether there are clinically important differences. The stronger the relationship between a prognostic factor and the outcome, and the more uneven the distribution of that factor between the treatment groups, the more uncertain one should be about the results.

Even when clinically important differences exist between the treatment groups, all is not necessarily lost. There are statistical techniques to adjust for differences, and clinicians appropriately gain more confidence when the unadjusted and adjusted analyses come to the same conclusion.

This topic identifies one of the advantages of undertaking large randomized controlled trials: with large randomized controlled trials, prognostic factors tend to balance between groups. Because surgical randomized controlled trials currently remain limited primarily to small sample sizes, an evaluation of baseline prognostic factors is important. Hopefully, the trend toward large surgical randomized controlled trials will continue and will help to ensure more robust results.
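The tendency of prognostic factors to balance as trials grow can be shown with a small simulation (a sketch only; the 40% prevalence of the prognostic factor and all names are invented for illustration):

```python
import random

def expected_imbalance(n, prevalence=0.4, sims=2000, seed=1):
    """Average absolute between-arm difference in the prevalence of a
    prognostic factor when n patients are randomized 1:1 (simulation)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sims):
        factor = [rng.random() < prevalence for _ in range(n)]
        rng.shuffle(factor)                       # random 1:1 allocation
        a, b = factor[: n // 2], factor[n // 2:]
        total += abs(sum(a) / len(a) - sum(b) / len(b))
    return total / sims

print(expected_imbalance(10), expected_imbalance(1000))
# the average imbalance shrinks markedly as the trial grows
```

With 10 patients the arms typically differ by tens of percentage points in the prevalence of the factor; with 1,000 patients the typical difference is a percentage point or two.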

Did the Surgeons Have Experience With the Interventions Under Evaluation?

Surgeons have a learning curve for all procedures. When a surgical randomized controlled trial shows that Approach B is superior to Approach A, all surgeons who previously used and thought Approach A to be superior question the skill level of the trial surgeons in doing Approach A. If the surgeons in the randomized controlled trial were not proficient with Approach A before starting the randomized controlled trial, the results could be very misleading. This is why readers gain more confidence in the results of a surgical randomized controlled trial when the trial reports that the surgeons had completed the learning curve before starting the study (the paper reports that all surgeons completed a prerequisite number of cases with the procedures under evaluation before starting the randomized controlled trial). 24

Were the Patients, Healthcare Providers, Data Collectors, and Assessors of Outcomes Blinded to Treatment Allocation?

“Blinding (or masking) in randomized controlled trials is the process of withholding information about treatment allocation from those who could potentially be influenced by this information.”11 Several individuals in randomized controlled trials have the potential to introduce bias if they are made aware of the treatment allocation. Table 2 defines several groups who potentially can be blinded in a randomized controlled trial.

Unblinded participants can introduce bias through reporting of symptoms, willingness to continue in the study, use of other effective interventions, and the placebo effect. 1 Healthcare providers when unblinded can introduce bias through differential use of other effective interventions, advice to the patient as to whether to continue in the trial, and influencing patient reporting of outcomes. 1,2 Unblinded data collectors can introduce bias through differential encouragement during performance testing, differential timing and repeating of measurements, and recording of outcomes. 18,22 When unblinded, outcome assessors can introduce bias in their decisions primarily around subjective outcomes. 1,7

Although blinding often is thought impossible in surgical trials, it has been used in surgical randomized controlled trials for a long time. In 1959, a randomized controlled trial involving a sham operation (placebo surgery), to facilitate blinding, showed internal mammary artery ligation did not affect angina. 10 Blinding of the operating surgeon is impossible, but participants, other healthcare providers, data collectors, and outcome assessors usually can be blinded.

A recent example highlights the importance of using blinding in surgical randomized controlled trials. Annually in the United States there are approximately 650,000 arthroscopic lavage and debridement procedures for knee pain secondary to osteoarthritis. 28 Previously numerous case series and one unblinded randomized controlled trial suggested a significant improvement in knee pain with lavage and debridement. 3,5,16,21,25 However, a much larger recent randomized controlled trial that compared arthroscopic debridement, lavage, and a sham procedure with blinding of patients, nonsurgical healthcare providers (nurses), data collectors, and outcome assessors showed no effect of arthroscopic debridement or lavage. 28 One of the accompanying editorials published along with this randomized controlled trial, in the New England Journal of Medicine, provides support for the ethics of doing randomized controlled trials that use placebo surgery. 19

When evaluating the methodology of any surgical randomized controlled trial, it is important to consider the blinding status of each of the groups discussed and, when a group is unblinded, to consider how likely it is that the group could have introduced bias. For example, if mortality or a clearly identifiable morbid event (stroke) is the primary outcome, blinding of data collectors and outcome assessors is unlikely to be relevant. However, if the outcome is pain, as in the arthroscopic study, blinding of patients (to eliminate any placebo effect), nonsurgical healthcare providers, data collectors, and judicial assessors of outcomes is likely to matter.

A cautionary note about blinding terminology: one of the current authors previously showed that clinicians and epidemiology textbooks vary significantly in their definitions of who is blinded in single-blinded, double-blinded, and triple-blinded trials. 13 Consistent with this variation, the authors also have shown large variation in the groups identified as blinded to treatment allocation in studies described as single- or double-blind. 27 Therefore, ignoring this inconsistent terminology (single-blinding, double-blinding, and triple-blinding) and focusing on which groups (Table 2) clearly are stated to be blinded or unblinded will help readers avoid incorrect assumptions.

Were Patients Analyzed in the Groups to Which They Were Randomized?

In an intention-to-treat analysis, all patients are analyzed in the groups to which they were randomized, regardless of what treatment they did or did not receive. If patients are not analyzed in the groups to which they were randomized, then the balance in patient prognosis may be destroyed. Because the purpose of randomization is to create groups of patients with similar prognoses, any analysis not based on intention-to-treat may jeopardize the validity of the trial.
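A minimal sketch of the principle, with hypothetical patient records, might look like this (the field names and figures are invented for illustration):

```python
def intention_to_treat_rates(patients):
    """Event rate per arm, grouping every patient by the arm to which
    they were RANDOMIZED -- not by the treatment actually received."""
    counts = {}
    for p in patients:
        events_n = counts.setdefault(p["randomized_to"], [0, 0])
        events_n[0] += p["event"]
        events_n[1] += 1
    return {arm: events / n for arm, (events, n) in counts.items()}

# Hypothetical records; the second patient crossed over to medical
# treatment but is still analyzed in the surgical arm.
patients = [
    {"randomized_to": "surgical", "received": "surgical", "event": 1},
    {"randomized_to": "surgical", "received": "medical",  "event": 0},
    {"randomized_to": "medical",  "received": "medical",  "event": 1},
    {"randomized_to": "medical",  "received": "medical",  "event": 0},
]
print(intention_to_treat_rates(patients))  # both arms: 0.5
```

Regrouping the crossover by the treatment received (a "per-protocol" analysis) would break the prognostic balance that randomization created.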

Was Patient Followup Complete?

Patients lost to followup can threaten the validity of a randomized controlled trial. Patients who have been lost to followup may have experienced the outcome of interest. If the patients lost to followup did experience the outcome and there were enough of them, the results of the trial could be altered. Various approaches have been suggested for assessing the likely impact of patients lost to followup (using the mean event rate from the patients not lost to followup, or a worst-case scenario). 9,17 In the worst-case scenario, it is assumed that everyone lost to followup in the treatment group experienced the outcome of interest and no one lost to followup in the control group did. If the worst-case scenario would not change the inferences from the trial results, then loss to followup is not an issue. However, even if a worst-case scenario would change the inferences from the trial results, it is important to realize that a worst-case scenario is unlikely.
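The worst-case calculation can be made concrete with hypothetical numbers (all figures are invented for illustration):

```python
def worst_case_rates(events_t, followed_t, lost_t, events_c, followed_c, lost_c):
    """Worst-case sensitivity analysis for loss to followup: assume every
    patient lost from the treatment arm experienced the outcome and no
    patient lost from the control arm did."""
    treat = (events_t + lost_t) / (followed_t + lost_t)
    ctrl = events_c / (followed_c + lost_c)
    return treat, ctrl

# Hypothetical trial: observed rates favor treatment (10/100 vs 15/100),
# but with 5 patients lost per arm, the worst case erases the advantage.
print(worst_case_rates(10, 100, 5, 15, 100, 5))
```

Here the observed 10% versus 15% advantage disappears entirely under the worst case (both arms about 14.3%), so the reader would weigh how plausible that extreme scenario really is.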

A Unique Issue Related to Randomization in Surgical Trials

Orthopaedic randomized controlled trials have randomized patients to surgeons who do their operation of preference. 14 This approach to randomization has been proposed to ensure that the surgeons in randomized controlled trials have experience with the intervention they are doing. 24 It is important in this type of surgical randomized controlled trial to ensure that some surgeons were doing each type of operation at each center. If institutional factors (postoperative nursing care) have an impact on the outcome and there is variation in these institutional factors between centers, it is possible that the results may reflect not the surgical intervention but the institutional factors. Therefore, a balance of surgeons doing the study interventions at each center allows readers more confidence that the results are related to the surgical interventions. Additional discussion and research are needed to evaluate this novel approach to randomization.

Putting It All Together

Frequently, validity is seen as a dichotomous decision (a study is valid or invalid). Validity should instead be thought of as a continuum (Fig 1). Ideally, there would be perfect randomized controlled trials that are completely valid; however, this is rarely, if ever, the case. What is more realistic is to consider whether one would rate the trial validity as above a trustworthy threshold, below an untrustworthy threshold, or in the uncertain zone. Once this decision is made, readers can move on to the study results knowing that they have judged the trial methodology to be trustworthy, untrustworthy, or uncertain.

Fig 1.

As a final point, it is important to realize that although a trial can be valid (unbiased), its results still can be incorrect because of random error (chance, which can lead to false-positive or false-negative results). All study results have probability values that relate to the likelihood that the findings are attributable to chance. The likelihood of random error is minimized in large trials and trials with a large number of clinical events, especially if the statistical significance is extreme. Therefore, large clinical trials are advantageous in minimizing both bias and random error.
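The effect of trial size on random error can be illustrated by computing a relative risk and its confidence interval at two sample sizes using the standard log relative-risk normal approximation (the event counts are invented; the observed rates, 10% versus 20%, are identical in both trials):

```python
from math import exp, log, sqrt

def relative_risk_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with an approximate 95% CI from the usual
    log relative-risk normal approximation."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# Small trial: 5/50 vs 10/50 events -> RR 0.5, but a wide CI crossing 1.
print(relative_risk_ci(5, 50, 10, 50))
# Large trial: 500/5000 vs 1000/5000 -> same RR 0.5, narrow CI below 1.
print(relative_risk_ci(500, 5000, 1000, 5000))
```

The same point estimate is statistically fragile in the small trial and convincing in the large one, which is exactly why large trials with many events minimize random error.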

Randomized controlled trials have a very important role in the evaluation of surgical interventions. If properly designed, conducted, and interpreted, their results are likely to make a substantial impact on the health of patients.

References

1. Altman DG, Schulz KF, Moher D, et al: The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann Intern Med 134:663–694, 2001.
2. Balk EM, Bonis PA, Moskowitz H: Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 287:2973–2982, 2002.
3. Baumgaertner MR, Cannon Jr WD, Vittori JM, Schmidt ES, Maurer RC: Arthroscopic debridement of the arthritic knee. Clin Orthop 253:197–202, 1990.
4. Benson K, Hartz AJ: A comparison of observational studies and randomized controlled trials. N Engl J Med 342:1878–1886, 2000.
5. Bert JM, Maschka K: The arthroscopic treatment of unicompartmental gonarthrosis: A five-year follow-up study of abrasion arthroplasty plus arthroscopic debridement and arthroscopic debridement alone. Arthroscopy 5:25–32, 1989.
6. Bhandari M, Guyatt G, Adili A, Tong D, Shaughnessy SG: Reamed versus non-reamed IM nailing of lower extremity long bone fractures: A systematic overview and meta-analysis. J Orthop Trauma 14:2–9, 2000.
7. Bhandari M, Richards R, Sprague S, Schemitsch EH: The quality of randomized trials in Journal of Bone and Joint Surgery from 1988–2000. J Bone Joint Surg 84A:388–396, 2002.
8. Chalmers TC, Celano P, Sacks HS, Smith Jr H: Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358–1361, 1983.
9. Chalmers TC, Smith Jr H, Blackburn B, et al: A method for assessing the quality of a randomized control trial. Controlled Clinical Trials 2:31–49, 1981.
10. Cobb LA, Thomas GI, Dillard, Merendino KA, Bruce RA: An evaluation of internal-mammary-artery ligation by a double-blind technic. N Engl J Med 260:1115–1118, 1959.
11. Devereaux PJ, Bhandari M, Montori VM, et al: Double blind, you are the weakest link: Good-bye! ACP J Club 136:A11–A12, 2002.
12. Devereaux PJ, Haynes B, Yusuf S: What is Evidence-Based Cardiology? In Yusuf S, Cairns JA, Camm AJ, Fallen EL, Gersh BJ (eds). Evidence Based Cardiology. London, BMJ Books 3–13, 2003.
13. Devereaux PJ, Manns BJ, Ghali WA, et al: Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 285:2000–2003, 2001.
14. Finkemeier CG, Schmidt AH, Kyle RF, Templeman DC, Varecka TF: A prospective, randomized study of intramedullary nails inserted with and without reaming for the treatment of open and closed fractures of the tibial shaft. J Orthop Trauma 14:187–193, 2000.
15. Grimes DA, Schulz KF: An overview of clinical research: The lay of the land. Lancet 359:57–61, 2002.
16. Gross DE, Brenner SL, Esformes I, Gross ML: Arthroscopic treatment of degenerative joint disease of the knee. Orthopedics 14:1317–1321, 1991.
17. Guyatt G, Cook D, Devereaux PJ, Meade M, Straus S: Therapy. In Guyatt G, Rennie DR (eds). Users’ Guides to the Medical Literature. Chicago, American Medical Association Press 55–79, 2002.
18. Guyatt GH, Pugsley SO, Sullivan MJ, et al: Effect of encouragement on walking test performance. Thorax 39:818–822, 1984.
19. Horng S, Miller FG: Is placebo surgery unethical? N Engl J Med 347:137–139, 2002.
20. Howes N, Chagla L, Thorpe M, McCulloch P: Surgical practice is evidence based. Br J Surg 84:1220–1223, 1997.
21. Ike RW, Arnold WJ, Rothschild EW, Shaw HL, Tidal Irrigation Cooperating Group: Tidal irrigation versus conservative medical management in patients with osteoarthritis of the knee: A prospective randomized study. J Rheumatol 19:772–779, 1992.
22. Jadad A: Randomised Controlled Trials. London, BMJ Books 20–36, 1998.
23. Kjaergard LL, Villumsen J, Gluud C: Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 135:982–989, 2001.
24. McCulloch P, Taylor I, Sasako M, Lovett B, Griffin D: Randomised trials in surgery: Problems and possible solutions. BMJ 324:1448–1451, 2002.
25. Merchan EC, Galindo E: Arthroscope-guided surgery versus nonoperative treatment for limited degenerative osteoarthritis of the femorotibial joint in patients over 50 years of age: A prospective comparative study. Arthroscopy 9:663–667, 1993.
26. Moher D, Jones A, Cook DJ, et al: Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352:609–613, 1998.
27. Montori VM, Bhandari M, Devereaux PJ, et al: In the dark: The reporting of blinding status in randomized controlled trials. J Clin Epidemiol 55:42–45, 2002.
28. Moseley JB, O’Malley K, Petersen NJ: A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med 347:81–88, 2002.
29. Nisbett R, Ross L: Human Inference. Englewood Cliffs, NJ, Prentice-Hall 1980.
30. Pollock AV: The rise and fall of the randomized controlled trial in surgery. Theoretical Surg 4:163–170, 1989.
31. Pollock AV: Surgical evaluation at the crossroads. Br J Surg 80:964–966, 1993.
32. Popp AJ, Chater N: Extracranial to intracranial vascular anastomosis for occlusive cerebrovascular disease: Experience in 110 patients. Surgery 82:648–654, 1977.
33. Schulz KF, Chalmers I, Grimes DA, Altman DG: Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA 272:125–128, 1994.
34. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273:408–412, 1995.
35. Schulz KF, Grimes DA: Generation of allocation sequences in randomised trials: Chance, not choice. Lancet 359:515–519, 2002.
36. Schulz KF, Grimes DA: Allocation concealment in randomised trials: Defending against deciphering. Lancet 359:614–618, 2002.
37. Stampfer MJ, Colditz GA: Estrogen replacement therapy and coronary heart disease: A quantitative assessment of the epidemiologic evidence. Prev Med 20:47–63, 1991.
38. The EC/IC Bypass Study Group: Failure of extracranial – intracranial arterial bypass to reduce the risk of ischemic stroke: Results of an international randomized trial. N Engl J Med 313:1191–1200, 1985.
39. The Women’s Health Initiative Investigators: Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. JAMA 288:321–333, 2002.
40. Yusuf S, Collins R, Peto R: Why do we need some large, simple randomized trials? Stat Med 3:409–420, 1984.

Section Description

Mohit Bhandari, MD, MSc; and Paul Tornetta, III, MD—Guest Editors

© 2003 Lippincott Williams & Wilkins, Inc.