Superstition and belief in magic are thousands of years old, whereas science, with its methods of isolating cause and effect, is a relatively novel development in the course of human history.1 Thousands of years of evolution have resulted in humans being hardwired to view anecdotal associations as causal connections, when, in reality, such associations may be nothing more than pure coincidence. What's more, our mistaken belief in causality can be so powerful that we often ignore contrary evidence.
As surgeons and scientists, we are not immune to these anecdotal fallacies. Recent memories of surgical triumphs or striking complications often lead us to make incorrect assumptions regarding cause and effect. History shows us, however, that decisions based solely on these assumptions can have seriously negative consequences. For example, based on the results of multiple case reports, extracranial-intracranial bypass was once routinely performed to prevent future strokes in patients who had suffered minor strokes. This practice continued until a large clinical trial demonstrated that patients who had such a bypass had a relative worsening of functional status compared with those who did not undergo the procedure.2 Similarly, patients with severe emphysema were once routinely treated with lung volume reduction surgery until a randomized controlled trial demonstrated that, for many, this practice significantly increased the risk of death.3
These lessons, and others like them, reveal a habit of human cognition—thinking anecdotally comes naturally, whereas thinking “scientifically” does not. Thus, as surgeons, we must challenge the psychological forces that guide our clinical decision making. The best available means of doing so is to subject our surgical interventions to systematic evaluations of cause and effect. This call for higher-level evidence in surgery is not intended to imply that we should ignore what can be learned from astute observation. Instead, to generate more rigorous evidence for causality, we should judge the significance of our observations under controlled conditions and use our powers of observation to continually challenge and reevaluate the assumptions upon which our controlled conditions are based. In this article, we review the basic principles of clinical trial design and discuss the unique challenges of trial design in surgery—all with the goal of compensating for the basic human tendency to assume causality.
DESCRIPTION OF DESIGN
A clinical trial is defined as a prospective experiment involving subjects in whom treatment is initiated for the evaluation of an intervention. Such an intervention may be prophylactic, diagnostic, or therapeutic in nature.4 Importantly, a clinical trial will contain a control group against which the “intervention group” is compared. At baseline, the control group must be sufficiently similar in relevant respects to the intervention group so that differences in outcome may reasonably be attributed to the action of the intervention.
Nonrandom and random assignment are the two general methods for assigning interventions to participants. Although nonrandom assignment methods are simple and easy to implement, they are predictable and may therefore allow bias to be introduced. By contrast, randomized trials use the play of chance to assign participants to comparison groups. As such, random treatment assignment is the preferred way of allocating participants to control and intervention groups.4 The primary benefit of randomization is that it eliminates both conscious and unconscious bias associated with the selection of a treatment for a given patient. Although the overall probability of being assigned to a particular group is known (e.g., in a clinical trial evaluating an intervention in two groups, there would be a 50 percent chance of being assigned to either group), there is no way to predict the order of future assignments based on past assignments. This lack of predictability is the key to minimizing bias. As such, methods that allocate participants according to an odd or even date of birth, chart number, and/or date of recruitment are considered faulty and should be avoided.5
The most elementary form of randomization is referred to as simple randomization. One straightforward example of simple randomization is the toss of a coin each time a participant is eligible to be randomized. Using this procedure, approximately one half of all study participants will be assigned to the intervention group and one half to the control group. In practice, instead of tossing a coin to generate a randomization schedule, a random digit table will commonly be used. For larger-scale studies, a more convenient method for producing a randomization schedule might be to use a computerized random number generator.
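As a minimal sketch of how a computerized random number generator might produce such a schedule, the following Python example assigns each participant independently with equal probability. The function name `simple_randomization`, the group labels, and the seed value are illustrative assumptions rather than part of any published protocol.

```python
import random


def simple_randomization(n_participants, seed=None):
    """Assign each participant independently, like the toss of a fair coin."""
    # A seeded generator lets the same schedule be regenerated for auditing.
    rng = random.Random(seed)
    return [rng.choice(["intervention", "control"]) for _ in range(n_participants)]


# Hypothetical trial of 20 participants; the seed is arbitrary.
schedule = simple_randomization(20, seed=2024)
print(schedule.count("intervention"), "intervention /",
      schedule.count("control"), "control")
```

Because each assignment is independent, the two group sizes will usually be close to equal but are not guaranteed to be, particularly in small trials.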
Investigators often use two additional procedures, blocking and stratification, with randomization to secure balanced groups. When blocking is used, randomization is conducted in groups or blocks of a certain size. For example, in a trial with a projected enrollment of 100 subjects, blocks of 10 patients might be randomized at a time to ensure that equal numbers of patients are assigned to the two groups throughout the enrollment process. Blocked randomization is used to avoid a serious imbalance in the number of participants assigned to each group, an imbalance that could occur using simple randomization. In practice, if the trial should be terminated before enrollment is completed, the number of participants randomized to each group will still be balanced.
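The blocking procedure described above can be sketched in a few lines of Python. This is an illustration only: the function name `blocked_randomization` and the seed are assumptions, and the example uses the 100-subject trial with blocks of 10 mentioned in the text.

```python
import random


def blocked_randomization(n_participants, block_size=10, seed=None):
    """Build a 1:1 schedule in which every complete block is evenly split."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        # Each block holds equal numbers of both assignments in random order.
        block = ["intervention"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]


# Projected enrollment of 100 subjects, randomized in blocks of 10.
schedule = blocked_randomization(100, block_size=10, seed=7)
print(schedule.count("intervention"), schedule.count("control"))
```

If enrollment stops at the end of any block, the two groups are exactly equal in size. If it stops partway through a block, they can differ by at most half the block size.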
Stratification often accompanies randomization to ensure that key confounding variables are equally distributed between the intervention and control groups. Stratified randomization requires that known prognostic factors be measured before randomization occurs. Using this method, individuals are then stratified or separated according to these variables. For example, a trial evaluating techniques to decrease postoperative infection might stratify or separate patients who are obese versus nonobese and/or diabetic versus nondiabetic. Once stratified, individuals within each stratum would then be randomly assigned to either the intervention or control group. When the groups are then summed over the various strata, the end result is a forced balance of these overall treatment groups according to the factors used to form the strata. Use of stratified randomization should be viewed as an insurance policy against a potential imbalance, and, because it does not require an increase in number of patients needed, it should be routinely used in randomized trials.
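To illustrate, the sketch below stratifies hypothetical participants by obesity and diabetes status, as in the example above, and then randomizes within each stratum. Using small shuffled blocks inside each stratum is our assumption, chosen so that balance within strata is enforced. The participant data, the block size of 4, and the function name `stratified_randomization` are likewise illustrative.

```python
import random
from collections import defaultdict


def stratified_randomization(participants, block_size=4, seed=None):
    """Randomize within strata defined by (obese, diabetic).

    participants: iterable of (participant_id, obese, diabetic) tuples.
    Returns a dict mapping participant_id to its assigned group.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> unused assignments of the current block
    assignments = {}
    for participant_id, obese, diabetic in participants:
        stratum = (obese, diabetic)
        if not pending[stratum]:
            # Start a fresh, evenly split block for this stratum.
            block = ["intervention", "control"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        assignments[participant_id] = pending[stratum].pop()
    return assignments


# Hypothetical cohort of 48 patients with made-up prognostic factors.
participants = [(i, i % 2 == 0, i % 3 == 0) for i in range(48)]
assignments = stratified_randomization(participants, block_size=4, seed=11)
```

The prognostic factors must be known before a participant is randomized, because the stratum determines which block the assignment is drawn from.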
Although randomization minimizes selection bias and therefore minimizes the likelihood of prognostic differences between groups at baseline, its use does not prevent subsequent differences in co-interventions during the trial and/or a biased assessment of the outcome. This is especially true when the outcome or endpoints involve subjective assessment (e.g., postoperative pain, aesthetic result, capsular contracture). The optimal strategy to minimize these potential sources of bias is thus to perform a blinded study. The terms “single,” “double,” or “triple” blinded are often used to indicate how many groups are masked; yet, these terms can be misleading unless the investigators indicate exactly who was blinded.6 Ideally, patients, clinicians, data collectors, outcome assessors, and analysts should all be blinded to treatment allocation.
TYPE OF CLINICAL QUESTION IT IS BEST SUITED TO ANSWER
Proponents of evidence-based medicine would acknowledge that several sources of evidence inform clinical decision making. This evidence may come from randomized clinical trials, rigorous observational studies, or even anecdotal reports based on expert opinion alone. Hierarchies of evidence have, however, been developed to help describe the quality of evidence that may be found to answer our clinical questions.7 According to this classification, the randomized clinical trial is the most effective way to determine whether a cause-and-effect relationship exists between an intervention and a predefined outcome. Although other study designs can detect associations between an intervention and an outcome, they cannot rule out the possibility that the association was caused by a third factor or confounding variable.8,9
Randomized clinical trials are only justified, however, when genuine equipoise exists among the practice community about the merits of the intervention. This concept states that, ethically speaking, we can only conduct clinical trials in areas of uncertainty and can only continue as long as the uncertainty remains. It follows that at the outset of clinical or surgical research, there should be equipoise regarding the efficacy of a particular intervention. If one treatment arm is known or expected to be substandard or less effective therapeutically, equipoise does not exist.
In some circumstances, a randomized controlled trial may be ethical but unfeasible because of recruitment difficulties. Indeed, once an intervention becomes widespread, it can prove impossible to recruit surgeons who are willing to “experiment” with alternatives. Strong patient preferences may also limit recruitment. Given these constraints, it remains an ideal that evaluation of innovative techniques, procedures, or devices should occur before their uptake into clinical practice.
Although the randomized clinical trial is considered the “gold-standard” study design used to establish the effectiveness of a treatment, it is not a panacea for all clinical questions. The strength of the randomized trial is that conditions are tightly controlled to minimize bias and the risk of arriving at an incorrect conclusion. Because of this, however, their results may lack generalizability. Whereas variations in surgical practice are minimized in randomized trials, observational studies may best determine what role these factors play in the “real world.”
Similarly, in situations in which there is a question of harm causation, a randomized controlled trial may be unnecessary or inappropriate. In 2003, Smith and Pell published an entertaining article, “Parachute Use to Prevent Death and Major Trauma due to Gravitational Challenge.”10 They used the lack of randomized controlled trials in testing parachutes to show that situations still exist where such trials are unnecessary and, more importantly, unethical. When surgical equipoise does not exist or randomized controlled trials are simply not feasible, the efficacy of surgical treatments must instead rely on well-designed observational or cohort studies.
ADVANTAGES AND DISADVANTAGES
Randomized clinical trials are the most stringent way of determining whether a cause–effect relation exists between an intervention and an outcome. Randomization provides one single and uniformly important advantage over all other methodologies: it balances known and unknown prognostic factors between treatment groups. The unpredictability of the process, if not subverted, should prevent systematic differences between comparison groups, provided that a sufficient number of people are randomized. Although it is possible to control for differences between comparison groups in other ways, such as statistical adjustment, this is only possible for factors that are known and can be measured. Randomization is the only means of controlling for unknown and unmeasured differences as well as those that are known and measured.
One of the disadvantages of applying the results of randomized controlled trials is the potential for limited generalizability of the study results. For example, restriction of the study to a specific group of relatively homogeneous patients can nearly always minimize the number of patients studied in a clinical trial. The more alike and sensitive a group of patients is to the intervention under investigation, the less other factors can affect the results and the easier it is for the trial to detect the effect of the therapeutic intervention. If the conditions for entry into a randomized clinical trial are too tightly controlled, however, the ability to generalize the results of a study to a broader context is negatively affected.
Unique challenges in the execution of a randomized controlled trial in plastic surgery also include blinding, concealment, and financial cost.11–13 For example, difficulties in achieving blinding often arise because one can rarely “blind” surgeons to the procedure performed, and it may be difficult or impossible to blind patients and/or outcome assessors to signs (e.g., scar patterns) that might indicate which treatment was received. Nevertheless, imaginative techniques may make blinding feasible, including assessment by independent or third-party investigators.
Perhaps one of the biggest challenges to performing randomized controlled trials is the cost of funding such a study. Randomized trials often involve a small number of centers with high volumes of patients and dedicated study personnel. Thus, performing such a trial can be quite expensive. Unlike with pharmacologic trials, surgical trials rely mainly on funding from academia. To this end, most of our plastic surgical societies will provide grant support on a competitive basis for high-quality research, including randomized controlled trials, in plastic surgery.
Finally, practical issues, such as standardizing surgical techniques across a group of heterogeneous surgeons and dealing with issues of surgeon expertise, make surgical trials more challenging to perform compared with their medical counterparts. One way to overcome the problem of variability in surgical proficiency is an “expertise-based, randomized controlled trial.” Using this study design, patients are randomized to a “surgeon” rather than to a treatment arm. One of the advantages here is that surgeons will perform only the procedure in which they have experience, avoiding the problem of differential expertise. In addition, expertise-based randomized trials may have greater “real-world” applicability and feasibility than conventional trials.14,15
ESSENTIAL ELEMENTS FOR CONDUCTING A GOOD RANDOMIZED CONTROLLED TRIAL
1. Formulation of the Trial Objective
A randomized controlled trial should be designed to answer one clinical research question. In defining the study objective, the variable to be considered as the primary clinical endpoint will be elucidated. Examples of endpoints in plastic surgical trials might include changes from baseline in quality of life or pain severity scores, complication rates, and/or the relative risks of reaching an endpoint with respect to time. Defining the primary endpoint of the trial is important, as this outcome will be used in the determination of the appropriate sample size.
2. Selection of the Study Population
The study population should, in theory, be representative of the population of patients at large who will eventually receive the said intervention. However, when the patient population includes individuals with a broad range of defining characteristics, the number of patients needed increases, the cost of the study increases, and a greater risk exists that the true treatment effect may go undetected because of the noise added by the heterogeneity of the patient population. Thus, the establishment of inclusion and exclusion criteria is important as a means of adding precision to the study.
3. Sample Size Determination
Having defined the study population, it is necessary to estimate the size of the sample required to allow the study to detect a clinically important difference between the groups being compared. This is performed by means of a sample size calculation. The following factors are taken into account: (1) the background rate of response in the control group; (2) the probability of making statistical errors known as alpha and beta errors; and (3) the smallest true difference between the intervention and control groups that would be clinically valuable.4,5 It is this latter requirement that is somewhat artificial and difficult to define. In practice, however, it is usually possible to specify the degree of benefit that the intervention would need to have over the control for it to be worthwhile.
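As a rough illustration of how these three factors combine, the sketch below applies one common normal-approximation formula for comparing two proportions with equal group sizes. The choice of formula, the function name `sample_size_two_proportions`, and the example rates (a 20 percent complication rate in the control group reduced to 10 percent) are assumptions made for illustration. An actual trial should rely on a statistician and on a method suited to its primary endpoint.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided comparison."""
    # Factor 2: alpha (type I) and beta (type II, with power = 1 - beta) errors.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Factor 1: the background rate in the control group drives the variance.
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    # Factor 3: the smallest difference considered clinically valuable.
    delta = p_control - p_intervention
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical inputs: 20% complications with control, 10% with intervention.
n_per_group = sample_size_two_proportions(0.20, 0.10)
print(n_per_group, "participants per group")
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the third factor deserves careful justification.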
4. Selection of an Appropriate Outcome Measure
In addition to the traditional outcomes, such as surgical complications, the efficacies of many surgical procedures are now being evaluated with respect to their impact on outcomes from a patient perspective. Importantly, these endpoints must be evaluated with scientifically sound, patient-reported outcome measures that are valid, reliable, and sensitive to change.
5. Randomization, Concealment, and Blinding
Practically speaking, simple, blocked, and/or stratified randomization will be performed, often with the aid of a statistician. Whether or not participants or clinicians are blinded to the treatment arm, concealment of the randomization or allocation sequence should be employed, as concealment prevents bias from entering into the process of determining subject eligibility. Interestingly, studies have shown that trials performed without concealment of the allocation sequence report up to 30 percent larger effects compared with studies with concealment.16 An effective method of concealing allocation is to require the person who actually enrolls the patient and/or administers the intervention to contact a research coordinator to obtain the treatment assignment.
Blinding is achieved by making the intervention and the control appear similar in every respect. The optimal strategy to minimize these potential sources of bias is thus to blind patients, clinicians, data collectors, outcome assessors, and analysts to treatment allocation.
6. Use of an Intention-to-Treat Analysis
Participants may be excluded from the analysis for a number of reasons, including nonadherence to the study protocol, poor or missing data, and ineligibility due to competing events. Unfortunately, the value of random assignment is lost if subjects are dropped from the analysis, as the groups can no longer be considered equivalent in terms of known and unknown confounding variables. Thus, the preferred procedure for preventing bias is an intention-to-treat analysis.17
Using this technique, patients will be evaluated within the group to which they were randomized, irrespective of whether they experienced the intended intervention or not. First, this preserves the baseline comparability of the groups with respect to known and unknown confounders. Second, it maintains the statistical power of the original study population. Finally, information gleaned from the trial is more representative of the effectiveness of a treatment under everyday practice conditions.
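The difference between analyzing patients as randomized and analyzing them by the treatment actually received can be shown with a small, entirely hypothetical data set. In the sketch below, the records, field names, and the function `event_rates` are invented for illustration. One patient randomized to the intervention crosses over to the control treatment and goes on to have a complication.

```python
from collections import defaultdict


def event_rates(records, group_key):
    """Complication rate per group, grouping patients by the given field."""
    totals = defaultdict(int)
    events = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        events[group] += record["complication"]
    return {group: events[group] / totals[group] for group in totals}


# Hypothetical trial of eight patients (1 = complication, 0 = none).
records = [
    {"randomized": "intervention", "received": "intervention", "complication": 0},
    {"randomized": "intervention", "received": "intervention", "complication": 0},
    {"randomized": "intervention", "received": "intervention", "complication": 1},
    {"randomized": "intervention", "received": "control", "complication": 1},  # crossover
    {"randomized": "control", "received": "control", "complication": 1},
    {"randomized": "control", "received": "control", "complication": 0},
    {"randomized": "control", "received": "control", "complication": 1},
    {"randomized": "control", "received": "control", "complication": 0},
]

intention_to_treat = event_rates(records, "randomized")  # analyzed as randomized
as_treated = event_rates(records, "received")            # analyzed as received

print("Intention to treat:", intention_to_treat)
print("As treated:", as_treated)
```

In this made-up example, the intention-to-treat analysis shows the same complication rate in both groups, whereas regrouping patients by treatment received makes the intervention appear superior, because the crossover patient's complication is moved to the control group.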
REPORTING A RANDOMIZED CONTROLLED TRIAL
In an effort to improve the quality of clinical trials performed and avoid problems arising from inadequately performed studies, the Consolidated Standards of Reporting Trials Group was formed. Their group subsequently produced the CONSORT statement, an evidence-based, minimum set of recommendations for reporting randomized controlled trials offering a standard way for authors to facilitate complete and transparent reporting of trial findings as well as the critical appraisal and interpretation of the results. More specifically, the statement comprises a 22-item checklist and a flow diagram, along with some brief descriptive text. The checklist items focus on reporting how the trial was designed, analyzed, and interpreted; the flow diagram displays the progress of all participants through the trial.18–21 The checklist and flow diagram can be found at: www.consort-statement.org.
Recently, a Checklist to Evaluate a Report of a Nonpharmacological Trial (or CLEAR NPT) was developed by the CONSORT development team. The development of this checklist is an important advancement over the CONSORT statement in the assessment of surgical trials. Its focus on the reporting of interventions that require some degree of technical skill in their administration places a new value on the unique methodological considerations in conducting and reporting surgical trials.22,23
INTERPRETING AND DERIVING EVIDENCE FROM A RANDOMIZED CONTROLLED TRIAL
1. Are the Results Valid?
Although high-quality randomized controlled trials can provide the strongest evidence of therapeutic effectiveness, the mere description of a study as “randomized” does not allow clinicians to infer validity. Instead, the strength of this evidence hinges on the methodological rigor with which the study was conducted. To select an appropriate therapy for a patient, surgeons need to evaluate the quality of an individual randomized clinical trial.
The Users' Guide Interactive Web site, supported by the Centre for Health Evidence, JAMA, and the Archives journals, provides an online tool to guide clinicians in the appraisal of randomized controlled trials and the application of their results in everyday practice. The Web site can be found at http://pubs.ama-assn.org/misc/usersguides.dtl.
2. What Are the Results?
When evaluating the results of a randomized trial, there is often a tendency to equate statistical significance with clinical importance. This practice, however, can lead to misinterpretation of trial results. For example, in some instances, statistically significant results may not be clinically meaningful; conversely, statistically nonsignificant results may not eliminate the possibility of clinically important effects. Thus, when evaluating the results of such a trial, surgeons should not only evaluate the “p values” but should also consider the magnitude and direction of the differences seen.
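One way to look beyond the p value is to report the size of the difference together with a confidence interval. The sketch below computes an absolute risk difference with a simple normal-approximation (Wald) interval for two invented trials. The function name `risk_difference_ci`, the choice of interval, and all event counts are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, confidence=0.95):
    """Risk difference (group A minus group B) with a Wald confidence interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    difference = p_a - p_b
    standard_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return difference, difference - z * standard_error, difference + z * standard_error


# Hypothetical large trial: 10.0% vs 11.0% complications, 10,000 per group.
large = risk_difference_ci(1000, 10000, 1100, 10000)
# Hypothetical small trial: 10% vs 25% complications, 40 per group.
small = risk_difference_ci(4, 40, 10, 40)

for label, (diff, low, high) in (("large", large), ("small", small)):
    print(f"{label}: difference {diff:+.3f}, 95% CI {low:+.3f} to {high:+.3f}")
```

In the large hypothetical trial the interval excludes zero even though the difference is only one percentage point, whereas in the small trial the interval includes zero yet remains compatible with a large reduction in complications.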
3. Can the Results Be Applied to My Patient?
When looking to apply the results of a randomized trial to a particular patient, surgeons should ask the following questions:
- Were the study patients similar to my patients?
- Were all clinically important outcomes considered?
- Will the potential benefits of treatment outweigh the potential harms of treatment for my patients?
- Is the treatment feasible in my setting with my expertise?
Answers to these questions combined with one's own clinical experiences as well as the patient's values and preferences must then be used to inform medical decision making.
It is hoped that a better understanding of the components of clinical trials will facilitate the design, implementation, and evaluation of effective studies. By generating more rigorous evidence for the cause and effect of our surgical interventions, we will be able to demonstrate the inherent quality of our surgical specialty and improve the quality of care we provide to our plastic surgical patients.