The randomized controlled trial (RCT) is considered the gold standard in clinical investigation, but has not been widely utilized in procedural specialties. In the 1980s, the promotion of RCTs in procedural specialties was referred to as the “fifth horseman of an apocalyptical surgical fundamentalism which has no place in enlightened science.”1 The first recognized RCT, which randomized patients with pulmonary tuberculosis to treatment with streptomycin versus placebo, took place a mere 30 years before this statement.2 When viewed on the longer timescale of medical and surgical therapy (the first documented use of surgical general anesthesia was performed in 1804), the use of randomization in clinical investigation is a relatively recent development.3 Both attitudes toward RCTs and techniques of randomization in procedural specialties have evolved. The goal of this paper is to discuss modern approaches to randomization that may facilitate design and completion of interventional pulmonary (IP) clinical trials.
In the modern era, the term randomization is readily recognized as part of the lexicon used to describe a clinical study. Despite this acceptance, there has been only limited application of randomization in diagnostic and therapeutic procedural studies. There are few data in the IP literature regarding the prevalence of RCTs. However, in the leading surgical journals, RCTs represented only 3.4% of the publications over a 10-year period, and only one half of those actually randomized a surgical intervention versus an alternative.4 The limited use of the RCT is in part due to specific issues that arise around the design of studies intended to evaluate devices or procedures.
The goal of this article is to identify and discuss common clinical research issues in the IP field related to randomization. It is not intended to replace the many thorough texts and articles available that address randomization fundamentals (an excellent introduction is referenced here and the reader is encouraged to review the CONSORT guidelines for reporting of randomized clinical trials, http://www.consort-statement.org, before pursuing an RCT).5,6 Identification of potential clinical research pitfalls when designing an IP study is critical for study validity. Importantly, addressing these pitfalls through effective and appropriate randomization may result in increased power to detect differences between groups. We will address the following topics related to randomization in IP studies:
- Why randomize?
- The unit of randomization.
- Methods of randomization.
We have chosen to focus on the critical underpinnings and techniques of randomization and will not address in detail a number of other related issues (eg, allocation concealment, consideration of dropout, primary and prespecified endpoint selection, etc.). Where possible, we have also attempted to highlight pertinent points using examples from the IP literature.
WHY RANDOMIZE?
The overarching goal of randomization is to minimize possible biases that may lead to systematic differences between treatment groups, including those confounders that we may not know about, understand, or be able to measure.7 Retrospective and prospective observational studies can provide valuable information but can only control for known confounders that have been accurately measured. RCTs minimize this problem since potential confounders, known and unknown, should be balanced between the groups.
The existing IP literature on the use of moderate versus deep sedation for endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) provides a nice case study demonstrating how randomization can control for both known and unknown confounders. In one retrospective study, investigators compared the yield of EBUS-TBNA performed under moderate sedation at a single institution versus EBUS-TBNA performed under deep sedation at a different institution by different bronchoscopists.8 They found that the hospital using deep sedation sampled more lymph nodes and had a higher diagnostic yield. However, in the absence of randomization, these data are difficult to interpret. Procedures performed under moderate sedation at one hospital could have had a lower diagnostic yield than procedures performed under deep sedation at the other hospital due to any factor related to the specific hospital at which the procedure was performed. These types of factors are referred to as context variables and may potentially have resulted in confounding (ie, the confounder is related to both the exposure and the outcome). Examples of such system-level confounders include patient or procedural selection (eg, at some hospitals EBUS-TBNA may not be performed for sicker patients, at some hospitals specific patient groups may undergo mediastinoscopy instead, etc.), differences in yield by bronchoscopist, the quality of cytopathology services, the technique of obtaining the specimens, and specimen processing. There are numerous other potential confounders in such a study. Although some confounders can be controlled for in the analysis (eg, age, sex, race), many factors, such as patient selection, cannot be easily controlled for. The authors rightly pointed out that a randomized trial would be needed to more rigorously control for possible confounding by known and unknown variables.
Indeed, a single center study randomizing patients undergoing EBUS-TBNA to moderate versus deep sedation was subsequently performed. These authors reported no difference in EBUS-TBNA diagnostic yield by sedation type.9 This highlights a common and well-documented phenomenon, specifically discordance between randomized and nonrandomized studies.
Often initial retrospective or prospective observational studies report “positive” findings, but when a subsequent RCT is done the magnitude of the effect may be much less or even absent. This serves to highlight the true value of RCTs. Nonrandomized studies are prone to confounding; without RCTs this confounding can go undetected and lead to mistaken conclusions. This is not to say that the only type of evidence that is useful comes from RCTs. Rather each type of investigative method (ie, retrospective, prospective observational, RCT) has its strengths and weaknesses and may be appropriate under certain circumstances. For important questions where there is sufficient equipoise, RCTs can provide very important insights that challenge existing paradigms in a rigorous manner. The key point for interventional pulmonologists here is not whether moderate or deep sedation is better, but rather that RCTs allow for better control of confounders, both known and unknown, which allows stronger conclusions to be drawn.
However, although RCTs provide excellent control of confounders, this comes at a price. One trade-off is external validity and generalizability. Although a well-conducted RCT allows conclusions to be drawn about the variable of interest, the results, while internally valid, may not be generalizable. In the above example of an RCT comparing moderate versus deep sedation during EBUS-TBNA, the study used only 1 bronchoscopist for all the patients. This adds internal validity to the study in that all the patients had the same skill level in terms of provider and team, so only one thing was changing: the type of sedation. However, the results might have been different if a less or more skilled team were performing the procedure. Hence, generalizability often has to be sacrificed in an RCT to arrive at a rigorous conclusion. RCTs can thus provide data on the efficacy of an intervention, while comparative effectiveness studies are needed to address the generalizability of an intervention.
THE UNIT OF RANDOMIZATION
Randomization is often assumed to refer to the allocation of patients to different groups. However, randomization is just a tool that we deploy within the context of a clinical trial to control for potential confounders. What is randomized does not necessarily have to be patients. Alternative examples of units of randomization include randomization by center, randomization by the episode of disease, or randomization of the order of the intervention.10
Randomization by center is an example of cluster randomization. Cluster randomization refers to the allocation of intact groups of individuals to different interventions. This study design is commonly used by public health researchers in settings where consenting all patients may be difficult or the intervention is difficult to target to individuals (eg, because of intervention contamination).11 Such studies are well suited to the investigation of changes to IP training, for instance. This approach has been utilized to assess the effect of restrictive versus flexible duty hour regulations for surgical residents.12 These investigators randomized 117 centers to either a restrictive duty hour regulation (mandating time off between shifts, maximum shift length, etc.) or a flexible duty hour group, which only set limits on total days and hours worked. It is worth noting that cluster randomized studies require careful statistical analysis and a clear understanding of the unit of inference. The primary outcome in the above example was the rate of a composite of 30-day postoperative death or serious complications, and over 138,000 patients were studied. However, the unit of analysis is actually the center (or cluster) and not the patient. Thus, 117 data points were used for the statistical analysis (not ∼138,000). An excellent discussion of these types of studies is provided by Donner and Klar.13
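The idea of allocating whole centers rather than individual patients can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the center labels, the seed, and the function name randomize_clusters are hypothetical, and the sketch is not the allocation procedure used in the cited trial.

```python
import random

def randomize_clusters(cluster_ids, arms=("restrictive", "flexible"), seed=None):
    """Allocate intact clusters (eg, centers) to arms as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled clusters to the arms in turn, so arm sizes differ by at most 1.
    return {cid: arms[i % len(arms)] for i, cid in enumerate(shuffled)}

# 117 hypothetical center labels (illustrative only).
centers = [f"center_{i:03d}" for i in range(1, 118)]
allocation = randomize_clusters(centers, seed=2016)

# Every patient treated at a center inherits that center's arm,
# so the analysis has 117 units, not one per patient.
arm_sizes = {arm: sum(1 for a in allocation.values() if a == arm)
             for arm in ("restrictive", "flexible")}
print(arm_sizes)
```

Because assignment happens once per center, outcomes from patients within the same center are not independent, which is why the analysis must account for the cluster.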
The episode of disease may also be randomized. This is a particularly effective approach for diseases that are recurrent, because a single patient may then contribute multiple episodes, each of which can be randomized separately. For example, asthma exacerbations in children aged 1 to 3 years were randomized to treatment with azithromycin versus placebo to assess whether azithromycin therapy would shorten the duration of the respiratory episode.14 The investigators randomly allocated 158 asthma episodes in 72 children. Note that the unit of randomization is the episode of asthma exacerbation; thus, many of the children in the study received azithromycin during one episode, but not during a subsequent episode, and vice versa.
The unit of randomization may also be the order of intervention. This is a particularly effective approach for diagnostic studies. If there are 2 competing diagnostic modalities, such as 2 peripheral biopsy needles, we could randomize patients to biopsy with either needle A or needle B. This is often mistakenly thought of as the gold standard in IP studies for comparing the efficacy of 2 diagnostic tests. However, there are other study designs with distinct benefits that in some instances can be superior to randomization at the level of the patient. One option is a paired approach: randomizing the order of the needles but performing each of the diagnostic techniques on the same patient.
An excellent example of randomization at the level of the procedure has been provided by Casal et al.15 These authors performed both EBUS-TBNA and EBUS-TBCNS (transbronchial capillary needle sampling) in 115 patients and randomized the order in which the diagnostic procedures were performed. EBUS-TBCNS is performed identically to EBUS-TBNA with the exception that suction is not applied to the needle. A computer randomized patients to undergo the first and third passes using EBUS-TBNA and the second and fourth using EBUS-TBCNS, or vice versa. They demonstrated no difference in yield, adequacy, or quality of samples. This paired design is a very effective approach for diagnostic studies because it controls for all known and unknown confounders, since each patient receives both techniques. These authors also limited potential bias that may be introduced into the study by treating both groups of patients identically up to the point of randomization within the procedure.
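A minimal sketch of randomizing the order of passes within each patient is shown below. It assumes 4 passes per patient that alternate between the 2 techniques, as described above; the function name randomize_pass_order, the patient identifiers, and the seed are hypothetical, and this is not the software used by the cited authors.

```python
import random

def randomize_pass_order(patient_ids, techniques=("EBUS-TBNA", "EBUS-TBCNS"), seed=None):
    """For each patient, randomly choose which technique is used on passes 1 and 3."""
    rng = random.Random(seed)
    schedule = {}
    for pid in patient_ids:
        # Random order of the 2 techniques for this patient.
        first, second = rng.sample(techniques, 2)
        # Passes alternate, so each patient still receives both techniques twice.
        schedule[pid] = [first, second, first, second]
    return schedule

schedule = randomize_pass_order(range(1, 116), seed=15)  # 115 hypothetical patient IDs
print(schedule[1])
```

Every patient contributes data for both techniques; only the order is left to chance.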
An additional benefit of randomization of the order of the intervention is increased power to detect a significant difference between the groups. If we were to perform a study comparing the 2 hypothetical peripheral biopsy needles, needle A versus needle B, and we hypothesized that there was an increase in diagnostic yield from 60% to 80%, we would need to enroll 164 patients (82 per group) to have 80% power to detect a significant difference between the groups (α≤0.05). In contrast, if we were to randomize the order of use of the standard needle versus the novel needle and perform both procedures in the same patients we would only need to enroll 77 total patients to have the same statistical power. This dramatic decrease in sample size is a result of paired testing (ie, performing both tests on a single subject).
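The unpaired sample size quoted above can be checked with a standard normal-approximation formula for comparing 2 independent proportions. The text does not state which formula or software was used, so the choice of formula here (two-sided test, no continuity correction) is an assumption; it happens to reproduce 82 patients per group. The paired figure depends on assumptions about discordant pairs that are not given in the text, so it is not recomputed here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to compare 2 independent proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

n = n_per_group(0.60, 0.80)
print(n, 2 * n)  # 82 164
```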
The approach just discussed of randomizing at the level of the procedure does however have limitations. It is very efficient for determining if one procedure is more sensitive than another procedure if both can be done on the same patient. But there is a trade-off. Such a study design cannot assess whether there is any difference in risk between procedures, since patients receive both interventions. For the study on EBUS-TBNA and EBUS-TBCNS cited above, the probability of there being a meaningful difference in complication rates was trivial, so randomizing at the level of the procedural order was a reasonable choice. However, if 2 interventions were likely to have both clinically significant differences in diagnostic sensitivity and complication rates (such as pneumothorax), then randomization at the patient level would be more efficient, since clinical decision making requires knowledge not only of marginal benefit but also knowledge of marginal risk. Marginal risk in this case requires randomization at the patient level.
TYPES OF RANDOMIZATION
Randomization controls for known potential confounders by distributing these factors equally between groups.11,16,17 It also controls for unknown confounders: because patients are assigned randomly, it tends to balance the treatment groups with respect to these unknown variables as well. Note that the underlying fundamental assumption is that the randomization process will result in a roughly equal distribution between the groups. For example, if we want to test 2 alternative treatments, we are hoping that the overall distribution to each of the treatment arms will be roughly 50:50.
But randomization is not a guarantee of equal distributions. If the sample size is large, the law of large numbers suggests that the distribution across groups will be fairly close to the expected value. But what happens if the sample size is small, which it often is in IP? If we do a randomized trial with 4 patients using simple randomization, there is a 2 in 16 chance that we will end up assigning all 4 patients to just one arm or the other, which will result of course in a noninformative study. What if we have budgeted for 20 patients, hoping to get 10 in each group? There is a fairly good chance that if we use simple randomization, we will not get a 10:10 split. We might get 11:9, or if we were very unlucky, 14:6. This problem exists even in the prior example of 2 different peripheral biopsy needles. If we decide to run the trial by randomizing patients to needle A versus needle B (potentially because we needed to better understand the pneumothorax risk, in addition to the diagnostic yield) a simple randomization procedure (randomizer.org) generated 70 patients in the needle A group and 94 in the needle B group, highlighting that randomization imbalance may be important even with moderate sample sizes (Table 1). So how can we avoid this problem of being unlucky and not getting a roughly 50:50 split between treatment arms?
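The probabilities behind these examples follow from the binomial distribution and can be sketched as below. The sketch assumes simple 1:1 randomization with each patient assigned independently; the function names are hypothetical.

```python
from math import comb

def prob_split(n, k):
    """P(exactly k of n patients go to arm A) under simple 1:1 randomization."""
    return comb(n, k) / 2 ** n

def prob_imbalance_at_least(n, diff):
    """P(the 2 arms differ in size by at least `diff` patients)."""
    # If k patients go to arm A, the difference between arms is |2k - n|.
    return sum(prob_split(n, k) for k in range(n + 1) if abs(2 * k - n) >= diff)

# 4 patients: chance that all land in one arm (2 in 16).
print(prob_split(4, 0) + prob_split(4, 4))  # 0.125

# 20 patients: chance of missing an exact 10:10 split (about 0.82).
print(1 - prob_split(20, 10))

# 164 patients: chance of a split at least as uneven as 70:94.
print(prob_imbalance_at_least(164, 24))
```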
One solution to this is to use permuted blocks. This randomization scheme utilizes a sequence of blocks, with each block containing a prespecified number of patients, in a random order. So for example, in our study of 164 patients, we might use a block size of 4 (ie, 41 blocks). In each block the prespecified number of treatment assignments would be 2 to each arm (ie, 50:50 split). But the order of those 4 patients will vary randomly between blocks (ie, one might be ABAB, another AABB, another BBAA, etc.). The purpose is that at the completion of each block balance is guaranteed, but we can never be sure which procedure a given patient will be assigned to. So if we hit our enrollment target of 164, we will be guaranteed an 82:82 distribution (Table 1). If we have to end our study prematurely, perhaps due to budget concerns, the maximum difference between groups in this case will be 2, depending on the order of the last block that was started. The example demonstrates the utility of block randomization using a moderate sample size. Block randomization is a particularly valuable approach in interventional pulmonology since we are commonly dealing with even smaller sample sizes. It does, however, require careful consideration. If an investigator is not blinded to the treatment (eg, randomizing patients to a procedure vs. none) there is a potential to introduce selection bias into the study. In our example above, an investigator could predict the assignment of the fourth patient in each block given the requirement that there be 2 “A”s and 2 “B”s in each block and thus potentially select “fitter” patients to undergo the procedure. This potential pitfall can be overcome by masking the investigator(s) to the block size or varying the block size during the study. Additional details regarding the use of permuted blocks can be found here.18
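A permuted block sequence for the 164-patient example can be generated as in the following sketch. It assumes a fixed block size of 4 and 2 arms labeled A and B; the function name and seed are hypothetical, and a real trial would typically also conceal the sequence and may vary the block size, as noted above.

```python
import random

def permuted_block_sequence(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Build an allocation list from randomly ordered, internally balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # eg, ["A", "B", "A", "B"]
        rng.shuffle(block)            # random order within the block
        sequence.extend(block)
    return sequence

assignments = permuted_block_sequence(n_blocks=41, seed=18)
print(assignments.count("A"), assignments.count("B"))  # 82 82

# Largest imbalance at any point during enrollment (never more than 2 here).
running, worst = 0, 0
for arm in assignments:
    running += 1 if arm == "A" else -1
    worst = max(worst, abs(running))
print(worst)
```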
This concept of using permuted blocks to ensure balanced randomization can be modified to deal with other, more subtle problems of confounding that are often seen in IP. We said randomization controls for known potential confounders by distributing these factors equally between groups. We are hoping that the randomization process will thus result in a balanced distribution of the treatment groups with respect to known prognostic variables. So let us imagine older age is a risk factor for bad outcome, and there are 2 age groups: “young” and “old.” If our sample size is large, the law of large numbers will work in our favor, ensuring that roughly 50% of young patients and 50% of old patients are assigned to each of treatments A and B. But what if our sample size is small, which it usually is in IP? Suppose we have only 30 patients and are hoping for 15 in each treatment arm. We are also hoping that in each stratum of age, the distribution will be roughly 50:50 for treatments A and B. Running a simple randomization procedure results in the following: 13 patients assigned to treatment A, 5 of whom are “old,” and 17 patients assigned to treatment B, 10 of whom are “old.” The poor prognostic factor of “old” is therefore more prevalent in treatment group B (10 of 17, 59%) than in treatment group A (5 of 13, 38%), with twice as many “old” patients in group B. This results in confounding of the measured effect of the treatment, since in this study, age is now associated with both the treatment assignment and the outcome. Therefore our estimate of the effects of treatment on outcome will be inaccurate.
We can use stratified randomization to address this problem. Stratified randomization uses permuted blocks that are based on known prognostic variables. Randomization is then performed separately within each stratum. Note that the stratum in this example is a patient level variable (ie, it is a characteristic of individual patients).
In the hypothetical case of our investigation of 2 peripheral biopsy needles, the standard needle A and our new tool needle B, we would like to compare them in a study at a single center where multiple bronchoscopists practice. Let us assume that the null hypothesis is true and that, in truth, there is no difference in diagnostic yield between the 2 needles. Let us also assume that one of the physicians, bronchoscopist 1, has a diagnostic yield of 80% for peripheral needle biopsies while her colleague has a 60% diagnostic yield. If we use simple randomization (http://www.randomizer.org) and do not account for the potential confounder of differences in diagnostic yield by bronchoscopist, we get the results demonstrated in Table 2. Note that more patients allocated to needle A had a procedure performed by bronchoscopist 1 and that a lower percentage of patients assigned to needle B had the procedure performed by bronchoscopist 1. If we use the randomization assignments in Table 1 and the known diagnostic yields for each of the bronchoscopists, a simple comparison of probabilities would yield a P value of 0.0198. This would lead us to incorrectly reject the null hypothesis and erroneously conclude that needle B is inferior, when in truth there is no difference in the needles. Also note that in this case, the stratum is not a characteristic of individual patients, but rather it is a context variable (ie, the bronchoscopist).
However, if we were to perform stratified randomization using a block size of 4 and a prespecified number of treatment assignments of 2 to each arm (ie, 50:50 split), with the order varying randomly within each stratum, we could correct this problem (Table 2). This is equivalent to block randomization within a known stratum. Using this procedure, the first patient enrolled may be assigned to bronchoscopist 1 and then will be randomized to a needle based on a block that has been predetermined for that bronchoscopist. The next patient may be assigned to the stratum of bronchoscopist 2 and then will be randomized to a needle based on a block that has been predetermined, but it is a different block than the one for the prior patient: it is a block just for the patients of bronchoscopist 2. The goal is to ensure that there is balance of treatment groups with respect to each stratum of the prognostic variable. Assigning patients in this manner would lead us to the correct statistical inference that there is no significant difference in the needles since the confounder of diagnostic yield by bronchoscopist has been balanced between the needle groups. Note that stratified block randomization can be used for both patient level variables (eg, age) and context variables (eg, the provider).
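The procedure just described, a separate sequence of permuted blocks for each stratum, can be sketched as follows. The class name StratifiedBlockRandomizer, the stratum labels, and the seed are hypothetical; the sketch assumes 2 arms and a fixed block size of 4.

```python
import random
from collections import defaultdict

class StratifiedBlockRandomizer:
    """Keeps an independent stream of permuted blocks for every stratum."""

    def __init__(self, block_size=4, arms=("A", "B"), seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.block_size = block_size
        self.arms = arms
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # unused assignments left in each stratum's block

    def assign(self, stratum):
        queue = self.pending[stratum]
        if not queue:
            # Current block for this stratum is used up: start a new balanced block.
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            queue.extend(block)
        return queue.pop()

randomizer = StratifiedBlockRandomizer(block_size=4, seed=22)
arrivals = ["bronchoscopist_1", "bronchoscopist_2"] * 8  # 8 patients per stratum
counts = defaultdict(lambda: {"A": 0, "B": 0})
for stratum in arrivals:
    counts[stratum][randomizer.assign(stratum)] += 1
print(dict(counts))
```

In this usage example each bronchoscopist completes 2 full blocks, so each performs 4 procedures with needle A and 4 with needle B regardless of the order in which patients arrive.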
More common than stratifying by operator is stratification by center, thereby accounting for operator differences as well as differences in other aspects of care (eg, anesthesia administration, pathologic interpretation, etc.) between hospitals.19,20 This technique was used in the RESET trial, which investigated the efficacy of endobronchial coil placement versus medical therapy for patients with severe emphysema and hyperinflation.21 These authors used stratified randomization to allocate 47 patients within 3 different centers, thereby balancing potential prognostic factors associated with care at each of the 3 centers.
The primary drawback of stratified randomization is that of using too many strata (ie, overstratification). This may result in imbalances in randomization due to incomplete blocks.22 Thus investigators are encouraged to remain parsimonious in selecting stratification factors. A conservative approach is to use fewer strata than N/(4 × B), where N is the total sample size and B is the block size.22 The number of strata (eg, the total number of levels of all variables stratified for) may also be estimated using a Poisson distribution given a sample size and a risk of observing fewer than the minimum number in a stratum.23 Other techniques such as minimization and adaptive designs may also be deployed to address the goal of balancing prognostic factors.24 The key point for interventional pulmonologists is that the use of randomization techniques such as permuted block and stratified randomization can be powerful tools, significantly improving the chances of arriving at the correct statistical inference.
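The conservative rule of thumb above, fewer strata than N divided by 4 times B, is simple arithmetic and can be sketched as below. The function name max_strata is hypothetical, and the sample sizes reuse the hypothetical examples from earlier in this article.

```python
def max_strata(total_n, block_size):
    """Largest whole number of strata strictly below N / (4 * B)."""
    # Integer arithmetic: k < N / (4B) is the same as k <= (N - 1) // (4B).
    return max((total_n - 1) // (4 * block_size), 0)

print(max_strata(164, 4))  # 10
print(max_strata(30, 4))   # 1
```

With 30 patients and a block size of 4 the rule allows only a single stratum, which illustrates how quickly small IP studies run out of room for stratification factors.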
Randomization strives to accomplish the goal of balancing potential confounders, even those that are unmeasured, between groups being compared. Careful consideration of the unit of randomization can allow for design of more efficient studies, potentially requiring fewer subjects. The reliability of randomization can be protected through block randomization. This technique can be extended by stratified randomization to account for potential uneven distribution of confounders. In IP, this can also be used to account for differences in practice between operators or centers. When applied thoughtfully, randomization can be a powerful tool leading to studies with better validity and more robust results.
REFERENCES
1. Byer A. The practical and ethical defects of surgical randomised prospective trials. J Med Ethics. 1983;9:90–93.
2. Marshall G, Blacklock J, Cameron C, et al. Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. BMJ. 1948;2:769–783.
3. Izuo M. Medical history: Seishu Hanaoka and his success in breast cancer surgery under general anesthesia two hundred years ago. Breast Cancer. 2004;11:319–324.
4. Cook JA. The challenges faced in the design, conduct and analysis of surgical randomised controlled trials. Trials. 2009;10:9.
5. Stanley K. Design of randomized controlled trials. Circulation. 2007;115:1164–1169.
6. Stanley K. Evaluation of randomized controlled trials. Circulation. 2007;115:1819–1822.
7. Altman DG. Randomisation. BMJ. 1991;302:1481–1482.
8. Yarmus LB, Akulian JA, Gilbert C, et al. Comparison of moderate versus deep sedation for endobronchial ultrasound transbronchial needle aspiration. Ann Am Thorac Soc. 2013;10:121–126.
9. Casal RF, Lazarus DR, Kuhl K, et al. Randomized trial of endobronchial ultrasound-guided transbronchial needle aspiration under general anesthesia versus moderate sedation. Am J Respir Crit Care Med. 2015;191:796–803.
10. Sedgwick P. Clinical trials: units of randomisation. BMJ. 2014;348:g3297.
11. Sedgwick P. Treatment allocation in trials: cluster randomisation. BMJ. 2014;348:g2820.
12. Bilimoria KY, Chung JW, Hedges LV, et al. National cluster-randomized trial of duty-hour flexibility in surgical training. N Engl J Med. 2016;374:713–727.
13. Donner A, Klar N. Pitfalls of and controversies in cluster randomization trials. Am J Public Health. 2004;94:416–422.
14. Stokholm J, Chawes BL, Vissing NH, et al. Azithromycin for episodes with asthma-like symptoms in young children aged 1-3 years: a randomised, double-blind, placebo-controlled trial. Lancet Respir Med. 2016;4:19–26.
15. Casal RF, Staerkel GA, Ost D, et al. Randomized clinical trial of endobronchial ultrasound needle biopsy with and without aspiration. Chest. 2012;142:568–573.
16. Roberts C, Torgerson D. Randomisation methods in controlled trials. BMJ. 1998;317:1301.
17. Sedgwick P. Treatment allocation in trials: stratified randomisation. BMJ. 2015;350:h978.
18. Sedgwick P. Treatment allocation in trials: block randomisation. BMJ. 2014;348:g2409.
19. McLeod RS. Issues in surgical randomized controlled trials. World J Surg. 1999;23:1210–1214.
20. McCulloch P, Taylor I, Sasako M, et al. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324:1448–1451.
21. Shah PL, Zoumot Z, Singh S, et al. RESET trial Study Group. Endobronchial coils for the treatment of severe emphysema with hyperinflation (RESET): a randomised controlled trial. Lancet Respir Med. 2013;1:233–240.
22. Kernan WN, Viscoli CM, Makuch RW, et al. Stratified randomization for clinical trials. J Clin Epidemiol. 1999;52:19–26.
23. Silcocks P. How many strata in an RCT? A flexible approach. Br J Cancer. 2012;106:1259–1261.
24. Altman DG, Bland JM. Treatment allocation by minimisation. BMJ. 2005;330:843.