A clinical study typically compares two groups of subjects (eg, a treatment group and an untreated group) with respect to an outcome of interest (eg, disease incidence, recovery from disease, or death). These groups are samples and should be representative of the study's population. One of the fundamental decisions to be made when planning a clinical study is the choice of sample size. Researchers must keep in mind that samples are used to make inferences about a population. If the population is large and diverse on characteristics of interest, then a large sample size may be required to accurately represent the population. To illustrate this point, Myers and Hansen provide an analogy using the “height of the average American adult.” A small sample would likely give misleading information about the true average height of the population. If the population is similar on characteristics other than the independent variable(s), then a small sample size may adequately represent the true population.1
WHAT DO YOU WANT TO DETECT?
One of the first things to consider when determining sample size is how big a difference in outcome rates is important to detect. That is, does the treatment group need to do twice as well as the untreated group to define a positive result, or is a more subtle difference in outcome rates important? In general, if the difference in outcome rates is likely to be huge, then only a small sample size is needed. For instance, if it is expected that about 90% of the treated group will recover from a disease, as compared to only about 10% of the untreated group, only a small sample size is needed to establish this difference. On the other hand, if the treatment is expected to be only minimally effective, allowing about 15% to recover, as compared to about 10% of the untreated group, this difference in recovery rates is subtle and will require a very large sample size to demonstrate that it is true.
A much larger sample size is needed to establish a subtle difference in outcome rates because the role of “chance” or “random variation” is much greater when sample size is small than when sample size is large. To illustrate this statistical principle, consider a fair coin that should come up heads 50% of the time and tails 50% of the time when it is flipped. It should seem intuitive that when flipped 100 times, the coin will come up close to 50 heads and 50 tails. If the coin is flipped only 4 times, however, it may come up 2 heads and 2 tails, but it is also possible to get 4 heads and no tails, or vice versa. Flipping the coin 100 times is analogous to having a large sample size, while flipping it only 4 times is like having a small sample size. The problem with a small sample size, whether flipping coins or conducting a clinical study, is that the role of chance is high, so true but subtle differences in outcomes between two groups can easily be lost.
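The coin-flip comparison can be checked with exact binomial probabilities. The following is a minimal sketch, not part of the original discussion; the helper name prob_heads_between and the 40-to-60 range used to represent “very nearly 50 heads” are illustrative assumptions.

```python
from math import comb


def prob_heads_between(n, lo, hi, p=0.5):
    """Exact probability that the number of heads in n flips falls in [lo, hi].

    Hypothetical helper for illustration; p is the chance of heads per flip.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


# 4 flips: chance of the most extreme result (all heads or all tails).
p_extreme_4 = prob_heads_between(4, 0, 0) + prob_heads_between(4, 4, 4)

# 100 flips: chance that heads lands within 10 of the expected 50
# (the 40-60 window is an assumed definition of "close to 50").
p_near_half_100 = prob_heads_between(100, 40, 60)

print(f"4 flips, all heads or all tails: {p_extreme_4:.3f}")
print(f"100 flips, 40 to 60 heads:       {p_near_half_100:.3f}")
```

With 4 flips, a completely lopsided result occurs 1 time in 8, whereas with 100 flips the count of heads stays within 10 of the expected 50 roughly 96% of the time.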
Small sample size may be suitable when the expected difference in outcomes is so great that it seems a statistical test is not needed to demonstrate it. If the treatment cures 90% of patients whereas only 10% recover in the untreated group, then only 10 patients in each group may be needed to find convincing results. If the treatment is truly effective, but only minimally so—for example, curing 15% as opposed to 10%—then in order to overcome the role of chance and detect this difference, the sample size must be inflated many times over, requiring a study of hundreds instead of tens of people to establish that the treatment is a little better than no treatment.
HOW FIRM A CONCLUSION DO YOU WANT?
After estimating how big a difference is expected in the outcomes between the two groups, decide how firmly this difference needs to be established. The larger the sample size, the firmer the conclusion can be. Of course, a larger sample size brings with it additional time and expense, and more subjects at risk. Therefore, the ideal sample size is “not too small and not too big.” Choose the minimum sample size that will produce a convincing statistical result, without studying many extra (and costly) subjects.
It is also important to avoid making what is called a Type I error when planning sample size. A Type I error concludes that there is a real difference in outcome rates when actually no difference exists. Often the Type I error rate, also known as the alpha error rate, is set to 5%. The opposite kind of mistake, a Type II error, is also important to avoid. A Type II error fails to detect a real difference in outcome rates when, in fact, a real difference exists. The Type II error rate, also known as the beta error rate, is often set to 20%. Setting the beta error lower tends to inflate the needed sample size substantially. Once the expected degree of difference between the two groups is established (or, alternatively, the minimum clinically useful amount of outcome difference is determined), the alpha and beta levels are used to make “power calculations” that estimate the number of subjects needed for each group. Power is the probability of finding a true difference in the data if a true difference exists (in statistical terms, power is 1 − β).
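To make the power calculation concrete, the sketch below applies one common normal-approximation formula for comparing two proportions. It is an illustration under stated assumptions, not the method of any particular study: it assumes a two-sided test, equal group sizes, and unpooled variances, and the function name n_per_group is hypothetical. The approximation is known to be unreliable when the resulting group size is very small, so planning software or a statistician should be consulted for real studies.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Approximate subjects needed per group to detect p1 vs p2.

    Assumptions (not from the article): two-sided alpha, equal group
    sizes, normal approximation with unpooled variances.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # critical value for power = 1 - beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Recovery rates taken from the examples in the text.
n_large_effect = n_per_group(0.90, 0.10)             # 90% vs 10% recovery
n_subtle_effect = n_per_group(0.15, 0.10)            # 15% vs 10% recovery
n_subtle_90_power = n_per_group(0.15, 0.10, beta=0.10)  # same effect, beta = 10%

print("90% vs 10%, beta 20%:", n_large_effect, "per group")
print("15% vs 10%, beta 20%:", n_subtle_effect, "per group")
print("15% vs 10%, beta 10%:", n_subtle_90_power, "per group")
```

Under these assumptions, the 90% versus 10% comparison needs only a handful of subjects per group, while the 15% versus 10% comparison needs several hundred per group, and lowering beta from 20% to 10% raises that number further.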
MEASURES OF PRECISION
After the sample size is calculated, the research conducted, and the study completed, where does all of this lead? Once the data are collected and analyzed, the measure of outcome difference (often, a relative risk) and a measure of its precision are calculated. The usual measures of precision to choose from are the P value and the confidence interval; either can be calculated. If the sample size was estimated wisely and a true difference of the expected size exists, the results will most likely show a relative risk that has a P value less than .05 (the usual cutoff for “statistical significance”) or, alternatively, a 95% confidence interval that does not include the null value of no difference (the null value for the relative risk being 1.0). A study with sufficient sample size and power is likely to produce a precise result that demonstrates a true outcome difference if such a difference exists.
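The relative risk and its 95% confidence interval can be computed as in the sketch below. The counts are hypothetical and chosen only to match the 15% versus 10% recovery rates used earlier; the interval uses the standard log-based approximation, which is one of several available methods, and the function name relative_risk_ci is an assumption for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Relative risk of group A vs group B with a log-based confidence interval.

    Hypothetical helper; uses the usual large-sample standard error of log(RR).
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)


# Hypothetical data: 15% vs 10% recovery at two different study sizes.
small_study = relative_risk_ci(15, 100, 10, 100)     # 100 subjects per group
large_study = relative_risk_ci(150, 1000, 100, 1000)  # 1000 subjects per group

for label, (rr, lower, upper) in [("n=100", small_study), ("n=1000", large_study)]:
    includes_null = lower <= 1.0 <= upper
    print(f"{label}: RR={rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
          f"includes 1.0: {includes_null}")
```

Both hypothetical studies give the same relative risk of 1.5, but only the larger one produces a confidence interval that excludes the null value of 1.0, which is the sense in which a larger sample gives a more precise result.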
In conclusion, if the expected difference in outcome rates is likely to be substantial (eg, one treatment is much more effective than the other), a small sample size is sufficient because the degree of real difference outweighs the role of chance. But if the expected difference is likely to be subtle, a much greater sample size is needed in order to demonstrate a true difference in outcome rates, as opposed to a difference that is due simply to “noise” or random variation. Use as small a sample size as possible (because each subject enrolled costs time and money) to demonstrate the predetermined difference in outcome rates that is worthwhile detecting.
1. Myers A, Hansen C. Method: selecting and recruiting subjects. In: Experimental Psychology. 4th ed. Pacific Grove, CA: Brooks/Cole Publishing Co; 1997:186-189.