Assessing the Validity of Clinical Trials

Akobeng, Anthony K

Journal of Pediatric Gastroenterology and Nutrition: September 2008 - Volume 47 - Issue 3 - p 277–282
doi: 10.1097/MPG.0b013e31816c749f
Invited Review

Clinical trials use scientific methods to evaluate the effectiveness and safety of treatments or other interventions. Trials should have both internal and external validity, and a well-conducted randomised controlled trial is considered to be the most powerful tool for evaluating interventions. Systematic error (bias) and random error could threaten the internal validity of trials, and all efforts should be made to minimise these in the design, conduct, and analysis of studies. Careful attention should be paid to issues such as randomisation, allocation concealment, blinding, and sample size. In an internally valid trial, external validity refers to the ability of the results to be generalised to the “real world” population. Issues to consider in determining the external validity of a study include the setting of the trial, the study population, the types of interventions used, duration of follow-up, and the types of outcome and how they were assessed.

Department of Paediatric Gastroenterology, Booth Hall Children's Hospital, Central Manchester and Manchester Children's University Hospitals, Manchester, UK

Received 7 August, 2007

Accepted 4 February, 2008

Address correspondence and reprint requests to A.K. Akobeng, MD, Department of Paediatric Gastroenterology, Central Manchester and Manchester Children's University Hospitals, Booth Hall Children's Hospital, Charlestown Road, Blackley, Manchester, M9 7AA, UK (e-mail:

The author reports no conflicts of interest.

A clinical trial is a research study in which people are the units of observation and that aims to assess the effectiveness and safety of a drug or other intervention. Clinical trials use scientific methods to investigate clinical questions and aim to find better ways of treating individuals with a specific disease.

After initial laboratory testing, clinical trials of experimental drugs go through 4 phases:

  • Phase 1: This is the first stage in testing a new drug in humans. The primary aim of this phase is to establish the safety of the medicine and to determine an acceptable dose. Typically, a phase 1 trial may involve a relatively small number (about 20–80) of healthy volunteers and/or patients. Subjects are given increasing doses of the drug according to a predetermined schedule (1) and all adverse events are observed and recorded.
  • Phase 2: The treatment is tried on a larger group of people (about 40–100) to determine effectiveness and to further evaluate safety. The second phase is often performed in patients with the disease of interest. A number of study designs ranging from case series to small-scale randomised controlled trials (RCTs) of shorter duration may be used. Phase 2 studies also may examine dose-response relations in patients.
  • Phase 3: After a drug has been shown to be reasonably effective in phase 2 trials, phase 3 trials may begin. These are larger, full-scale RCTs that usually involve more than 200 patients. The aim of phase 3 trials is to further evaluate effectiveness and safety. The drug may be compared with currently used treatments and further information on efficacy and safety collected. A well-designed RCT is regarded as the most powerful trial for evaluating the effectiveness of interventions (2).
  • Phase 4: The final phase of studies involves postmarketing surveillance. Phase 4 trials provide information on how well the drug performs in patients outside clinical trials. These studies involve long-term monitoring of patients to allow the collection of long-term data on risks and benefits.

Randomised Controlled Trial

The RCT is the gold standard for evaluating the effectiveness of interventions. The basic issues that underpin the validity of RCTs also can be applied to the quality assessment of all studies that aim to assess the effectiveness of interventions. For this reason, most of this article will be devoted to the RCT and important issues that one needs to consider when assessing the validity of an RCT.

Figure 1 illustrates the basic structure of an RCT (2). In brief, a sample of the population of interest is randomly allocated to 1 or another intervention and the 2 groups are followed up for a specified period of time. Apart from the interventions being compared, the 2 groups are treated and observed in an identical manner. At the end of the study, the groups are analysed in terms of outcomes defined at the outset. The results from, for example, the treatment A group are compared with results from the treatment B group. Because the groups are treated identically apart from the intervention received, any differences in outcomes are attributed to the trial therapy.

FIG. 1

The validity of a study can be defined as the extent to which the inference drawn from the study is warranted when account is taken of the study methods, the representativeness of the study sample, and the nature of the population from which it is drawn (3). There are 2 types of study validity: internal validity and external validity.

Internal Validity

A trial is said to have internal validity if the observed differences between groups of patients allocated to different interventions can be correctly attributed to the intervention under investigation. Lack of internal validity adversely influences the quality of the evidence that can be derived from a trial. There are 2 main errors that could threaten the internal validity of a trial and affect the reliable evaluation of the treatment effect (4,5). These are bias (also called systematic error) and random error (also called chance error or statistical error).

Bias (Systematic Error)

Bias occurs when the way a study was designed, conducted, or reported leads to systematic deviations of results from the truth. Inappropriate ways of collecting, analysing, interpreting, and even publishing data could lead to conclusions that are systematically different from the truth (3). The presence of bias may produce differences that may be attributable to factors other than the treatment being investigated, and lead to either an overestimation or an underestimation of the true beneficial or harmful effect of an intervention. There are 4 main sources of systematic bias in clinical trials: selection bias, performance bias, detection bias, and attrition bias (6,7).

Selection Bias

Randomisation is the process of assigning study participants to experimental or control groups at random such that each participant has an equal probability of being assigned to any given group (2). The main purpose of randomisation is to eliminate selection bias and balance known and unknown confounding factors to create a control group that is as similar as possible to the treatment group.

Randomisation is best thought of as involving 3 interrelated but separate processes: allocation sequence generation, allocation concealment, and implementation of the allocated sequence (8). It is important for researchers (and readers of research articles) to pay due attention to all 3 stages because problems at any stage may lead to a collapse of the whole randomisation process.

Allocation Sequence Generation

Generating an unpredictable sequence of allocation is the first stage of randomisation. It is important for authors of RCTs to describe clearly the method that they used to generate the allocation sequence. Appropriate methods for generating the allocation sequence include the use of a table of random numbers or a computer random number generator. Inappropriate methods of assignment include alternating assignment or assignment by date of birth or hospital admission number. These methods have been reviewed elsewhere recently (9).
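
As an illustration of the difference between an appropriate and an inappropriate method, the following minimal Python sketch generates an allocation sequence with a computer random number generator and contrasts it with alternating assignment. The function names, the two-arm labels, and the seed are assumptions made for the example and are not taken from any cited trial.

```python
import random


def simple_allocation_sequence(n_participants, arms=("A", "B"), seed=None):
    """Simple randomisation: each participant has an equal probability
    of being assigned to any arm, independently of earlier assignments."""
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible for audit
    return [rng.choice(arms) for _ in range(n_participants)]


def alternating_sequence(n_participants, arms=("A", "B")):
    """Alternating assignment: fully predictable, hence inappropriate."""
    return [arms[i % len(arms)] for i in range(n_participants)]


if __name__ == "__main__":
    print("random:     ", "".join(simple_allocation_sequence(12, seed=2008)))
    print("alternating:", "".join(alternating_sequence(12)))
```

With alternating assignment, anyone who knows the previous allocation knows the next one, which is why such methods cannot produce an unpredictable sequence.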

Allocation Concealment

Generation of a random sequence of allocation by which to assign patients is only the first part of the randomisation process. It is important for all of the people involved in the recruitment of participants, and for the participants themselves, not to have any foreknowledge of the next treatment assignment. The process of preventing researchers, health care professionals, and participants from knowing to which group a subject will be allocated before that participant enters the study is known as allocation concealment (8,10). When allocation concealment is inadequate, it will be possible for researchers, health care professionals, and participants to influence what intervention a particular participant may receive, destroying the randomisation process. For instance, in a study comparing treatment A with a placebo, a researcher may “feel” that treatment A is more effective than the placebo. Thus, when a “more ill” patient enters the trial, the researcher may (even subconsciously) influence enrollment or the selection of patients to ensure that the patient receives treatment A.

Ensuring allocation concealment in clinical trials is important. Studies have shown that in trials in which allocation was not concealed, estimates of treatment effect were exaggerated by about 41% compared with trials that reported adequate allocation concealment (8,10). Authors of RCTs should report how allocation was concealed in their study. Allocation concealment is most securely achieved if an assignment schedule generated using an appropriate randomisation method is administered by someone who is not responsible for recruiting subjects, such as a person based in a central trial office or pharmacy (7). The researchers contact this remote randomisation facility by telephone when the next participant is about to be randomised, and the assignment for that participant is read out to them according to the randomisation schedule.

Allocated Sequence Implementation

This is the third stage of the randomisation process. It is important that once the treatment assignment has been made for a particular patient, it is strictly respected. The researchers, health care professionals, or patients should not be able to alter the assignment or the decision about eligibility. The assigned treatment should be implemented.

Aligned to the issue of selection bias is confounding. Confounding refers to a situation in which a measure of the effect of an intervention is distorted because of the association of the intervention with other factor(s) that influence the outcome under investigation. Consider a study that compared mortality in 2 groups of adults with severe sepsis, in which 1 group received a new antibiotic (group 1) and the other group received the current standard antibiotic (group 2). If patients in group 1 are much younger than patients in group 2, then it is likely that mortality in group 1 will be lower than in group 2 merely because of the age difference. Thus, when a lower mortality rate is found in group 1, we cannot tell whether this is a result of the new treatment or of the group's younger age. This is because younger age was associated both with receiving the new antibiotic and with lower mortality. The effect of the new antibiotic has been mixed together with the effect of younger age, leading to potential bias. The measured effect could therefore have been confounded by age.
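
The sepsis example can be made concrete with a small Python sketch. All counts below are hypothetical and were chosen only to illustrate the mechanism: within each age band the 2 antibiotics have identical mortality, yet the crude comparison favours the new antibiotic because its group contains more young patients.

```python
# Hypothetical (deaths, patients) counts per treatment group and age band.
# These numbers are invented for illustration and come from no real study.
DATA = {
    "new antibiotic": {"younger": (8, 80), "older": (8, 20)},
    "standard antibiotic": {"younger": (2, 20), "older": (32, 80)},
}


def crude_mortality(group):
    """Mortality ignoring age: total deaths divided by total patients."""
    deaths = sum(d for d, _ in DATA[group].values())
    patients = sum(n for _, n in DATA[group].values())
    return deaths / patients


def stratum_mortality(group, age_band):
    """Mortality within a single age band."""
    deaths, patients = DATA[group][age_band]
    return deaths / patients


if __name__ == "__main__":
    for group in DATA:
        print(f"{group}: crude mortality {crude_mortality(group):.0%}")
        for age_band in DATA[group]:
            print(f"  {age_band}: {stratum_mortality(group, age_band):.0%}")
```

In this constructed data set the apparent benefit disappears once patients of similar age are compared, which is what it means for the measured effect to be confounded by age.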

An important aim of randomisation is to balance known and unknown confounding factors. In large clinical trials, simple randomisation should lead to a balance between groups in the number of patients allocated to each of the groups and in patient characteristics. However, in smaller studies, this may not be the case. Block randomisation and stratification are strategies that may be used to help ensure balance between groups in terms of the number of participants they contain, and in the distribution of known confounding factors (2,11).

Block Randomisation and Stratification

Block randomisation may be used to ensure a balance in the number of patients allocated to each of the groups in a trial. Participants are considered in blocks of, for example, 4 at a time. Using a block size of 4 for 2 treatment arms (A and B) will lead to 6 possible arrangements of 2 A's and 2 B's (blocks): AABB or BBAA or ABAB or BABA or ABBA or BAAB.

A random number sequence is used to select a particular block, which determines the allocation order for the first 4 subjects. In the same vein, treatment group is allocated to the next 4 patients in the order specified by the next randomly selected block. It is important for the researchers not to know the size of the blocks to ensure allocation concealment.
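
The procedure described above can be sketched in Python as follows. This is a minimal illustration assuming 2 arms and a fixed block size of 4; the function names and the seed are hypothetical and were chosen for the example.

```python
import itertools
import random


def possible_blocks(block_size=4, arms=("A", "B")):
    """Enumerate every distinct block containing equal numbers of each arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    template = [arm for arm in arms for _ in range(per_arm)]
    # A set removes the duplicate orderings produced by identical letters.
    return sorted({"".join(p) for p in itertools.permutations(template)})


def block_randomise(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Build an allocation sequence by repeatedly selecting a block at random."""
    rng = random.Random(seed)
    blocks = possible_blocks(block_size, arms)
    sequence = []
    while len(sequence) < n_participants:
        sequence.extend(rng.choice(blocks))  # one block allocates the next 4 subjects
    return sequence[:n_participants]


if __name__ == "__main__":
    print(possible_blocks())  # the 6 arrangements of 2 A's and 2 B's
    print("".join(block_randomise(12, seed=7)))
```

Because every block contains 2 A's and 2 B's, the group sizes are equal after each complete block, whichever blocks happen to be selected.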

Although randomisation may help remove selection bias, it does not always guarantee that the groups will be similar with regard to important patient characteristics (2). In many studies, important prognostic factors are known before the study. One way of trying to ensure that the groups are as similar as possible is to generate separate block randomisation lists for different combinations of prognostic factors. This method is called stratification or stratified block randomisation. For example, in a trial of enteral nutrition in the induction of remission in active Crohn disease, potential stratification factors may be disease activity (paediatric Crohn disease activity index [PCDAI] ≤25 vs PCDAI >25) and disease location (small bowel involvement vs no small bowel involvement). A set of blocks could be generated for those patients who have PCDAI ≤25 and have small bowel disease; those who have PCDAI ≤25 and have no small bowel disease; those patients who have PCDAI >25 and have small bowel disease; and those who have PCDAI >25 and have no small bowel disease. Stratification is especially useful in small trials to help avert severe imbalances in prognostic factors, but the gain from stratification is probably minimal when the number of participants per group is >50 (9).
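
A minimal Python sketch of the Crohn disease example follows, assuming 2 arms, blocks of 4, and the 2 stratification factors named above. The class name `StratifiedAllocator`, its methods, and the stratum labels are hypothetical; a real trial would hold the lists in a concealed central facility rather than generate them on demand.

```python
import random

# The 6 possible blocks of size 4 for 2 arms (2 A's and 2 B's each).
BLOCKS = ["AABB", "ABAB", "ABBA", "BAAB", "BABA", "BBAA"]


class StratifiedAllocator:
    """Keeps a separate block randomisation list for each stratum."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._pending = {}  # stratum -> assignments left in that stratum's current block

    @staticmethod
    def stratum(pcdai, small_bowel):
        """Map a patient to 1 of the 4 strata (PCDAI cut-off at 25)."""
        activity = "PCDAI<=25" if pcdai <= 25 else "PCDAI>25"
        location = "small bowel" if small_bowel else "no small bowel"
        return (activity, location)

    def allocate(self, pcdai, small_bowel):
        """Return the patient's stratum and the next assignment within it."""
        key = self.stratum(pcdai, small_bowel)
        queue = self._pending.setdefault(key, [])
        if not queue:  # start a new randomly selected block for this stratum
            queue.extend(self._rng.choice(BLOCKS))
        return key, queue.pop(0)


if __name__ == "__main__":
    allocator = StratifiedAllocator(seed=11)
    for pcdai, small_bowel in [(20, True), (40, False), (22, True), (35, True)]:
        print(allocator.allocate(pcdai, small_bowel))
```

Each stratum fills its own blocks, so the treatment arms stay balanced within every combination of disease activity and disease location.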

With an unbiased randomisation process and a sufficient sample size, the comparison groups should be balanced with respect to known and unknown factors. The goal is to minimise differences between the groups with regard to factors that may obscure or exaggerate a true difference. However, despite randomisation, it sometimes happens that, by chance, 1 group is significantly different from the other with regard to 1 or more potential confounding factors. It is therefore important for authors of RCTs to clearly report the baseline characteristics of the groups and to discuss the steps that were taken to deal with any potential confounding issues. There are multivariable analysis methods that may be used to statistically adjust for the effect of such confounding factors at the analysis stage (12,13), although the application of such methods to RCTs is controversial (13).

Performance Bias and Detection Bias

After randomisation, there will be, for example, 2 groups. One group will receive the study intervention, and the other group will receive the current standard treatment or a placebo. Apart from the interventions being compared, the 2 groups should be treated and observed in an identical manner. Performance bias occurs when there are systematic differences in the care provided to the participants in the comparison groups other than the intervention being studied, whereas detection bias refers to systematic differences between the comparison groups in outcome assessment (7).

There is always a risk in clinical trials that people's perceptions about the advantages of 1 treatment over another may influence outcomes, leading to biased results (2). This is particularly important when subjective outcome measures are being used. Patients who are aware that they are receiving what they believe to be an expensive new treatment may report being better than they really are. The judgment of a doctor who expects a particular treatment to be more effective than another may be clouded in favour of what he or she perceives to be the more effective treatment. When people analysing data know which treatment group was which, there can be a tendency to overanalyse the data for any minor differences that would support 1 treatment.

Knowledge of treatment received also could influence management of patients during the trial, and this can be a source of bias. For example, there could be the temptation for a doctor to give more care and attention during the study to patients receiving what he or she perceives to be the less effective treatment to compensate for perceived disadvantages. To control for these biases, blinding should be performed if at all possible.

Blinding, or Masking

Blinding (also called masking) refers to the practice of preventing study participants, health care professionals, and those collecting and analysing data from knowing who is in the experimental group and who is in the control group, to avoid them being influenced by such knowledge. It is important for authors of RCTs to clearly state whether participants, researchers, or data evaluators were or were not aware of the assigned treatment.

In a study in which participants do not know the details of the treatment, but the researchers do, the term single blind is used. When both participants and data collectors (health care professionals, investigators) are kept ignorant of the assigned treatment, the term double blind is used. When, rarely, study participants, data collectors, and data evaluators such as statisticians all are blinded, the study is referred to as triple blind. When blinding is feasible, great effort should be put into manufacturing a placebo that is similar to the study material or procedure with regard to appearance, taste, smell, etc. Arrangements also should be made to design appropriate and foolproof systems for packaging and labelling, and to develop a system that allows rapid unblinding when untoward adverse events occur (11).

Recent studies have shown that blinding of patients and health care professionals prevents bias. Trials that were not double blinded yielded larger estimates of treatment effects than did trials in which authors reported double blinding (odds ratios exaggerated, on average, by 17%) (14). It should be noted that although blinding helps prevent bias, its effect in doing so is weaker than that of allocation concealment (14). Moreover, unlike allocation concealment, blinding is not always appropriate or possible. For example, in an RCT in which a researcher is comparing enteral nutrition with corticosteroids in the treatment of children with active Crohn disease, it may be impossible to blind participants and health care professionals to assigned intervention, although it may still be possible to blind those analysing the data, such as statisticians.

Attrition Bias

Randomisation is intended to balance known and unknown baseline confounding factors between the treatment and control groups. However, after randomisation, it is almost inevitable that some participants will not complete the study for whatever reason. Attrition bias occurs when there are systematic differences between the comparison groups in the loss of participants from the study (7). Participants may be withdrawn from a study because of deviations from protocol or loss to follow-up. Patients may deviate from the intended protocol because of misdiagnosis, noncompliance, or withdrawal (2), whereas loss to follow-up may occur for a number of reasons: patient refusal to participate further, patients becoming simply uncontactable, or clinical decision to stop the assigned intervention (6). Patients excluded from the study are unlikely to be representative of those remaining in the study, and after their exclusion, we can no longer be sure that important baseline prognostic factors in the 2 groups are similar.

It is therefore important that all of the patients allocated to either the treatment or control groups are analysed together as representing that treatment arm whether or not they received the prescribed treatment or completed the study. In other words, the analysis should be performed according to the intention-to-treat principle. Intention-to-treat analyses include all of the patients randomised, whereas per-protocol analyses exclude data from patients who were withdrawn from the study. Clinical effectiveness may be overestimated if an intention-to-treat analysis is not done (15,16). Suggested strategies for dealing with missing data in intention-to-treat analysis have been described elsewhere (16). These include carrying forward the last observed response, and making various assumptions on the missing data in sensitivity analyses. According to the revised Consolidated Standards of Reporting Trials statement for reporting RCTs (17), authors of papers should state clearly which participants are included in their analyses. The sample size per group, or the denominator when proportions are being reported, should be provided for all of the summary information. The main results should be analysed on the basis of intention to treat. When necessary, additional analyses restricted only to participants who fulfilled the intended protocol (per protocol analyses) also may be reported.
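
The difference between the 2 analyses can be illustrated with a short Python sketch. The participant records are hypothetical, and the sketch assumes that the outcome is known for every randomised participant, including those who were withdrawn; handling genuinely missing outcomes would need one of the strategies mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    allocated: str   # arm assigned at randomisation
    completed: bool  # whether the participant fulfilled the intended protocol
    success: bool    # outcome, assumed known for everyone in this sketch


# Hypothetical trial: 10 participants per arm, more withdrawals on treatment.
PARTICIPANTS = (
    [Participant("treatment", True, True)] * 5
    + [Participant("treatment", True, False)] * 1
    + [Participant("treatment", False, False)] * 4
    + [Participant("control", True, True)] * 5
    + [Participant("control", True, False)] * 4
    + [Participant("control", False, False)] * 1
)


def success_rate(records, arm, per_protocol=False):
    """Success rate in an arm; per_protocol=True drops withdrawn participants."""
    analysed = [
        p for p in records
        if p.allocated == arm and (p.completed or not per_protocol)
    ]
    return sum(p.success for p in analysed) / len(analysed)


if __name__ == "__main__":
    for arm in ("treatment", "control"):
        itt = success_rate(PARTICIPANTS, arm)
        pp = success_rate(PARTICIPANTS, arm, per_protocol=True)
        print(f"{arm}: intention to treat {itt:.2f}, per protocol {pp:.2f}")
```

In this invented data set the intention-to-treat analysis shows no difference between the arms, whereas the per-protocol analysis favours the treatment because the withdrawn participants, all of whom did poorly, are left out of the denominator.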

Random Error

The hope is that a well-designed, well-conducted RCT would have eliminated bias (systematic errors). The potential error that still remains after systematic error is eliminated is called random error (18). Random error is due to the variability in the measured data that arises purely by chance. Because of random error, one cannot be certain whether the results obtained in a study are real or arose by chance because it is possible for the play of chance alone to have led to an inaccurate estimate of the treatment effect (5). The most important way of minimising random error in a study is to recruit a sufficiently large number of patients. The results of large studies are associated with less random error, whereas those of small studies may be subject to a greater degree of random error. There are 2 types of random error that are associated with hypothesis testing. These are known as type 1 and type 2 errors. These errors and issues around sample size and statistical power have been discussed elsewhere (19,20).
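
The relation between sample size and random error can be illustrated with a short Python sketch. It uses the usual normal approximation for the 95% confidence interval of a difference between 2 proportions; the response rates of 60% and 40% and the group sizes are hypothetical values chosen for the example.

```python
import math


def ci_half_width(p1, p2, n_per_group, z=1.96):
    """Half-width of the approximate 95% confidence interval for p1 - p2,
    assuming equal group sizes and the normal approximation."""
    standard_error = math.sqrt(
        p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group
    )
    return z * standard_error


if __name__ == "__main__":
    # Hypothetical response rates: 60% on treatment, 40% on control.
    for n in (20, 80, 320, 1280):
        half_width = ci_half_width(0.6, 0.4, n)
        print(f"n per group = {n:4d}: difference 0.20 +/- {half_width:.3f}")
```

Under these assumptions the uncertainty shrinks in proportion to the square root of the sample size, so quadrupling the number of participants per group halves the width of the interval around the estimated treatment effect.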

External Validity

External validity refers to the extent to which the results of trials provide a correct basis for generalisation to other clinical circumstances. In other words, external validity is the ability of the results of trials to be generalised to the “real world” population. Internal validity is a prerequisite for external validity. If the results of a trial are not internally valid, then external validity is irrelevant.

However, the results of an internally valid RCT may be of limited clinical value if the trial lacks external validity; in other words, if the results cannot be applied to the full spectrum of the relevant patient population. Unfortunately, the results of many RCTs are not easily applicable to many patients because the conditions under which the studies were performed cannot be easily replicated in routine clinical practice (21). It is known that the response to treatment in routine clinical practice may be influenced by factors such as the doctor–patient relationship, patient preferences, and the patient's knowledge and understanding of the treatment he or she is receiving. To improve internal validity, these factors are usually eliminated in clinical trials. Procedures such as allocation concealment, blinding, placebo control, and patients (or clinicians) not being allowed to choose their preferred treatment could lead to a distortion of the benefits that the treatment could produce under routine clinical practice (22).

Lack of external validity may lead to clinicians being unwilling to use treatments proven to be effective in trials (22). There are a number of issues that could threaten the external validity of a trial. These include the setting of the trial, selection of participants, characteristics of the participants, types of interventions used and how they were administered, and the way outcomes were measured (22). The representativeness of participants is a particular problem, especially because subject selection in many studies is usually restricted to patients with the most typical features of the disease or those who happen to be available or willing to participate in the trial. To improve generalisability of trials, researchers should make efforts to recruit large numbers of participants from diverse population settings, enroll patients with a broad range of clinical features including those with comorbid conditions, ensure that the way interventions are administered is feasible in routine practice, and assess a broad range of clinically relevant outcomes. Patient values and preferences also should be taken into consideration in determining the external validity of a study.

A clinical trial should have both internal and external validity. Assessment of internal validity involves looking out for sources of bias and random error. External validity is concerned with the applicability of the results of the trial to the real-world population. There are a number of useful checklists that can be used when critically appraising the validity of trials. Good checklists cover areas such as patient selection, randomisation, blinding (when feasible), withdrawals, sample size issues, and applicability to patients. A useful checklist of items that need to be included in the reporting of RCTs has been described elsewhere (17).

References

1. Pocock SJ. Clinical Trials: A Practical Approach. Chichester, UK: John Wiley & Sons; 1983.
2. Akobeng AK. Understanding randomised controlled trials. Arch Dis Child 2005; 90:840–844.
3. Last JM. A Dictionary of Epidemiology. New York: Oxford University Press; 2001.
4. Keirse MJ, Hanssens M. Control of error in randomized clinical trials. Eur J Obstet Gynecol Reprod Biol 2000; 92:67–74.
5. Stephenson JM, Babiker A. Overview of study design in clinical epidemiology. Sex Transm Infect 2000; 76:244–247.
6. Juni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001; 323:42–46.
7. Assessment of study quality. In: Higgins JPT, Green S (editors), Cochrane Handbook for Systematic Reviews of Interventions 4.2.5, Section 6, Issue 3. Chichester, UK: John Wiley & Sons; 2005.
8. Viera AJ, Bangdiwala SI. Eliminating bias in randomized controlled trials: importance of allocation concealment and masking. Fam Med 2007; 39:132–138.
9. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet 2002; 359:515–519.
10. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet 2002; 359:614–618.
11. Kendall JM. Designing a research project: randomised controlled trials and their principles. Emerg Med J 2003; 20:164–168.
12. Wunsch H, Linde-Zwirble WT, Angus DC. Methods to adjust for bias and confounding in critical care health services research involving observational data. J Crit Care 2006; 21:1–7.
13. Katz MH. Multivariable Analysis: A Practical Guide for Clinicians. Cambridge, UK: Cambridge University Press; 1999.
14. Schulz KF. Assessing allocation concealment and blinding in randomised controlled trials: why bother? Evid Based Nurs 2000; 5:36–37.
15. Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006; 163:493–501.
16. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999; 319:670–674.
17. Altman DG, Schulz KF, Moher D, et al. The revised CONSORT statement for reporting randomised controlled trials: explanation and elaboration. Ann Intern Med 2001; 134:663–694.
18. Rothman KJ. Epidemiology—An Introduction. New York: Oxford University Press; 2002.
19. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005; 365:1348–1353.
20. Keirse MJ, Hanssens M. Control of error in randomized clinical trials. Eur J Obstet Gynecol Reprod Biol 2000; 92:67–74.
21. Travers J, Marsh S, Caldwell B, et al. External validity of randomized controlled trials in COPD. Respir Med 2007; 101:1313–1320.
22. Rothwell PM. External validity of randomised controlled trials: to whom do the results apply? Lancet 2005; 365:82–93.

Bias; Error; External validity; Internal validity; Random error; Randomised controlled trial; Systematic

© 2008 Lippincott Williams & Wilkins, Inc.