
Reporting of Outcomes in Orthopaedic Randomized Trials: Does Blinding of Outcome Assessors Matter?

Poolman, Rudolf W., MD1; Struijs, Peter A.A., MD, PhD2; Krips, Rover, MD, PhD2; Sierevelt, Inger N., MSc2; Marti, René K., MD, PhD2; Farrokhyar, Forough, MPhil, PhD1; Bhandari, Mohit, MD, MSc, FRCSC1

doi: 10.2106/JBJS.F.00683
Scientific Articles

Background: Randomization, concealment of treatment allocation, and blinding are all known to limit bias in clinical research. Nonsurgical studies that fail to meet these standards have been reported to inflate the differences between treatment and control groups. While surgical trials can rarely blind surgeons or patients, they can often blind outcome assessors. The aim of this systematic review was threefold: (1) to examine the reporting of outcome measures in orthopaedic trials, (2) to determine the feasibility of blinding in published orthopaedic trials, and (3) to examine the association between the magnitude of treatment differences and the blinding of outcome assessors.

Methods: We identified and reviewed thirty-two randomized, controlled trials published in The Journal of Bone and Joint Surgery (American Volume) in 2003 and 2004 for the appropriate use of outcome measures. These trials represented 3.4% of all 938 studies published during that time period. All thirty-two trials were reviewed by two authors for (1) the outcome measures used and (2) the blinding of outcome assessors. We compared the magnitude of the treatment effect between studies with blinded and studies with unblinded outcome assessors.

Results: Ten (31%) of the thirty-two randomized controlled trials used a modified outcome instrument. Of the ten trials, four failed to describe how the outcome instrument was modified. Nine of the ten articles did not describe how the modified instrument was validated and retested. Sixteen of the thirty-two randomized controlled trials did not report blinding of outcome assessors when blinding would have been possible. Among the studies with continuous outcome measures, unblinded outcomes assessment was associated with significantly larger treatment effects than blinded outcomes assessment (standardized mean difference, 0.76 compared with 0.25; p = 0.01). Similarly, in the studies with dichotomous outcomes, unblinded outcomes assessments were associated with significantly greater treatment effects than blinded outcomes assessments (odds ratio, 0.13 compared with 0.42; p < 0.001). The ratio of odds ratios (unblinded to blinded outcomes assessment) was 0.31, suggesting that unblinded outcomes assessment was associated with an exaggerated estimate of the benefit of a treatment in our cohort of studies.

Conclusions: In future orthopaedic randomized controlled trials, emphasis should be placed on detailed reporting of outcome measures to facilitate generalization and the outcome assessors should be blinded, when possible, to limit bias.

1 Orthopaedic Research Unit, Division of Orthopaedic Surgery, McMaster University, Hamilton Health Sciences-General Hospital, 237 Barton Street East, 7 North, Suite 727, Hamilton, ON L8L 2X2, Canada. E-mail address for R.W. Poolman: Poolman@trauma.nl

2 Department of Orthopaedic Surgery, OrthoTrauma Research Center Amsterdam, Academic Medical Center, University of Amsterdam, G4 Noord, P.O. Box 22660, 1100 DD Amsterdam, The Netherlands

Randomized controlled trials represent the highest level of evidence for a surgical therapy1. Several reports have identified factors associated with bias in the conduct and reporting of randomized trials in the medical literature2-8. These include concealment of randomization and blinding of physicians, patients, outcome assessors, and data analysts2-8. If differences between treatments are modest, bias can distort the truth9.

Lack of blinding in medical randomized controlled trials has been associated with increased magnitudes of observed treatment effects2,5,7,10-12. Blinding is a methodological safeguard and is often confused with other methodological precautions, such as concealment of allocation during the process of creating comparison groups12. Allocation in a trial is concealed when investigators cannot determine in advance the treatment assignment of the next patient enrolled in their study. Allocation concealment is necessary to prevent selection bias, whereas blinding is important to prevent detection bias, i.e., a biased assessment of outcome5.

Unlike pharmaceutical trials, surgical trials can never blind the surgeon to the type of intervention. Thus, safeguards to prevent bias in surgical trials include concealment of allocation and blinding of patients and outcome assessors9. Boutron et al. reported that the feasibility of blinding differs between pharmaceutical and nonpharmaceutical trials13.

In surgical trials, reporting unbiased, clinically important differences in treatment effect ideally requires blinded outcomes assessment and the use of validated outcome instruments14,15. Despite the use of the best-validated outcome instruments, the measurement of treatment effect can still be biased if differences between treatment groups are small and methodological safeguards are not applied11,16.

The correct use of outcome measures was recently discussed in an editorial by Zarins in this journal17. In the last decades, an effort has been made to design patient-oriented outcome measures, but these new measures have not always been appropriately validated18-20.

The aim of this systematic review was threefold: (1) to examine the reporting of outcome measures in orthopaedic trials, (2) to determine the feasibility of blinding of outcome assessors in published orthopaedic trials, and (3) to examine the association between the magnitude of treatment differences and blinding.

We hypothesized that outcome instruments are often modified and authors do not report validation of the modified outcome instrument. Furthermore, we hypothesized that studies describing unblinded outcome assessors were more likely to report larger treatment effects than those involving blinded outcome assessors.


Materials and Methods

Study Design

We conducted a systematic review to describe the reporting of outcome measures and the conduct and feasibility of blinding in randomized controlled trials published in The Journal of Bone and Joint Surgery (American Volume) in 2003 and 2004. We chose The Journal since it is regarded as the highest-impact general orthopaedic journal. Our review comprised only randomized controlled trials, since they are designed to detect clinically important changes with limited bias and are considered to provide the highest level of evidence1.


Eligibility Criteria

Two authors (R.W.P. and R.K.) manually searched all issues of The Journal from January 2003 through December 2004. Eligible studies included those reported as randomized trials of therapeutic interventions that involved human subjects. Searches were conducted in duplicate, and any disagreements were resolved by consensus of three authors (R.W.P., R.K., and M.B.).


Study Demographic Information

The relevant data from each of the eligible studies were abstracted by one investigator (R.W.P.) and were rechecked for accuracy by a second investigator (P.A.A.S.). The data included (1) the first author (a surgeon; a physician, but not a surgeon; or an epidemiologist); (2) citation of statistical support or methodological support by a department of clinical epidemiology, statistics, or public health; (3) year of publication; (4) total sample size; (5) number of centers; (6) name of the intervention; (7) trial type, i.e., surgery, drug trial (categorized as injection, oral, or topical), postoperative management, and externally applied energy (shock wave or ultrasound); (8) body region (upper extremity, long bones of the lower extremity, spine, hip and knee, foot and ankle, deep venous thrombosis, or other); (9) financial support (yes or no); (10) direction of results (positive, if the findings of the randomized trial were significant, or negative, if they were not significant); and (11) trial reported according to the CONSORT (Consolidated Standards of Reporting Trials) statement21 (yes or no).


Study Aim 1: Evaluation of Outcome Reporting

Outcome Measure Identification

All identified randomized controlled trials were retrieved and reviewed for the types of outcome measures used in the study by two reviewers (R.W.P. and P.A.A.S.), and they were checked for accuracy by a third reviewer (R.K.). We categorized the outcome measures according to the following criteria described by Boutron et al.13: (1) “Patient-reported outcomes” (e.g., pain and disabilities), when the patient is the outcome assessor. (2) “Outcomes that suppose a contact between patients and outcome assessors” (e.g., clinical examination [blood pressure], clinical test [walking speed and stair-climbing], and ultrasound examination). (3) “Outcomes that do not suppose a contact between patients and outcome assessors” (e.g., radiography and magnetic resonance imaging). (4) “Clinical events and therapeutic outcomes that will be determined by the interaction between patients and care providers” (e.g., cointerventions, length of hospitalization, treatment failure, and surgery), in which the care provider is the outcome assessor. (5) “Clinical events and therapeutic outcomes that will be assessed from data on the medical form” (e.g., death linked to myocardial infarction or indication for arthroplasty from clinical and radiographic data). (6) “Composite outcomes,” i.e., those that require each outcome to be assessed separately.


Outcome Instrument Validation and Appropriate Use

We assessed the studies with regard to the following criteria: (1) Did the study describe a modification of an outcome instrument? (2) Did the authors provide a detailed description of the modification in the manuscript when an outcome instrument was modified? (3) Did the authors provide a detailed description of the validation process after modification of the outcome instrument in their study?


Study Aim 2: Quality of Reporting on the Methodological Safeguard of Blinding

We examined the blinding process for all included randomized controlled trials. We scored the reporting of blinding on the basis of three categories: (1) "clearly blinded," when blinding was reported; (2) "not blinded or blinding status not described, but it was possible to blind," when blinding was not conducted or not reported and we judged, on the basis of the characteristics of the study, that blinding would have been feasible; and (3) "blinding was impossible," when the report stated that blinding could not be done because of the characteristics of the intervention or the outcome used in the trial.

This blinding process was scored for treatment providers, patients, outcome assessors, and data analysts.

Details about the blinding were reported for different trial types. We categorized the trials into the following subgroups: surgical trial, drug trial (injection, oral, or topical), postoperative management, physical therapy trial, and externally applied energy trial.


Study Aim 3: Effect of Blinding on the Reported Treatment Effects

We compared the magnitude of the treatment effects for studies that reported outcome blinding with those that did not. We grouped the studies that were categorized as “not blinded or blinding status not described, but it was possible to blind” and “blinding was impossible” into the unblinded group.

For dichotomous outcome measures, we compared pooled odds ratios (95% confidence intervals) across studies with and without blinding. For continuous variables (outcome scales), we converted the reported mean differences across treatments to an effect size, or standardized mean difference (the treatment mean value minus the comparison mean value divided by the pooled standard deviation), and compared the effect size across blinded and unblinded outcomes. We used the same method to calculate the magnitude of the treatment effect for studies that described the concealment of treatment allocation and those that did not.
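The two effect measures described above can be illustrated with a short sketch. This is a minimal illustration of the calculations, not the analysis code actually used in the study; the function names are our own:

```python
import math

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Effect size: (treatment mean - comparison mean) / pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def odds_ratio(events_t, total_t, events_c, total_c):
    """Odds ratio for a dichotomous outcome from a 2x2 table."""
    a, b = events_t, total_t - events_t  # treatment: events, non-events
    c, d = events_c, total_c - events_c  # comparison: events, non-events
    return (a * d) / (b * c)
```

For example, two groups of fifty patients with means of 10 and 8 and a common standard deviation of 4 yield a standardized mean difference of 0.5, a moderate effect size on this scale-free metric.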


Statistical Analysis

Descriptive results were presented for the demographic characteristics of the studies, evaluation of outcomes, and details with regard to blinding and trial type. Data were analyzed with SPSS statistical software package (version 13.0; SPSS, Chicago, Illinois).


Ensuring the Accuracy of the Blinding Rating

We measured agreement between the reviewers for the assessment of the blinding of outcome assessors with use of a kappa statistic22. Landis and Koch suggested criteria for the interpretation of agreement, with 0 to 0.2 representing slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; and 0.61 to 0.80, substantial agreement. A value of >0.80 is considered almost perfect agreement22. When two reviewers disagreed, we attempted to reach consensus after carefully rereading the articles in a consensus meeting. When discrepancies persisted despite the consensus meeting, a third reviewer was asked for an opinion on the specific item to reach final consensus. This method of quality assessment is commonly used in Cochrane reviews. All reviewers (R.W.P., P.A.A.S., R.K., and M.B.) were well trained in quality assessments, all had completed a Cochrane review course, and all had coauthored Cochrane systematic reviews of randomized trials.
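The kappa statistic corrects raw percentage agreement for the agreement expected by chance. A minimal Cohen's kappa for two raters might be sketched as follows (an illustration only, assuming unweighted kappa over categorical ratings):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over categorical items."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    marg1, marg2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the two raters' marginal proportions per category
    expected = sum(marg1[k] * marg2[k] for k in marg1) / n**2
    return (observed - expected) / (1 - expected)
```

For instance, two raters who agree on three of four items, with the marginal distributions shown, obtain a kappa of 0.5 (moderate agreement on the Landis and Koch scale) despite 75% raw agreement.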

We used the chi-square test to calculate the relationship between outcome assessor blinding and the direction of the results. We used a p value of <0.05 to represent significance. All tests of significance were two-tailed.


Magnitude of the Treatment Effect in the Studies

We used the computer program Review Manager (RevMan, version 4.2 for Windows; The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, Denmark, 2003) to calculate the magnitude of the treatment effect for the studies that described blinded outcome assessors and for the studies in which blinding either was not described or was not possible.

For continuous data, we described the treatment effect as the standardized mean difference: the difference in means divided by the pooled standard deviation of the outcomes of participants across the whole trial. The standardized mean difference has the important property that its value does not depend on the measurement scale (http://www.cochranenet.org/openlearning/HTML/modA1-4.htm). Thus, we used the standardized mean difference to convert all outcomes to a common scale, measured in units of standard deviations. For dichotomous data, we calculated odds ratios to describe the magnitude of the treatment effect, with the outcome framed as prevention of a bad outcome.

For the standardized mean difference, we used the fixed-effect inverse-variance model; for the odds ratios, we used the fixed-effect Mantel-Haenszel method. Our aim was to describe the magnitude of the treatment effect; we did not aim to compare data or to pool for meta-analysis, and therefore we were able to use the fixed-effect model. Data were presented in figures with 95% confidence intervals.

Ratios of odds ratios were calculated as described previously5,7,23. A ratio of odds ratios of <1.0 for outcome assessor blinding indicates that trials with unblinded outcomes assessments yielded larger (exaggerated) estimates of treatment effects than trials with blinded outcomes assessments, compared with the reference group7. Conversely, a ratio of odds ratios of >1.0 indicates an association with smaller treatment effects7. The outcome measures used to calculate the treatment effect are listed in the Appendix.
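The ratio of odds ratios and its confidence interval can be reconstructed from the reported summary estimates alone. The sketch below, which is our own illustration and not the RevMan computation, recovers each standard error from the reported 95% confidence limits under the usual log-normal assumption and combines them:

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of ln(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def ratio_of_odds_ratios(or_a, ci_a, or_b, ci_b, z=1.96):
    """ROR = or_a / or_b, with a 95% CI combining both standard errors on the log scale."""
    se = math.sqrt(log_se_from_ci(*ci_a)**2 + log_se_from_ci(*ci_b)**2)
    log_ror = math.log(or_a) - math.log(or_b)
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))

# Unblinded (OR 0.13; 0.09 to 0.18) vs. blinded (OR 0.42; 0.33 to 0.54) assessment:
ror, lo, hi = ratio_of_odds_ratios(0.13, (0.09, 0.18), 0.42, (0.33, 0.54))
```

Run on the odds ratios reported in our Results, this reproduces the ratio of odds ratios of 0.31 (95% confidence interval, 0.20 to 0.47).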

Fig. 1


Sample Size

Our study sample size included all randomized trials published in The Journal from January 2003 through December 2004. We required at least fifty patients per group (the studies with blinded outcome assessment compared with those with unblinded outcome assessment) across all thirty-two eligible randomized trials to provide sufficient study power (alpha = 0.05 and beta = 0.20) to detect large differences in treatment effects (odds ratios) between studies with blinded and unblinded outcome measures. All tests were two-tailed, and we considered a p value of <0.05 to be the threshold for significance.


Results

Study Demographic Information

We identified 938 studies in The Journal from January 2003 through December 2004. Of those studies, thirty-two (3.4%) were randomized controlled trials (see Appendix). The first author was a surgeon in thirty-one studies (97%) and a physician who was not a surgeon in the remaining study (3%). In three randomized trials, at least one of the authors of each study had training in biostatistics (MPH, MSc, or PhD) or was affiliated with a department of statistics, public health, or clinical epidemiology. The thirty-two randomized trials included a total of 3608 patients, with sample sizes ranging from twenty to 474 patients per randomized controlled trial. Six of the studies were performed in two or more centers, eleven focused on interventions related to the treatment of degenerative joint disease, and seven focused on fractures. Five studies included problems affecting the upper extremity; six, the foot and ankle; and nine, the knee. Four randomized controlled trials were reported according to the CONSORT statement21 (Table I).

TABLE I - Characteristics of the Thirty-two Trials
Characteristic No. (%) of Studies
First author
    Surgeon 31 (97)
    Physician, not a surgeon 1 (3)
Epidemiology affiliation
    Yes 3 (9)
    No 29 (91)
Region of body
    Upper extremity 5 (16)
    Lower-extremity long bones 2 (6)
    Spine 2 (6)
    Hip 5 (16)
    Knee 9 (28)
    Foot and ankle 6 (19)
    Soft tissue 2 (6)
    Deep venous thrombosis 1 (3)
Number of centers
    Single 26 (81)
    Multiple 6 (19)
Funding received
    Yes 17 (53)
    No 15 (47)
Direction of results
    Positive 24 (75)
    Negative 8 (25)
CONSORT*
    Yes 4 (13)
    No 28 (87)
*CONSORT = Consolidated Standards of Reporting Trials.


Study Aim 1: Evaluation of Outcome Reporting

Outcome Measure Identification

A total of seventy-nine different outcome measures were reported 147 times in the thirty-two randomized controlled trials. The seventy-nine outcomes were classified according to the criteria described by Boutron et al.13 and were further categorized into the following subgroups. For the sixteen patient-reported outcomes, the most frequently used measure was the visual analogue scale (twelve trials; 38%) followed by the Short Form-36 (SF-36) (four trials; 13%). For the fifty-one outcomes that suppose a contact between patients and outcome assessors, the most frequently used measure was range of motion (nine trials; 28%) followed by other types of clinical examination (eight trials; 25%). For the eight outcomes that do not suppose a contact between patients and outcome assessors, radiographs were the measure used in twenty randomized controlled trials (63%). For the four clinical events and therapeutic outcomes that are determined by the interaction between patients and care providers, reoperation and the use of pain medication were the measures described in two trials. We were not able to identify clinical events and therapeutic outcomes that are assessed from data on the medical chart or composite outcomes that require each outcome to be assessed separately. For a detailed description of the outcomes see the Appendix.


Outcome Measure Validation and Appropriate Use

Ten (31%) of the thirty-two randomized controlled trials used a modified outcome instrument. Of the ten trials, four failed to describe how the outcome instrument was modified. Nine of the ten articles did not describe how the modified instrument was validated and retested.


Study Aim 2: Quality of Reporting on the Methodological Safeguard of Blinding

The reviewers had substantial to almost perfect agreement in scoring the blinding of treatment providers (kappa, 0.85), patients (kappa, 0.90), outcome assessors (kappa, 0.84), and data analysts (kappa, 0.73).

Five randomized controlled trials (16%) did not blind or did not describe blinding of treatment providers in trials in which blinding was possible. In twenty-three trials (72%), blinding was impossible; the majority of these studies (twenty-one; 66%) were surgical trials in which blinding of the treatment provider was impossible. Fifteen trials (47%) did not blind or did not describe blinding of patients, when blinding was possible. In eleven trials, blinding of patients was impossible. For the surgical trials, patients were clearly blinded in three, blinding was possible but not reported or was not done in nine trials, and it was impossible to blind patients in nine trials. All studies could have blinded the data analysts; however, twenty-three trials did not report blinding or did not blind the data analysts. The remaining nine trials (28%) blinded the data analysts.

Of the thirty-two randomized controlled trials, fourteen (44%) clearly blinded the outcome assessors. Sixteen trials did not describe blinding of the outcome assessors, when blinding of the outcome assessors was possible. In two trials, which were both surgical trials, it was impossible to blind the outcome assessors. Of the remaining surgical trials, seven trials clearly blinded the outcome assessors and twelve trials did not blind the outcome assessors or did not report blinding although it was possible. Data with regard to blinding and trial type are summarized in Table II.

TABLE II - Blinding in Thirty-two Randomized Controlled Trials Published in The Journal of Bone and Joint Surgery in 2003 and 2004
For each group (treatment providers; patients; outcome assessors; data analysts), the numbers of trials are given as: clearly blinded / not blinded or not described but possible to blind / impossible to blind.
Type of Trial (No. of Trials): Treatment Providers; Patients; Outcome Assessors; Data Analysts
Surgical trial (21): 0/0/21; 3/9/9; 7/12/2; 4/17/0
Drug trial (6): 2/4/0; 2/4/0; 5/1/0; 4/2/0
    Injection (3): 1/2/0; 1/2/0; 3/0/0; 3/0/0
    Oral (2): 0/2/0; 1/1/0; 1/1/0; 0/2/0
    Topical (1): 0/1/0; 0/1/0; 1/0/0; 1/0/0
Postop. management (2): 1/1/0; 0/1/1; 1/1/0; 0/2/0
Physiotherapy (1): 0/0/1; 0/0/1; 0/1/0; 0/1/0
Externally applied energy (2): 1/0/1; 1/1/0; 1/1/0; 1/1/0
Total, no. (%) of all 32 trials: 4 (13)/5 (16)/23 (72); 6 (19)/15 (47)/11 (34); 14 (44)/16 (50)/2 (6); 9 (28)/23 (72)/0 (0)


Study Aim 3: The Effect of Outcome Assessor Blinding on the Reported Treatment Effects

In studies with continuous outcome measures, the treatment effect was larger in studies with unblinded outcome assessors. The effect size (standardized mean difference) for three studies describing a blinded outcome assessor was 0.25 (95% confidence interval, –0.06 to 0.56), whereas the effect size was 0.76 (95% confidence interval, 0.57 to 0.96) for eight studies with an unblinded outcome assessor or unreported blinding (Fig. 1). The difference in effect magnitude was significant (p = 0.01).

In the studies that described a dichotomous outcome measure, blinding of the outcome assessors in ten studies (odds ratio, 0.42; 95% confidence interval, 0.33 to 0.54) was associated with a significantly lower treatment effect than that associated with unblinded outcome assessments in eleven studies (odds ratio, 0.13; 95% confidence interval, 0.09 to 0.18) (p < 0.001) (Fig. 2). This translated to relative risk reductions of 38% for blinded outcome assessments compared with 71% for unblinded outcome assessments (a difference of 33%). The ratio of odds ratios was 0.31 (95% confidence interval, 0.20 to 0.47), indicating an exaggerated treatment effect when outcome assessors were not blinded. Figure 3 represents a comparison of this ratio of odds ratios with those of five other studies that described relative odds associated with blinding of outcome assessment5,7,23-25.

Fig. 2

The effect size (standardized mean difference) for the four studies describing concealment of treatment allocation was 0.44 (95% confidence interval, 0.13 to 0.74) compared with 0.68 (95% confidence interval, 0.49 to 0.87) for the seven studies with unconcealed treatment allocation. In the studies describing a dichotomous outcome measure, the treatment effect in eight trials with concealment of treatment allocation (odds ratio, 0.32; 95% confidence interval, 0.23 to 0.45) was lower than that in thirteen trials with unconcealed treatment allocation (odds ratio, 0.26; 95% confidence interval, 0.20 to 0.33). The ratio of odds ratios was 0.81 (95% confidence interval, 0.53 to 1.24), indicating that there was no significant difference for the available data.

Fig. 3

Our evaluation of the direction of study results, positive or negative, was underpowered. Of the sixteen trials in which the blinding status was not described or the outcome assessors were not blinded, twelve noted a positive result. Of the fourteen trials that described true blinding of the outcome assessors, eight noted a positive result. The risk of reporting a positive outcome when outcome assessors were unblinded was 1.21 (95% confidence interval, 0.70 to 2.1). The difference was not significant (p = 0.41).


Discussion

Key Findings

We found that (1) only 3.4% of all studies published in The Journal in 2003 and 2004 were randomized controlled trials, (2) previously validated outcome measures were commonly modified and not revalidated, (3) outcome assessors were likely to be unblinded (56%) in orthopaedic randomized trials, (4) blinding of outcome assessors was possible in many situations but was not conducted, and (5) studies in which assessors were not blinded were associated with significantly larger estimates of reported treatment effect (ratio of odds ratios, 0.31; 95% confidence interval, 0.20 to 0.47).


Strengths and Limitations

Our study is strengthened by a comprehensive search, performed in duplicate, to identify all randomized controlled trials published in The Journal. Our findings may not generalize to other journals or other study designs. The number of studies in our review was limited to those published in the eligible time period. Although the number of studies was small, the association between blinding status and treatment effect was sufficiently large to identify a significant difference. The study was, however, underpowered to detect an association between positive study results and blinding. Our power analysis suggests that 216 studies (108 per arm) would be required to achieve 80% study power (a beta value of 0.20) with an alpha value of 0.05.

Many factors may explain the heterogeneity in the size of treatment effects in addition to blinding5. In our study, allocation concealment had less influence on the magnitude of treatment effect than did blinding of the outcome assessors. However, our sample of studies was not large enough to reach significance. Thus, our results should be cautiously interpreted. Our findings do, however, raise important hypotheses to be examined in future studies.


Review of the Relevant Literature

Evaluation of Outcome Reporting

We chose not to use the terms subjective and objective outcome measures as Zarins did in his recent editorial17. In orthopaedics, so-called hard outcomes (clinical events, such as mortality, and therapeutic outcomes that are assessed from data in the medical record) are seldom the key outcome of interest13. Outcome measures, traditionally described as objective or hard, can even be subject to interrater disagreement, for example, in imaging of the scaphoid bone and physical examination of the range of motion14,26; therefore, we rely on patient-reported, subjective outcome measures13,14. Well-designed patient-reported questionnaires have undergone rigorous testing and may be more objective14. Thus, outcome objectivity is not determined by whether a clinician measures a parameter directly; rather, it is dependent on the reliability or reproducibility of a finding, among patients and clinicians alike27.

Nevertheless, these so-called subjective outcomes, currently in vogue17, present great opportunities for bias11. For example, Schulz and Grimes explained: “If outcome assessors who know of the treatment allocation believe a new intervention is better than an old one, they could register more generous responses to that intervention.”11 In the article by Boutron et al., 77% of the 110 studies they evaluated used a patient-reported outcome and 15% of the studies used outcomes that did not suppose contact with patients (radiology)13. The majority of the randomized controlled trials that met the eligibility criteria for our study (63%) used radiographs as an outcome measure, whereas patient-reported outcomes were less frequently used (a visual analogue scale was used by 38%, and the SF-36 was used by 13%).

The study by Harvie et al., on the use of outcome scores in surgery of the shoulder, revealed that the overall pattern of the application of an outcome score was highly variable and at times inappropriate19. In their study, only nineteen randomized controlled trials were identified. Changes were made to the outcome scores, often without proper testing of the modification and without justification19. These results are comparable with our findings in the thirty-two randomized controlled trials that we reviewed.


Quality of Reporting on the Methodological Safeguard of Blinding

In nonpharmaceutical trials of hip and knee osteoarthritis, blinding was considered feasible less often than in pharmaceutical trials13. Our study investigated the feasibility and reporting of blinding in a variety of orthopaedic diseases, whereas Boutron et al. studied only hip and knee osteoarthritis13. As a result, our study covers a wider spectrum of surgical procedures. If different surgical approaches are used, as in minimally invasive hip replacement studies, identical wound dressings can be used to blind patients in the early recovery period, the period in which differences in treatment effect are most likely to occur9. If patients cannot be blinded, outcome assessors usually can be blinded. If outcome assessors are blinded, it is less likely they will bias their outcome assessments, especially if patient-reported (so-called soft or subjective) outcome measures, such as pain, are used9,11.

Boutron et al. stated that outcome assessment is blinded in cases when the patient is the outcome assessor and the patient is blinded, for example, when self-reported outcome instruments are used13. These patient-based questionnaires are often filled out in clinics with an outcome assessor present in or near the room. Patients frequently ask research nurses or trial coordinators (i.e., outcome assessors) about the questions in the outcome instrument. In this situation, an unblinded outcome assessor could influence a patient's answers on the questionnaire. Therefore, outcome assessor blinding in orthopaedic research is of paramount importance. The 33% difference in treatment effect shown in our study further suggests that blinding is important. A future study should focus in more detail on the influence of outcome assessors on patients filling out self-reported questionnaires. In the present study, the majority of the randomized controlled trials published in The Journal in 2003 and 2004 failed to note whether outcome assessors were blinded.

A previous study on the quality of reporting of randomized controlled trials in The Journal categorized the blinding of treatment providers, patients, outcome assessors, and data analysts into three groups: (1) a clear statement of blinding, (2) a clear statement of no blinding, or (3) no statement on blinding28. That study did not assess whether blinding would have been feasible, but it noted that the majority of the studies (55%) did not adequately report blinding28. Interestingly, that report found a similar proportion (3%) of randomized controlled trials among the total number of publications.

Reporting guidelines for randomized controlled trials continue to evolve29. Studies may fail to describe blinding of outcome assessors when, in fact, blinding was done30. The CONSORT statement provides guidelines for better reporting of randomized controlled trials29; however, adherence to the CONSORT statement in many randomized controlled trials remains low29,31.

The Effect of Outcome Blinding on the Reported Treatment Effects

Blinding of outcome assessors is one of the methodological safeguards that ensure the internal validity of a trial7,11,24. Treatment effects are known to be overestimated in unblinded, non-orthopaedic (pharmaceutical or medical) studies5,7,25. The study by Schulz et al.7 showed a ratio of odds ratios of 0.83 (95% confidence interval, 0.71 to 0.96), Kjaergard et al.25 reported a ratio of odds ratios of 0.56 (95% confidence interval, 0.33 to 0.98), and Juni et al.5 reported a ratio of odds ratios of 0.88 (95% confidence interval, 0.75 to 1.04). One report described surgical trials and found a ratio of odds ratios of 0.87 (95% confidence interval, 0.56 to 1.36)23. Conversely, another report showed an underestimation of the treatment effect (ratio of odds ratios, 1.11; 95% confidence interval, 0.76 to 1.63)24. In our study, we found strong evidence that unblinded outcome assessment exaggerates the treatment effect (ratio of odds ratios, 0.31; 95% confidence interval, 0.20 to 0.47) (Fig. 3). It had previously been suggested that the use of unblinded outcome assessors to score so-called soft outcomes (patient-reported outcomes such as pain and disability)13, as is often the case in orthopaedic trials, may result in biased findings11. Our study is, as far as we know, the first to test this theory for outcome assessment in randomized controlled trials in orthopaedics.
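To make the summary statistic concrete, the ratio of odds ratios quoted above can be reproduced from the two pooled odds ratios reported in this study (0.13 for trials with unblinded outcome assessment and 0.42 for trials with blinded assessment). The following is an illustrative sketch, not the analysis code used in this study; the function name is ours.

```python
def ratio_of_odds_ratios(or_unblinded: float, or_blinded: float) -> float:
    """Ratio of odds ratios (ROR): pooled odds ratio of trials with
    unblinded outcome assessment divided by the pooled odds ratio of
    trials with blinded assessment. An ROR below 1.0 indicates that
    unblinded trials reported larger (more favorable) treatment effects."""
    return or_unblinded / or_blinded

# Pooled odds ratios reported in this study
ror = ratio_of_odds_ratios(0.13, 0.42)
print(round(ror, 2))  # 0.31, the value reported above
```

Because the ROR is a ratio of two ratios, it is scale-free: it compares the relative size of treatment effects between the two groups of trials rather than the effects themselves.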

In conclusion, if readers want to apply the findings of a randomized controlled trial to their daily clinical work, they must be able to rely on the internal and external validity of that trial. Our study showed that published reports of randomized controlled trials had serious threats to internal validity. Investigators should report outcome measures carefully, use validated measures whenever possible, and blind the outcome assessment whenever feasible.

Appendix

Lists of the thirty-two randomized controlled trials and the specific outcomes studied are available with the electronic versions of this article, on our web site at jbjs.org (go to the article citation and click on “Supplementary Material”) and on our quarterly CD-ROM (call our subscription department, at 781-449-9780, to order the CD-ROM). ▪

Disclosure: In support of their research for or preparation of this work, one or more of the authors received, in any one year, outside funding or grants in excess of $10,000 from a Canada Research Chair from the Canadian Institutes of Health Research (M.B.) and from a Stichting Wetenschappelijk Onderzoek Orthopaedische Chirurgie Fellowship, Biomet The Netherlands, Anna Fonds, Zimmer The Netherlands, Stryker The Netherlands, MSD The Netherlands, and a Nederlandse Vereniging voor Orthopedische Traumatologie Fellowship (R.W.P.). Neither they nor a member of their immediate families received payments or other benefits or a commitment or agreement to provide such benefits from a commercial entity. No commercial entity paid or directed, or agreed to pay or direct, any benefits to any research fund, foundation, division, center, clinical practice, or other charitable or nonprofit organization with which the authors, or a member of their immediate families, are affiliated or associated.

Investigation performed at the Orthopaedic Research Unit, Division of Orthopaedic Surgery, McMaster University, Hamilton Health Sciences-General Hospital, Hamilton, Ontario, Canada, and the Department of Orthopaedic Surgery, OrthoTrauma Research Center Amsterdam, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

References

1. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85: 1-3.
2. Day SJ, Altman DG. Statistics notes: blinding in clinical trials and other studies. BMJ. 2000;321: 504.
3. Fergusson D, Glass KC, Waring D, Shapiro S. Turning a blind eye: the success of blinding reported in a random sample of randomised, placebo controlled trials. BMJ. 2004;328: 432.
4. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282: 1054-60.
5. Juni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323: 42-6.
6. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, Pham B, Klassen TP. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999;3: i-iv, 1-98.
7. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273: 408-12.
8. Schulz KF. Assessing allocation concealment and blinding in randomised controlled trials: why bother? Evid Based Med. 2000;5: 36-8.
9. Lilford R, Braunholtz D, Harris J, Gill T. Trials in surgery. Br J Surg. 2004;91: 6-16.
10. Devereaux PJ, Bhandari M, Montori VM, Manns BJ, Ghali WA, Guyatt GH. Double blind, you are the weakest link—good-bye! ACP J Club. 2002;136: A11.
11. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got what. Lancet. 2002;359: 696-700.
12. Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Ann Intern Med. 2002;136: 254-9.
13. Boutron I, Tubach F, Giraudeau B, Ravaud P. Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials. J Clin Epidemiol. 2004;57: 543-50.
14. Pynsent PB. Choosing an outcome measure. J Bone Joint Surg Br. 2001;83: 792-4.
15. Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and general recommendations. Spine. 2000;25: 3100-3.
16. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet. 2002;359: 614-8.
17. Zarins B. Are validated questionnaires valid? J Bone Joint Surg Am. 2005;87: 1671-2.
18. Swiontkowski MF, Buckwalter JA, Keller RB, Haralson R. The outcomes movement in orthopaedic surgery: where we are and where we should go. J Bone Joint Surg Am. 1999;81: 732-40.
19. Harvie P, Pollard TC, Chennagiri RJ, Carr AJ. The use of outcome scores in surgery of the shoulder. J Bone Joint Surg Br. 2005;87: 151-4.
20. Pynsent PB, Fairbank JC, Carr AJ. Outcome measures in orthopaedics and orthopaedic trauma. 2nd ed. New York: Oxford University Press; 2004.
21. Moher D, Schulz KF, Altman DG; CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357: 1191-4.
22. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33: 159-74.
23. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002;287: 2973-82.
24. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352: 609-13.
25. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analysis. Ann Intern Med. 2001;135: 982-9.
26. Poolman RW, Hanel DP, Mann FA, Ponsen KJ, Marti RK, Roolker L. Trans-Atlantic hospital agreement in reading first day radiographs of clinically suspected scaphoid fractures. Arch Orthop Trauma Surg. 2002;122: 373-8.
27. Suk M, Hanson BP, Norvell DC, Helfet DL. The AO handbook of musculoskeletal outcome measures and instruments. New York: Thieme; 2005.
28. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84: 388-96.
29. Mills E, Wu P, Gagnier J, Heels-Ansdell D, Montori VM. An analysis of general medical and specialist journals that endorse CONSORT found that reporting was not enforced consistently. J Clin Epidemiol. 2005;58: 662-7.
30. Devereaux PJ, Choi PT, El Dika S, Bhandari M, Montori VM, Schunemann HJ, Garg AX, Busse JW, Heels-Ansdell D, Ghali WA, Manns BJ, Guyatt GH. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004;57: 1232-6.
31. Mills EJ, Wu P, Gagnier J, Devereaux PJ. The quality of randomized trial reporting in leading medical journals since the revised CONSORT statement. Contemp Clin Trials. 2005;26: 480-7.

Copyright © 2007 by The Journal of Bone and Joint Surgery, Incorporated