Lindsay M. Kuroki; Jenifer E. Allsworth, PhD; Jeffrey F. Peipert, MD, PhD
Recently, there has been a progressive effort by researchers and journal editors to assess and improve the quality of published studies. Evidence-based medicine continues to guide clinical decision making based on the best available evidence in the literature. The U.S. Preventive Services Task Force1 and Obstetrics & Gynecology2 have adopted similar rating systems to characterize the quality of available evidence (Table 1). Under this hierarchical approach, randomized controlled trials (RCTs) remain the gold standard of research because they provide the strongest evidence of causality and the best methodology to test the effectiveness of an intervention.3,4
Examples of such initiatives have been pursued within the field of obstetrics and gynecology.5–9 A 2002 study by Welch et al6 revealed a significant improvement in the quality of statistical analysis in articles published over a 5-year interval in the American Journal of Obstetrics & Gynecology. Similar findings were reported in Obstetrics & Gynecology by Dauphinee et al,7 who also noted a trend toward more observational studies and fewer anecdotal reports. The number of RCTs, however, did not change during their 10-year study period and has remained consistently behind the more abundantly published observational studies.5,7,10
In response to the evidence-based paradigm, quality assessment reviews of published articles have included evaluations of journal quality factors,11 methodology12 and bibliographic citations,9 logistic regression reporting,13 and the quality of discussion sections.14 Findings from such reports have the potential to improve the quality of published literature. Not only are “periodic report cards”10 cited as future goals, but there is agreement that clinicians, researchers, policy makers, and journal editors all have a fundamental responsibility to meet and contribute to higher standards of research evidence.
Among quality assessment studies within obstetrics and gynecology journals, gaps remain in the current literature that warrant addressing. Few studies compare obstetrics and gynecology research methodologies with those reported in other medical journals. Our objective was to classify and compare the research methodologies and statistical reporting of published articles in six major medical journals, selected on the basis of journal impact factor and medical specialty (three moderate–impact-factor obstetrics and gynecology journals and three high–impact-factor general medical journals). More specifically, we hypothesized that journals with a higher impact factor publish more RCTs and have higher-quality statistical reporting than moderate–impact-factor obstetrics and gynecology journals.
MATERIALS AND METHODS
We performed a cross-sectional analysis of 371 published articles, indexed in the MEDLINE database, to evaluate the research methodologies and statistical methods used for clinical research. All articles were published in six medical journals from January to June 2006. Selection of journals was based on their medical specialty and 2005 journal impact factor. The high–impact-factor group included the New England Journal of Medicine (impact factor 44.0), The Lancet (impact factor 23.4), and the Journal of the American Medical Association (impact factor 23.3). The moderate–impact-factor group consisted of Obstetrics & Gynecology (impact factor 4.2), the American Journal of Obstetrics & Gynecology (impact factor 3.1), and the British Journal of Obstetrics and Gynaecology (impact factor 2.1).
All articles selected contained clinically oriented research based on human participants only. We began by generating a library of all 2006 publications for each journal using PubMed and a reference manager, EndNote 9.0. After a list for all six journals was generated, we performed an initial screening of articles to narrow our sample population to include only those publications that involved original research, systematic reviews, case reports, and case series. We excluded commentaries, correspondences, clinical expert series, discussions, editorials, and letters to the editor. However, to achieve this goal in each of the selected journals, we had to apply different keyword exclusion criteria according to the individual journal style (Fig. 1). Traditional reviews that failed to provide original data or perform a systematic review of the literature were excluded as well. The number of articles that met eligibility criteria varied by journal. From this set, we randomly selected articles using a computer-generated random-number sequence.
Articles were assigned a random number and ranked, and abstractions were carried out until we collected data from at least 50 articles per journal, all of which met the inclusion criteria. A standardized form was used to record the abstracted data. The methodology of each study was categorized according to its level of evidence based on guidelines (Table 1) set forth by Obstetrics & Gynecology.
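The random ranking step described above can be sketched as follows; the function name, seed, and identifiers are illustrative assumptions, not details from the paper.

```python
import random

def select_articles(eligible_ids, target=50, seed=42):
    """Assign each eligible article a random number, rank the articles
    by that number, and take the first `target` of them (a sketch of
    computer-generated random-number selection; the seed and names
    are illustrative)."""
    rng = random.Random(seed)
    ranked = sorted(eligible_ids, key=lambda _: rng.random())
    return ranked[:target]

# Example with hypothetical article identifiers
sample = select_articles(list(range(1000)))
```

In practice the abstraction loop continued past the initial 50 when a selected article failed the inclusion criteria; the sketch shows only the ranking-and-truncation step.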
We also were interested in gathering data regarding the number of authors, whether the authors provided a clearly stated research hypothesis, whether attributable risk or number needed to treat (or both) were reported, and whether statistical tests/analyses were performed. Statistical measures from both the abstract and results sections of each article also were captured. Specifically, we noted: 1) use of effect measures (eg, mean differences, odds ratio, relative risk, or other effect measures), confidence intervals (CIs), P values, or a combination of the three; and 2) whether nonsignificant differences were reported with actual P values or merely stated as “nonsignificant.” Higher-quality (or “recommended”) statistical reporting was defined as reporting that included point estimates with measures of precision, as defined by current guidelines.15,16 We noted whether study findings were positive or negative. A negative study was defined as any study whose primary analysis, based on the stated hypothesis, was not statistically significant. In articles in which the hypothesis was missing or not clearly stated, the primary endpoint was derived from the study aim/objective or article title. Owing to their anecdotal nature and lack of statistical reporting, case reports and case series were omitted from our analysis of quality statistical measures and other study characteristics listed in Table 2.
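The “recommended” classification described above reduces to a simple predicate over three reporting flags; the flag names below are illustrative, and the qualifying combinations (effect size/P value, effect size/CI, effect size/P value/CI) follow the definition used in this study.

```python
def is_recommended_reporting(has_effect_size: bool,
                             has_ci: bool,
                             has_p_value: bool) -> bool:
    """Recommended reporting pairs a point estimate (effect measure)
    with a measure of precision (CI) or a P value; an effect size
    alone, or a P value alone, does not qualify. Flag names are
    illustrative, not from the paper."""
    return has_effect_size and (has_ci or has_p_value)
```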
Several steps were instituted to ensure the quality and accuracy of the data collected from each journal. All initial abstractions were completed by one investigator (L.K.), and a random sample of 10% of the articles was re-abstracted by a second investigator (J.A.). No systematic errors were identified during this review.
Sample size calculations were predetermined to detect a threefold difference in the percentage of RCTs published in journals within the high–impact-factor group compared with the moderate–impact-factor group (moderate group rate 13%). We determined that a minimum of 50 articles per journal would be necessary to achieve a power of 80% (β=0.20, α=0.05).
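As a rough check on the calculation above, the power of a two-sided two-sample test of proportions can be approximated with Cohen's arcsine effect size. The normal approximation below is an assumption for illustration; the paper does not state the exact formula the authors used.

```python
from math import asin, sqrt
from scipy.stats import norm

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample proportion test
    via Cohen's arcsine effect size h (a sketch; not necessarily
    the authors' method)."""
    h = abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(h * sqrt(n_per_group / 2) - z_alpha)

# 13% in the moderate group vs. a threefold higher 39%; three
# journals x 50 articles gives roughly 150 articles per group.
power = power_two_proportions(0.13, 0.39, 150)
```

With roughly 150 articles per impact-factor group, this approximation comfortably exceeds the 80% power target, consistent with the minimum of 50 articles per journal.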
Univariable analyses evaluated the association between impact-factor status and methodologic study design, quality statistical measures, and other study journal factors (number of authors, presence of a clearly stated hypothesis, negative study, whether sample size/power calculations were provided, and presence of any regression analysis) using χ2 tests. All statistical analyses were completed using SAS 9.1 (SAS Institute, Cary, NC). This study was approved by the Washington University in St. Louis Human Research Protection Office and was classified as exempt.
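A univariable comparison of the kind described above can be run as a χ2 test on a 2×2 contingency table; the counts below are hypothetical, not the study's data.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = impact-factor group (high, moderate),
# columns = study characteristic present / absent. Counts are
# illustrative only.
table = [[89, 61],
         [54, 96]]
chi2, p, dof, expected = chi2_contingency(table)
```

`chi2_contingency` applies Yates' continuity correction by default for 2×2 tables; whether the authors' SAS analysis did likewise is not stated in the text.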
RESULTS
We reviewed a total of 371 articles published in six major medical journals, including 56 (15%) from the American Journal of Obstetrics & Gynecology, 59 (16%) from the British Journal of Obstetrics and Gynaecology, 66 (18%) from Obstetrics & Gynecology, 58 (16%) from the Journal of the American Medical Association, 69 (19%) from The Lancet, and 63 (17%) from the New England Journal of Medicine. Table 3 summarizes the methodologic quality of the studies. The overall majority of published reports consisted of analytic observational studies (50%) followed by RCTs (24%), case reports (14%), systematic reviews (6%), case series (1%), and other (4%). Further breakdown of observational studies revealed a higher proportion of cohort (25%) and cross-sectional studies (20%) than case–control studies (5%). The proportion of RCTs published among the high–impact-factor group (Journal of the American Medical Association, The Lancet, New England Journal of Medicine) was almost three times that of the moderate–impact-factor group (American Journal of Obstetrics & Gynecology, British Journal of Obstetrics and Gynaecology, Obstetrics & Gynecology) (relative risk 2.9, 95% CI 1.9–4.4). Specifically, within the high–impact-factor group, 35% were RCTs and 35% observational studies compared with 12% and 67%, respectively, in the moderate–impact-factor group.
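A relative risk with a 95% CI like the one reported above is conventionally computed on the log scale with a Wald interval. The counts below are approximate reconstructions from the reported percentages and group sizes, not the exact study data.

```python
from math import exp, log, sqrt

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk of group 1 vs. group 2 with a Wald 95% CI on
    the log scale (a standard textbook construction; counts passed
    in here are illustrative)."""
    rr = (a / n1) / (c / n2)
    se = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lo, hi = exp(log(rr) - z * se), exp(log(rr) + z * se)
    return rr, lo, hi

# Reconstructed counts: ~35% RCTs of 190 high-impact articles vs.
# ~12% of 181 moderate-impact articles.
rr, lo, hi = relative_risk_ci(66, 190, 22, 181)
```

The reconstruction lands close to the published estimate (relative risk 2.9, 95% CI 1.9–4.4); small discrepancies reflect rounding of the percentages.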
The following results are from analyses that excluded case reports and case series (see Table 2). The number of authors per article ranged from 1 to 47, with the majority of published articles containing between five and nine authors. Eighty-five percent of published articles in the high–impact-factor group listed at least five authors compared with 64% in the moderate–impact-factor group (P<.001). A clearly stated hypothesis was reported in 21% of all studies reviewed—21% of articles in the moderate–impact-factor group and 22% in the high–impact-factor group (P=.6). The New England Journal of Medicine had the highest reporting rate at 33%; all other journals ranged between 8% and 29%. Overall, negative studies were not common; Obstetrics & Gynecology had the lowest percentage of negative studies (8%). Specifically, 18% and 14% of published articles in the high–impact-factor and moderate–impact-factor groups, respectively, produced primary analyses that were not statistically significant.
In terms of sample-size calculations, 47% of published articles in the high–impact-factor group provided power or sample-size calculations or both compared with 23% in the moderate–impact-factor group (P<.001). Further breakdown of the latter revealed a higher frequency of calculations in Obstetrics & Gynecology than in the other obstetrics and gynecology journals, although the difference was not statistically significant (32% compared with 19%).
We also were interested in making other comparisons between journal impact factor and statistical measures. Overall, recommended statistical reporting (eg, effect size/P value, effect size/CI, or effect size/P value/CI) in the abstract was more common in the high–impact-factor group (61% compared with 36%, P<.001, Table 2, Fig. 2A). Sixty-three percent of published articles in the New England Journal of Medicine contained recommended statistical measures, followed by The Lancet (62%), the Journal of the American Medical Association (59%), Obstetrics & Gynecology (55%), the British Journal of Obstetrics and Gynaecology (31%), and the American Journal of Obstetrics & Gynecology (25%). In contrast, recommended statistical measures within the results section of the article were present in 74% of all published articles in our review. Within the high–impact-factor group, 84% had recommended statistical reporting compared with 65% of articles within the moderate–impact-factor group (P=.002, Table 2, Fig. 2B). The New England Journal of Medicine had the highest reporting rate at 90%, followed by The Lancet (81%), the Journal of the American Medical Association and Obstetrics & Gynecology (80%), the American Journal of Obstetrics & Gynecology (60%), and the British Journal of Obstetrics and Gynaecology (58%). Interestingly, Obstetrics & Gynecology was more comparable with the higher impact group (80% compared with 84%) than the other obstetrics and gynecology journals (80% compared with 58%).
The high–impact-factor group reported specific P values for nonsignificant values more often than did the moderate–impact-factor group (97% compared with 88%, P=.04). The Lancet had the highest reporting rate at 100%, followed by New England Journal of Medicine, the Journal of the American Medical Association, and Obstetrics & Gynecology all at 95%, the American Journal of Obstetrics & Gynecology (92%) and the British Journal of Obstetrics and Gynaecology (73%). Again, the reporting rate in Obstetrics & Gynecology was more reflective of the high–impact-factor group (95% compared with 97%) than of its obstetrics and gynecology counterparts (95% compared with 84%).
Lastly, we captured the proportion of studies that performed regression analysis. Overall, articles published in the high–impact-factor group included more regression analyses than those published in the moderate–impact-factor group (60% compared with 48%, P=.03).
DISCUSSION
In our analysis of articles published from January to June 2006, RCTs represented nearly one fourth of all studies. The proportion of RCTs published in the high–impact-factor group was nearly three times that of the moderate–impact-factor group. Other studies evaluating research methodology have shown that observational studies are the most commonly published in the literature,5,7 and our results support this finding. We also noted that the rate of recommended statistical reporting (eg, use of effect size and its precision, as well as specific P values for nonsignificant results)17 within the abstract and article was highest in the high–impact-factor group. Other factors positively associated with high journal impact factor included number of authors (more than five), sample size/power calculations, and presence of regression analysis.
There are several potential explanations for why the moderate–impact-factor group (obstetrics and gynecology journals) contained fewer RCT publications. First, many issues in obstetrics and gynecology do not lend themselves easily to this study design (eg, surgical interventions, contraceptive studies). Ethical concerns of RCTs also play an important role. Another possible explanation is that the highest quality RCTs in obstetrics and gynecology are submitted to the general medical journals.
The quality of statistical reporting in Obstetrics & Gynecology was more reflective of the high–impact-factor group than of the other moderate–impact-factor obstetrics and gynecology journals with regard to two factors: 1) reporting specific P values for nonsignificant results; and 2) reporting statistical measures with estimated effect sizes and estimates of precision within the results section of the article. Reporting of statistical measures within the abstract was also more comparable with the high–impact-factor group than with its obstetrics and gynecology counterparts. We concur with previous findings6,7 that improved statistical reporting may be attributed to better listing and description of statistical procedures in the instructions for authors; stricter editing criteria applied by journal editors, reviewers, and statistical consultants; and increasing collaboration between researchers and epidemiologists/statisticians to design, analyze, and present data. Recommended statistical reporting may be more reflective of journal editing of original manuscripts than of authors selectively providing higher-quality statistical reports to higher–impact-factor journals.
Furthermore, among the moderate–impact-factor group, Obstetrics & Gynecology also had the greatest proportion of studies that reported sample size/power calculations. We also noted that Obstetrics & Gynecology had the lowest number of negative studies, regardless of impact-factor status. A recent study in neonatology by Littner et al18 shows that articles with negative results are more likely than those with positive findings to be published in lower impact-factor journals. Publication bias may explain the low prevalence of negative studies in Obstetrics & Gynecology, but our research was not designed to assess either hypothesis. Interestingly, a recent study on predictors of publication showed that, after adjusting for other study characteristics, having statistically significant results did not improve the chance of a study being published (odds ratio 0.83, 95% CI, 0.34–1.96).19
Our study recognizes the need to continue improving the quality of methodology and statistical reporting in the medical literature. The Consolidated Standards of Reporting Trials (CONSORT) statement17 was developed as an evidence-based approach to improve the quality of reporting of RCTs and has been adopted by many journals, including the six in our study. Since its debut in 1996 and its subsequent revision, the CONSORT checklist appears to have already led to some improvements.20,21 A similar set of guidelines for observational studies (the STROBE statement) was created more recently, in 2007, and also has the potential to improve the overall quality of observational research reports, especially given the high prevalence of observational studies.16,21,22 Four of the six quality variables we analyzed were listed in both the CONSORT and STROBE guidelines: a clearly stated hypothesis, sample size calculations, a summary of results that included the estimated effect size and its precision (eg, 95% CI), and reporting of regression analyses. We believe that a clearly stated research hypothesis is an important quality criterion. It is surprising that a clearly stated hypothesis was not commonly reported in any of the journals included in our study, regardless of impact-factor status.
Our study has several limitations. We attempted to quantify reporting of attributable risk and number needed to treat in our article abstractions, but, owing to the low number of studies that reported these measures, we were unable to assess their relationship with journal impact factor. We also recognize that the outcome of a study (positive or negative) and number of authors are not reliable markers of quality of research. We agree with previously reported claims that the high author count instead may reflect other factors such as increased collaborations, reports from networks of investigators, funding for research, complexity of research, and pressure on academic faculty to publish.23,24 Furthermore, our research has limited power to detect more modest differences between groups of journals and individual journals given the relatively small number of articles abstracted per journal. Other important questions that remain unanswered include: 1) Are the discrepancies in the quality of reporting between individual journals functions of the quality of the initial manuscript submitted or the strict editing criteria of individual journals? 2) Do authors have a tendency/preference to submit higher level studies (eg, RCTs) to high–impact-factor journals? and 3) Does publication of RCTs lead to a high–impact-factor status, or does a high impact factor lead to submission of RCTs?
Critical evaluation of published medical literature is an important skill that all health care providers and researchers should acquire. Published articles that clearly describe the research hypothesis, study methodology, power/sample size calculations, and statistical analyses allow readers to better assess whether study results should influence clinical practice guidelines and policies. Therefore, journals should continue to strive for the highest level of methodologic design and the highest quality of statistical reporting.
REFERENCES
1. U.S. Preventive Services Task Force. Guide to clinical preventive services: report of the U.S. Preventive Services Task Force. 2nd ed. Baltimore (MD): Williams & Wilkins; 1996.
3. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 2000;342:1887–92.
4. Peipert JF, Gifford DS, Boardman LA. Research design and methods of quantitative synthesis of medical evidence. Obstet Gynecol 1997;90:473–8.
5. Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of study designs in four major US journals of obstetrics and gynecology. Gynecol Obstet Invest 2001;51:8–11.
6. Welch GE 2nd, Gabbe SG. Statistics usage in the American Journal of Obstetrics and Gynecology: has anything changed? Am J Obstet Gynecol 2002;186:584–6.
7. Dauphinee L, Peipert JF, Phipps M, Weitzen S. Research methodology and analytic techniques used in the journal Obstetrics & Gynecology. Obstet Gynecol 2005;106:808–12.
8. Grant A. Reporting controlled trials. Br J Obstet Gynaecol 1989;96:397–400.
9. Roach VJ, Lau TK, Ngan Kee WD. The quality of citations in major international obstetrics and gynecology journals. Am J Obstet Gynecol 1997;177:973–5.
10. Funai EF. Obstetrics & Gynecology in 1996: marking the progress toward evidence-based medicine by classifying studies based on methodology. Obstet Gynecol 1997;90:1020–2.
11. Lee KP, Schotland M, Bacchetti P, Bero LA. Association of journal quality indicators with methodological quality of clinical research articles. JAMA 2002;287:2805–8.
12. Grimes DA, Schulz KF. Methodology citations and the quality of randomized controlled trials in obstetrics and gynecology. Am J Obstet Gynecol 1996;174:1312–5.
13. Mikolajczyk RT, DiSilvestro A, Zhang J. Evaluation of logistic regression reporting in current obstetrics and gynecology literature [published erratum appears in Obstet Gynecol 2008;111:996]. Obstet Gynecol 2008;111:413–9.
14. Clarke M, Alderson P, Chalmers I. Discussion sections in reports of controlled trials published in general medical journals. JAMA 2002;287:2799–801.
15. Moher D, Schulz KF, Altman D, CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987–91.
16. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008;61:344–9.
17. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–9.
18. Littner Y, Mimouni FB, Dollberg S, Mandel D. Negative results and impact factor: a lesson from neonatology. Arch Pediatr Adolesc Med 2005;159:1036–7.
19. Lee KP, Boyd EA, Holroyd-Leduc JM, Bacchetti P, Bero LA. Predictors of publication: characteristics of submitted manuscripts associated with acceptance at major biomedical journals. Med J Aust 2006;184:621–6.
20. Moher D, Jones A, Lepage L, CONSORT Group (Consolidated Standards for Reporting of Trials). Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992–5.
21. Plint AC, Moher D, Morrison A, Schulz K, Altman DG, Hill C, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust 2006;185:263–7.
22. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007;370:1453–7.
23. Khan KS, Nwosu CR, Khan SF, Dwarakanath LS, Chien PF. A controlled analysis of authorship trends over two decades. Am J Obstet Gynecol 1999;181:503–7.
24. Levsky ME, Rosin A, Coon TP, Enslow WL, Miller MA. A descriptive analysis of authorship within medical journals, 1995-2005. South Med J 2007;100:371–5.