The desire to perform evidence-based anesthesia is currently growing throughout the world. The practice of evidence-based medicine is founded on sound and well-conducted science. It requires clinical interventions to be investigated for clinically relevant outcomes. The retrieval of clinically relevant articles is, however, not always easy. The difficulty lies in knowing where one should search for relevant answers to clinical questions.
Logically, one assumes that clinically relevant articles of a high quality will be found in scientific journals, especially those that are esteemed or most often cited. We chose to take a close look at five high impact anesthesia journals, to investigate the amount of clinical scientific content, and to review the methodological aspects of each article. We hoped to discern whether anesthesia journals actually supplied the answers to our clinical questions or were merely a medium for communication among scientists.
The purpose of this review was to examine all articles in Anesthesiology, Anesthesia & Analgesia, British Journal of Anaesthesia, Anesthesia, and Acta Anaesthesiologica Scandinavica and to classify each article according to type and size relative to the total number of pages in the journal.
We also looked at the validity of the Journal Impact Factor as a measure of the quality of journals by applying our own quality criteria to the articles published in each journal.
We examined all articles in five high impact anesthesia journals (Anesthesiology, Anesthesia & Analgesia, British Journal of Anesthesia, Anesthesia, and Acta Anaesthesiologica Scandinavica) published between January and June 2000. Articles were evaluated and classified according to type, outcome, and design. All relevant data from the articles were imported to Procite 5.0 (ISI ResearchSoft 1999; ISI, Philadelphia, PA) from the PubMed online database (http://www4.ncbi.nlm.nih.gov/PubMed). Articles that were missing in PubMed were manually added (e.g., books and multimedia reviews). Certain types of pages were not included in the database; e.g., abstracts from congresses and advertisements.
Evaluation and Classification of Articles
To ensure accurate classification of the articles, stringent criteria were defined before the first read-through. All articles were read twice and, when in doubt, a second opinion was sought from another reviewer. Articles were described as either primary or secondary studies.
Primary studies were defined as studies that reported research first hand; i.e., directly on individuals or objects. The research object was therefore people, animals, or in vitro research (1).
Secondary studies were defined as studies that attempted to summarize and draw conclusions from primary studies (1). Secondary studies could vary in methodology and statistical validity and were classified into two subtypes: systematic reviews and narrative reviews. A meta-analysis was defined as a statistical pooling of data from different but homogeneous trials to increase sample size and statistical validity. A meta-analysis could be an independent paper or part of a systematic review of varying quality (2,3).
A “randomized clinical trial” was defined as a clinical study on patients in which the participants were definitively and prospectively assigned to one of two or more alternative forms of health care by a process of random allocation, achieved by random number tables, opaque envelopes, computer-generated numbers, or stratified randomization (4,5). A clinical trial implies that the participants were patients, not volunteers in artificial and controlled surroundings (1). A “controlled clinical trial” was a clinical study in which the outcome was compared against a control group but with unclear, quasi-random, or nonrandom allocation. It is also known as a “nonrandomized controlled trial” (5).
“Research on animals and laboratory investigations” were defined as research done on healthy volunteers, nonhuman objects, or animals. Research on animals and laboratory research were defined as separate entities but for data-analysis purposes they were merged into one.
“Other design” was primary research not classifiable in any of the above-mentioned types; i.e., without a control group and using humans as primary research objects.
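The definitions of primary study types given above can be read as a small decision rule. The following is a minimal sketch of that reading, not the procedure the reviewers actually followed; the function name, the argument names, and the label strings are hypothetical.

```python
# Subject labels treated as non-clinical research objects.
# The label strings are illustrative assumptions, not taken from the study.
NON_CLINICAL_SUBJECTS = {"healthy volunteers", "animals", "nonhuman objects"}


def classify_primary_study(subjects: str, has_control_group: bool, allocation: str) -> str:
    """Assign a primary study to one of the design categories defined in the text.

    subjects:          "patients" or one of NON_CLINICAL_SUBJECTS
    has_control_group: whether the outcome was compared against a control group
    allocation:        "random", "quasi-random", "nonrandom", or "unclear"
    """
    if subjects in NON_CLINICAL_SUBJECTS:
        # Animal and laboratory research were merged into one category for analysis.
        return "research on animals and laboratory investigations"
    if subjects != "patients":
        raise ValueError(f"unknown subject type: {subjects!r}")
    if has_control_group and allocation == "random":
        return "randomized clinical trial"
    if has_control_group:
        # Control group present, but allocation unclear, quasi-random, or nonrandom.
        return "controlled clinical trial"
    # Research on humans without a control group.
    return "other design"
```

For instance, a patient study with a control group and quasi-random allocation would fall under "controlled clinical trial" in this sketch.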
Secondary Studies and Other Research
“Systematic reviews” were defined as secondary research in which scientific strategies were used to reduce bias in the systematic gathering, the critical evaluation, and synthesis of all relevant studies on a specific subject. A systematic review should be conducted according to explicit objectives and reproducible methodology (1); as a scientific article with background, methods, results, and discussion parts.
“Narrative review” was defined as a review in which the methods for data collection, synthesis, and interpretation were not usually reported (2). Examples are “Clinical Concepts and Commentary,” and “Review Articles.”
“Editorials” and “correspondence” are self-explanatory.
“Other articles” refers to articles not classifiable in the above categories. By nature they were a heterogeneous group. The articles were generally unstructured. Examples are case studies, book reviews, multimedia, hazard notices, and biographies.
Outcomes were divided as follows.
* Clinically relevant outcome: defined as an outcome with a clinical effect on the patient that directly affects how a patient feels, functions, or survives (6).
* Surrogate outcomes: defined as a surrogate for a clinically relevant outcome, such as a laboratory value or physical sign. Changes induced by a therapy on a surrogate end-point are expected to reflect changes in a clinically meaningful end-point.
* Ease of Practice: defined as an outcome with impact on the health care professional and no impact on the patient.
* Apparatus: defined as research performed with the purpose of deciding the quality and function of medical apparatus.
All data were imported into SPSS 10.0.5 (© 1989–99 SPSS Inc., Chicago, IL) and Microsoft Excel 2000 (© 1983–99 Microsoft Corporation, Redmond, WA). The size of each article was added manually. We compared the number of pages that each type of article comprised in the journals using the Mann-Whitney U-test (nonparametric). When comparing nondependent data, the χ² test was used (Statistix © 1992 Analytical Software; Tallahassee, FL).
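The two test statistics named above can be illustrated in a few lines. This is a minimal sketch in plain Python, not the SPSS or Statistix procedures used in the study; it computes only the statistics (no P values), and the function names are hypothetical.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y.

    Counts the pairs (xi, yj) in which xi exceeds yj; ties count as one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. Every row and column total
    is assumed to be positive, so that all expected counts are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

In practice, a statistics package would also supply the P value and any tie or continuity corrections, which this sketch omits.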
Clinical relevance was calculated as the proportion of articles using a primary or secondary clinical methodology (randomized clinical trial, controlled clinical trial, “other type,” systematic review, or meta-analysis) and a clinical end-point. This number is thus a measure of the priority the clinically relevant articles have in the journals.
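The calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' worksheet: the proportion is weighted by pages (because the results are reported as a share of pages), the design labels follow the list in the text, and the `Article` record and the outcome label "clinical" are hypothetical.

```python
from dataclasses import dataclass

# Designs counted as clinical methodology, as listed in the text.
CLINICAL_DESIGNS = {
    "randomized clinical trial",
    "controlled clinical trial",
    "other type",
    "systematic review",
    "meta-analysis",
}


@dataclass(frozen=True)
class Article:
    design: str   # study design label
    outcome: str  # "clinical", "surrogate", "ease of practice", "apparatus", ...
    pages: int    # article size in pages


def is_clinically_relevant(article: Article) -> bool:
    """An article counts only if both the design and the end-point are clinical."""
    return article.design in CLINICAL_DESIGNS and article.outcome == "clinical"


def clinical_relevance(articles) -> float:
    """Share of pages taken up by clinically relevant articles (assumed page-weighted)."""
    total_pages = sum(a.pages for a in articles)
    if total_pages == 0:
        raise ValueError("no pages to evaluate")
    relevant_pages = sum(a.pages for a in articles if is_clinically_relevant(a))
    return relevant_pages / total_pages
```

Under these assumptions, a randomized clinical trial that reports only a surrogate outcome contributes to the total page count but not to the clinically relevant share.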
In total we classified 1379 articles comprising 5468 pages. A total of 1313 articles were imported from the online PubMed database (http://www.ncbi.nlm.nih.gov/PubMed) and 66 were added by hand.
Types of Articles
Table 1 shows the distribution of pages per type of article in the five journals. The most frequent types of articles found were “laboratory research and animal studies” (31.2%), randomized clinical trial (20.4%), “other design” (17.7%), and “other type” (10.6%).
There was a large variation in the proportion of randomized clinical trials in the journals (spread, 12.2% to 35.3%; median, 20.4%).
In the randomized clinical trials, systematic reviews, and meta-analyses, clinically relevant outcomes were measured in 54% of the articles and surrogate outcomes in 33% (Table 2).
Table 3 shows that the overall proportion of clinically relevant pages in the journals was 18.6% (spread, 13.2% to 22.8%).
Figure 1 shows the relation between the Journal Impact Factor and the calculated measure of how clinically relevant the journals were. The Journal Impact Factor varied inversely with the measure (−3.572, r² = 0.546). Notably, the regression curve was based on a mere five sets of data.
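A regression of this kind can be reproduced with an ordinary least-squares fit. The sketch below assumes a straight-line fit and leaves open which variable is treated as the predictor; the function name is hypothetical, and no data from Figure 1 are reproduced here.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept, with coefficient of determination r2.

    Assumes at least two points and that neither xs nor ys is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2
```

With only five journals the fit rests on five points, so a single journal can change the slope and r2 substantially.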
It is important for the clinical practitioner to be able to find clinically relevant information. In view of society’s increasing demand for evidence-based medicine in health care, one would expect the proportion of clinically relevant literature in the major journals to be large.
However, only 18.6% of the pages in 5 major anesthesia journals were clinically relevant as defined by our criteria. The remaining pages were judged nonclinical either because they used suboptimal methodology or because they investigated surrogate or nonclinical outcomes.
Why Are the Clinical Articles Few in Number?
It may be reasoned that clinical research is more complex and time consuming than laboratory investigations. It is not surprising, therefore, that more than 80% of the pages in the 5 journals were devoted to articles that cannot be applied to a clinical situation, either because of methodological flaws or because of problems with their chosen end-points.
Many scientists recognize animal studies and laboratory investigations as the cornerstone of research, although they cannot be directly applied as clinically educational tools. This type of research is the basis for future types of treatments. Performing research on humans is limited because of ethical and health-related issues.
Our results show that narrative review articles and editorials fill more pages than systematic reviews and meta-analyses. This is unfortunate, for narrative review articles and editorials often have a greater impact on the reader than other types of articles (7), despite their poorer quality and validity. The problem with narrative review articles is that the references and sources are often chosen haphazardly, with no apparent consistent structure, resulting in high risk of bias (8). Antman et al. (9) have compared the recommendations of clinical experts with the results of meta-analyses of randomized clinical trials and found discrepancies. The recommendations of the clinical experts often failed to mention important advances or exhibited delays in recommending effective preventive measures.
Surrogate Outcome Measures
There are numerous instances where the use of surrogate outcome measures has led to wrong conclusions (10–12). It is remarkable that more than one third of the outcomes we studied in the randomized clinical trial articles, systematic reviews, and meta-analyses were surrogate. Several articles stated that surrogate outcomes did not measure the outcome of real interest and imposed a risk of serious fallacies (13,14).
There are many apparent advantages to using surrogate outcome measures. They can be faster, less expensive, and more efficient than clinical outcomes and can be used in situations where clinical outcomes would be too invasive or unethical. Clinicians should be aware, however, that it is important to not solely base a treatment on a surrogate outcome. Unless a clinician is confident that a surrogate outcome is valid, it is better to wait for truly clinically relevant outcomes to be researched. Only rarely are surrogate outcomes explicitly linked to the clinical outcome of interest (15); they may be either an inaccurate indicator for a given effect or directly misleading. Unless a definite correlation can be scientifically established, such as the relationship between the human immunodeficiency virus (HIV) in a blood test and the later breakthrough of acquired immune deficiency syndrome (AIDS) (HIV being the surrogate for AIDS), we strongly recommend against the use of surrogate outcomes for making clinical decisions.
The “Good” Articles
Approximately 50% of the articles in the journals studied were randomized clinical trials, systematic reviews, or meta-analyses. Statistically they constituted the more valid articles. No other study design provides the safeguards against bias associated with randomization (16). Several teams of authors have examined the quality of primary studies, including randomized clinical trials, and have called attention to the need to increase the quality as well as produce more systematic reviews and meta-analyses (2,3).
Whether systematic reviews offer more statistical safety than randomized clinical trials is still debated (17). It depends very much on the size of the trial or trials and how rigorously the methodological rules have been followed.
Only two to four percent of the pages in the five journals studied were devoted to secondary systematic examinations. The most obvious reason is that writing these articles is highly time consuming. Nevertheless, there are forums that focus on such types of articles (e.g., the Cochrane Collaboration, http://www.cochrane.org).
Correlation Between Journal Impact Factor and Clinical Relevance
We found that the journal with the highest Journal Impact Factor had the smallest fraction of clinically relevant articles. However, when choosing journal subscriptions, the typical Scandinavian hospital department bases the decision on the Journal Impact Factor. In mainly older articles it has been postulated that the Journal Impact Factor is a relevant indicator of the quality of research (18). That point of view is questionable, and the reason for using the Journal Impact Factor as a criterion of choice is that it has been the only available measure of the quality of journals (19).
The Journal Impact Factor is based on the total number of citations made in 1 year for articles published in the previous 2 years divided by the number of citable articles published in the same 2 years (20). The Journal Impact Factor is encumbered with several shortcomings. The use of an extremely short-term index introduces a strong temporal bias, meaning that articles in journals with short publication lags (i.e., the delay between the final acceptance of a manuscript and its actual appearance in print) contain many up-to-date citations and thus contribute heavily to the Impact Factors of all cited journals. Another problem is the uneven contribution of the various articles to the Journal Impact Factor. Seglen (21) showed that the most frequently cited 50% of articles were cited on average 10 times as often as the least cited 50%. The use of the Journal Impact Factor as a measure of the clinical usability of a journal is thus questionable.
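The verbal definition of the Journal Impact Factor given above can be written as a formula. The symbols are introduced here for illustration only: C_y(t) denotes the number of citations made in year y to articles the journal published in year t, and N_t denotes the number of citable articles the journal published in year t.

```latex
\[
\mathrm{JIF}_{y} \;=\; \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
\]
```

Because the numerator is a simple sum over all articles, a small number of heavily cited articles can dominate the value, which is the uneven contribution described in the text.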
Strengths and Weaknesses in the Survey
We have found disagreement over the definitions of randomization and randomized clinical trials (5,22). We believe that the definitions we chose were the most concise and reproducible. Despite the stringent classification criteria, there was a risk that others would classify articles differently. A discussion of these definitions in scientific fora is needed for a higher degree of clarification and precision.
This study may share problems often found in peer reviewing. For example, it is impossible to be absolutely objective when classifying articles (23). Other possible sources of bias are empathy with certain types of authors (female/male, esteemed/less esteemed) and bias in relation to certain types of articles (7).
It should be mentioned that the journals included in this study were not strictly the five most cited anesthesia journals. For example, Acta Anaesthesiologica Scandinavica was cited less frequently than the Canadian Journal of Anesthesia in the year 2000 but was chosen because of its popularity among clinicians in Scandinavia.
Is It Possible to Increase the Number of Clinically Relevant Articles?
It is of course impossible to predict whether the share of clinically relevant literature in journals will increase. No comparison of different years has been made. The large differences between the journals in this study do indicate that a change is possible.
It is desirable that the number of surrogate outcomes in journal articles be reduced and that the number of randomized clinical trials and systematic reviews be increased. Meta-analyses are highly time consuming and require specialized knowledge; therefore it is doubtful that their numbers will increase. The number of clinically relevant journal articles can only be increased to a certain extent, of course. Basic science is a prerequisite for clinical research. Many very important scientific findings are results of explorations of unknown areas.
In recent years, more attention has been drawn to the problems in evidence-based medicine. Some advances have been made. BMJ Books has, for example, introduced “Clinical Evidence” (http://www.clinicalevidence.org). The Cochrane Collaboration prepares, maintains, and ensures the accessibility of systematic reviews of the effects of health care interventions (http://www.cochrane.org). Both initiatives offer the physician an opportunity to choose interventions with the best possible evidence.
More research on this subject is needed. We looked at only six issues of five journals, and we may not have presented a representative picture of the journals. It is impossible in this study to show whether the composition of journal articles has changed over time. A more thorough analysis with more journals should be made. For example, representative samples obtained over several years of publication could be compared to increase the statistical validity and possibly to recognize changes over time. More methodological aspects of the articles could be investigated to assess the quality of the manuscripts. It could be interesting to include other branches of medicine. It is doubtful that the proportion of clinically relevant articles is especially small in anesthesia publications, but new research is crucial to answer that question.
Discussion and, perhaps, redefinition of terminology used in the field of scientific procedures will be helpful for the future discussion of quality in publishing scientific results.
We would like to thank Mrs. Lisa Bismuth for her help and critical appraisal of particular articles and Mrs. Jane Cracknell for constructive input and language editing.