Most people know what a duck is, but few can define one; much the same applies to scientific or clinical quality. It is inevitably difficult to quantify qualitative notions such as scientific excellence, but once a measure has been developed and gained common usage, it takes on a life of its own. Once you have a figure that purports to reflect, say, scientific excellence, you can use it for all sorts of purposes: ranking individuals, institutions or countries, or evaluating the cost-effectiveness of research grants. You can, of course, also abuse such figures to draw inappropriate conclusions: it is easy to apply statistics to them, but the resulting conclusions can be misleading or even useless. Either way, the impact factor has become the common currency of ‘scientific quality'. Understanding the strengths and limitations of the measure matters both to those who publish articles and to those who use the factor as a surrogate marker of quality.
What is the impact factor?
The journal impact factor is based on information obtained from citation indexes. The most commonly used index is the Science Citation Index, which was devised by the Institute for Scientific Information (ISI) and has been published by them since 1963. All original articles, technical notes and reviews (but not letters or abstracts) published in a set of core journals are scanned, and all publications cited as references are recorded. The assumption is that one measure of the importance of an article is the number of times it is cited within a given period. Although it is possible to determine the citation rate of any individual article, other measures are usually employed because of logistical constraints. The most commonly used measure is the journal impact factor, which assesses the citation rates of articles within a journal rather than of any given article. The impact can be defined over any period, but the 2-year impact factor is the most commonly used. Thus, for a journal with an impact factor of 10 in, say, 1999, articles published in that journal in 1997 and 1998 were cited, on average, 10 times each during 1999.
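The 2-year calculation described above can be sketched as a simple ratio. The function and figures below are hypothetical, chosen only to match the worked example in the text; they are not real journal data.

```python
# Illustrative sketch of the 2-year journal impact factor calculation.
# All names and numbers here are hypothetical, for illustration only.

def impact_factor(citations_in_year: int, citable_items_prior_two_years: int) -> float:
    """2-year impact factor: citations received in year Y to items
    published in years Y-1 and Y-2, divided by the number of citable
    items (articles, notes, reviews) published in those two years."""
    return citations_in_year / citable_items_prior_two_years

# A journal whose 1997-1998 output (300 citable items) was cited
# 3000 times during 1999 would have a 1999 impact factor of 10.
print(impact_factor(3000, 300))  # -> 10.0
```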
Journal impact factors
These are published annually for a core of selected biomedical journals. They are available commercially, although most university libraries will be able to supply them. A broad analysis of journal impact factors shows some consistent traits:
- Scientific journals rank higher than clinical journals.
- English-language journals score higher than those in other languages.
- American journals tend to have higher impact factors than European journals.
- Review journals tend to score higher than those containing original articles.
- Review articles tend to score higher than the articles they cite.
- The most prestigious journals in different specialist areas may have very different impact factors.
- Methodological papers may score much higher than those that provide new data.
- Free electronic access tends to raise the impact factor of a journal.
Limitations of impact factors as markers of ‘scientific excellence'
As both proponents and opponents of impact factors appreciate, there are several limitations to their use, which must be borne in mind in any analysis:
- Journal impact factors reflect the journal rather than the article. Assuming that all articles in a journal are of similar quality is, of course, wrong; indeed, the distribution of citation rates of articles within a journal is heavily skewed. Thus, a case report, which may be no more than a clinically important observation, is associated with the same impact factor as a major therapeutic study, and a review, which may contain no new data, has the same impact factor as a piece of original research. The authorship of a review may reflect the experience, status or authority of the writer (does this editorial refute this assumption?), but it cannot readily be compared with a novel scientific discovery.
- Journal impact factors will vary with time in both absolute numbers and rankings. It could be argued that, within a discipline, rankings of journals are a better reflection of quality than absolute numbers.
- Changes in clinical interest will affect impact factors. Thus, it is likely that the association between Helicobacter pylori and duodenal ulcers, and the impact on therapy, will have raised the impact factor of gastroenterological journals.
- Impact factors say nothing about the stringency of the peer review process.
- Editors, who will be judged in part on the change in journal impact factor, may take into account the future citation rates of a manuscript in deciding whether to offer publication.
- Impact factors may be manipulated by both authors and editors. I recently received a letter from one editor suggesting that, where possible, references to that journal should be included in manuscripts submitted to it. A high rate of self-citation has been shown to affect the impact factor. There are other ways in which impact factors can be improved. Mini-reviews can attract citations rapidly and in large numbers, and items such as research letters and abstracts can contribute citations to the numerator without being counted in the denominator of citable articles. When meeting abstracts published in the FASEB Journal were classified as non-source items in 1988, the impact factor of the journal leapt from 0.24 in 1988 to 18.3 in 1989. Indeed, a letter to the Lancet about non-O1 cholera is a citation classic.
- The increasing use of the very-high-quality biomedical library search engines (such as PubMed or Ovid) means that researchers have access to more and more information. This does not necessarily mean that the most appropriate references (such as the first clinical observation or demonstration of a new scientific concept) will be cited. Greater availability of the journal can increase the impact factor substantially [4,5].
- A paper that is later retracted may continue to be cited (and not necessarily alongside the retraction), adding little to the scientific validity of the journal.
- Citation practices are inconsistent. Scientific articles tend to cite only scientific articles, whereas clinical articles cite both scientific and clinical articles, thus increasing the impact factor of scientific journals compared with clinical journals.
- The 2-year period for the impact factor is arbitrary and not based on any robust, published data, as far as the authors are aware.
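The numerator/denominator effect noted in the list above can be made concrete with a small, invented calculation. The counts below are hypothetical, not the FASEB Journal's actual figures; they only illustrate the mechanism by which reclassifying items as non-source inflates the ratio.

```python
# Hypothetical illustration of how reclassifying items as "non-source"
# inflates the impact factor: citations to those items still count in
# the numerator, but the items themselves drop out of the denominator.

articles = 100                 # citable source items (hypothetical)
abstracts = 900                # meeting abstracts (hypothetical)
citations_to_articles = 250    # hypothetical citation counts
citations_to_abstracts = 50

# Abstracts counted as source items: all items in the denominator.
if_with_abstracts = (citations_to_articles + citations_to_abstracts) / (articles + abstracts)

# Abstracts reclassified as non-source: their citations still accrue
# to the journal, but only the articles appear in the denominator.
if_without_abstracts = (citations_to_articles + citations_to_abstracts) / articles

print(round(if_with_abstracts, 2))    # -> 0.3
print(round(if_without_abstracts, 2)) # -> 3.0
```

With identical citation behaviour, the bookkeeping change alone multiplies the impact factor tenfold in this sketch.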
On the whole, the range of impact factors does reflect, approximately, the scientific standing of a journal. Thus, Nature (current impact factor 25.8) is perceived to be a scientifically ‘better' journal than, say, Gastroenterology (impact factor 12.2). (The inverted commas around concepts such as scientific standing and scientific quality are to emphasize that, although most people understand these concepts, they are difficult to define and even more difficult to quantify.) Can Nature Medicine be considered better than Nature (impact factors 27.9 and 25.8, respectively)? However, impact factors do not fully reflect the perceived quality of a journal (at least as perceived by the authors of this editorial). Thus, Foster reports that when 50 scientists from the National Institutes of Health were asked to rank the most prestigious scientific and clinical journals, the scientific journals were ranked in the order Science, Nature, Cell, Proceedings of the National Academy of Sciences, Journal of Biological Chemistry, Journal of Cell Biology, and Biochemistry; the main clinical journals were ranked in the order New England Journal of Medicine, Journal of Clinical Investigation, the Lancet, Journal of the American Medical Association, Annals of Internal Medicine, and the British Medical Journal. At that time, the rankings by impact factor were, for the scientific journals, 10, 8, 3, 37, 67, 35 and 111, respectively, and, for the clinical journals, 5, 49, 16, 97, 38 and 168, respectively. Several conflicting conclusions could, of course, be drawn: who is to say which ranking better reflects standing?
Furthermore, while comparisons between journals can be made, over-interpretation leads to potentially inappropriate conclusions. For example, is the New England Journal of Medicine (impact factor 29.5) a better journal than Nature (25.8)? Clearly, a crude comparison is invalid, since the two journals are not competing for the same articles. In the fields of gastroenterology and hepatology, is Gastroenterology twice as good as Gut (impact factors 12.2 and 5.4, respectively), and does this mean that one article, say a case report in Gastroenterology, is worth two multicentre, randomized studies in the Journal of Hepatology (impact factor 3.8)? Comparisons between journals in different specialties may lead to even more bizarre conclusions. The impact factor for Gastroenterology (at 12.2, the highest in gastroenterology and hepatology) compares with the highest impact factors in gerontology and ageing (Neurobiology of Aging, impact factor 4.2) or nursing (Nursing Research, impact factor 1.1). Of the 43 nursing journals in the analysis, only three have an impact factor greater than 1; compare this with gastroenterology, where 30 of the 45 journals have impact factors greater than 1. Does this mean that gerontology research is less good than gastroenterology research, and that nursing research is hopeless? It may be that this is the case, but comparisons of impact factors will neither prove nor disprove the point. High-quality nursing articles may be published in the general literature (such as the New England Journal of Medicine) rather than in the specialty journals, or perhaps articles in nursing journals simply cite fewer references.
On the whole, articles funded by external funders using peer review appear in higher-impact-factor journals than those without external funding. Does this mean that the very cumbersome and expensive processes adopted (rightly) by these charitable or state bodies are justified, or merely that these bodies work to a self-fulfilling premise? Of course, the fact that impact factors are used to assess quality of output will distort the pattern of submissions, and authors may choose to submit an article to a journal with a high impact factor rather than to one with a readership more appropriate for the article.
However, these limitations should not detract from the value of impact factors. They are, at least, objective, are open to only modest manipulation, and in most instances accord with intuition and experience. One reason impact factors have stayed in use is that they are the best measure currently available; another is that attention to the quality of the journal encourages researchers to concentrate on high quality rather than high quantity of output. Authors need to consider the message they wish to convey to the reader, and then choose the best vehicle. Others have tried to overcome the shortcomings by, for example, assuming that the quality of specialty journals is similar and adjusting for factors that may affect the number of citations an article receives (such as the number of journals and published articles in that field, or the current scientific or clinical interest). However, these methodologies are themselves subject to limitations.
Impact factors were designed to provide an objective comparison between journals and, within limits, they do this well. As with all such measures, however, the transition from the qualitative to the quantitative can result in inappropriate conclusions being drawn. Provided users understand the strengths and weaknesses of impact factors and do not over-interpret analyses based on them, there is no problem; it is when the data are misused that mistakes occur.