There is increased interest in medical education research across the continuum of undergraduate, graduate, and continuing medical education. Although researchers may speculate about the types of medical education research published, to the best of our knowledge no definitive data are available on the field’s publication patterns—that is, the “what” and “where” of publication. One way to explore this is through bibliometric analysis, which can provide data concerning productivity rates, publication patterns, and publication characteristics.
Bibliometric research provides statistical descriptions of publications and is based, in part, on the premise that the published literature of a field embodies the field’s knowledge.1,2 It employs computerized analytic techniques and uses the individual publication as the unit of analysis, drawing data from sources such as MEDLINE, Google Scholar, and Journal Citation Reports (JCR).3 Articles’ metadata—including the indexing terms applied by the National Library of Medicine (NLM) and provided in MEDLINE—are retrieved and analyzed rather than articles’ full text, which does not need to be obtained or examined. Bibliometric methods have been used to explore the productivity of researchers, institutions, and countries within given subject areas; to examine research trends and emphases in various disciplines; and to guide policy decision making.4–8
In a 2007 article, Todres and colleagues9 reviewed the medical education research published in 2004–2005 in two leading general medical journals (BMJ, Lancet) and what they considered to be the top two medical education journals (Medical Education, Medical Teacher). After examining medical education research topics, funding, and randomized controlled trial (RCT) rates in these publications, they argued that the field was stagnant and urgently needed resources to become rigorous and relevant. They likened medical education research to the primary care and health services research published 20 years earlier. Criticism of, and debate about, the rigor and relevance of medical education research are not uncommon in the literature.10,11
Todres and colleagues’9 study, however, was limited in that its search was narrow and examined only two years of articles published in four journals. It also included only studies with the highest levels of quantitative evidence (RCTs, systematic reviews, and meta-analyses). Thus, its findings may provide a skewed picture of the types of medical education research being published.
In another 2007 article, Baernstein and colleagues12 examined trends in the methods used in undergraduate medical education studies published from 1969 to 2007. They concluded that although the proportion of the literature using robust methods had not increased over time, the overall growth of the literature had resulted in a larger pool of rigorous research on undergraduate medical education.
To build a solid evidence base concerning the rigor and relevance of medical education research, we sought to analyze the literature more comprehensively than it was reviewed in the articles described above and, in doing so, to overcome those articles’ limitations. In this study, we identified the top five general medical journals and top five medical education journals in terms of medical education research production and, using article metadata, analyzed the types of empirical, substantive studies they published during a 10-year period. We also explored the consistency and accuracy of NLM indexing of evaluative medical education studies in MEDLINE.
In January 2010, we selected 10 journals (5 each from two disciplines; see “Journal selection,” below) and then conducted a literature search that would enable us to identify and classify evaluative medical education studies from those journals. We downloaded the results of these searches—article metadata, or records—in two sets: records that we were able to classify and records that we were not able to classify on the basis of how the articles were indexed in MEDLINE. We labeled these as the “preclassified” set and “unclassified” set, respectively. We drew random samples of the records from the preclassified set, which we examined to assess the reliability of the NLM medical subject heading (MeSH) classifications provided in MEDLINE.
Journal selection
To establish the 10-journal cohort for this study, we first identified 5 English-language journals in each of the JCR13 categories that most closely matched the fields of “medical education” and “general and internal medicine” (GIM). The specific JCR categories we used were “education, scientific disciplines” (n = 27 journals) and “medicine, general and internal” (n = 133 journals), respectively. The objective criterion we used to select the journals from each category was the number of articles from each journal that were indexed as education, medical in MEDLINE and had been published in the past decade (January 2001 to January 2010). We tabulated the number of qualifying articles from each journal using GoPubMed,14 a free public search engine that queries PubMed and summarizes search results by frequency for journal, country of first author, and year of publication, among other variables. The query we used (education, medical[MeSH Terms] AND English[lang] AND “published last 10 years”[Filter]) was first developed in PubMed and then rerun in GoPubMed. We verified that retrieval counts were the same in the PubMed and GoPubMed interfaces.
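The selection rule above amounts to ranking the journals within each JCR category by their number of qualifying records and keeping the top five. The Python sketch below illustrates that rule under stated assumptions: the journal names and counts are hypothetical placeholders, not the study's data, and in the study the counts came from GoPubMed rather than from code like this.

```python
from collections import Counter

def top_journals(record_journals, category_members, k=5):
    """Return the k journals in one JCR category with the most qualifying records.

    record_journals: iterable of journal names, one entry per record retrieved
        by the query (hypothetical input format).
    category_members: set of English-language journals in the JCR category.
    """
    counts = Counter(j for j in record_journals if j in category_members)
    # most_common() sorts by count, highest first; ties keep first-seen order
    return [journal for journal, _ in counts.most_common(k)]

# Hypothetical example: journal names and counts are invented for illustration.
category = {"Journal A", "Journal B", "Journal C", "Journal D", "Journal E", "Journal F"}
records = (["Journal A"] * 40 + ["Journal B"] * 25 + ["Journal C"] * 10
           + ["Journal D"] * 8 + ["Journal E"] * 5 + ["Journal F"] * 2
           + ["Journal outside category"] * 99)
print(top_journals(records, category))
# ['Journal A', 'Journal B', 'Journal C', 'Journal D', 'Journal E']
```

Records from journals outside the category are ignored even when they are numerous, which mirrors selecting within each JCR category separately.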
We then obtained each journal’s 2009 impact factor,13 a measure of the average number of citations received in a given year by the articles a journal published in the two preceding years.15 We also checked the NLM journals database16 to verify that each journal had been indexed in MEDLINE throughout the 10-year study period or, if the journal had been published for less than 10 years, since its first issue.
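For reference, the standard two-year Journal Citation Reports impact factor for 2009 can be written as follows. This is the general definition of the metric, not a calculation performed in this study.

```latex
\[
\mathrm{IF}_{2009} = \frac{C_{2009}(2007) + C_{2009}(2008)}{N_{2007} + N_{2008}}
\]
```

Here the numerator counts the citations received in 2009 by items the journal published in 2007 and 2008, and the denominator counts the citable items (generally research articles and reviews) the journal published in those two years.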
Search strategy and identification of learner levels, study designs, and themes
Using Ovid MEDLINE, we searched the journals in each category for articles published January 2001 to January 2010 that were indexed using the MeSH heading education, medical or any of the more specific MeSH terms below it in the MeSH hierarchy: education, medical, continuing; education, medical, graduate; education, medical, undergraduate; clinical clerkship; internship and residency; and teaching rounds. We also paired the MeSH heading education, medical with the MeSH heading faculty, medical. To identify empirical, substantive medical education research, which we defined as medical education articles describing one or more interventions and including at least one evaluative component (hereafter, evaluative studies), we further restricted the search to records with published abstracts. We also excluded commentaries, editorials, historical articles, and letters, and we removed duplicate records. Records were downloaded into Reference Manager version 11 (Thomson Reuters, New York, New York). We then analyzed the retrieved article records as described below.
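The restrictions above can be expressed as a filter over downloaded metadata records. The Python sketch below is illustrative only: the `Record` fields (`pmid`, `journal`, `year_month`, `mesh`, `pub_types`, `abstract`) are hypothetical names, the pairing of education, medical with faculty, medical is omitted for brevity, and the actual retrieval used the Ovid MEDLINE strategy in Supplemental Digital Appendix 1, not this code.

```python
from dataclasses import dataclass, field

# Medical education headings named in the text (lowercase, as in the text).
TARGET_MESH = {
    "education, medical",
    "education, medical, continuing",
    "education, medical, graduate",
    "education, medical, undergraduate",
    "clinical clerkship",
    "internship and residency",
    "teaching rounds",
}
# Publication types excluded in the text: commentaries, editorials,
# historical articles, and letters.
EXCLUDED_TYPES = {"comment", "editorial", "historical article", "letter"}

@dataclass
class Record:
    pmid: str
    journal: str
    year_month: tuple  # (year, month) of publication
    mesh: set = field(default_factory=set)
    pub_types: set = field(default_factory=set)
    abstract: str = ""

def select_records(records, journals, start=(2001, 1), end=(2010, 1)):
    """Keep records that meet every inclusion criterion; skip duplicate PMIDs."""
    seen, kept = set(), []
    for r in records:
        if r.pmid in seen:
            continue  # duplicate of a record already kept
        if r.journal not in journals:
            continue  # outside the 10-journal cohort
        if not (start <= r.year_month <= end):
            continue  # outside the study window
        if not (r.mesh & TARGET_MESH):
            continue  # no medical education heading
        if not r.abstract.strip():
            continue  # no published abstract
        if r.pub_types & EXCLUDED_TYPES:
            continue  # commentary, editorial, historical article, or letter
        seen.add(r.pmid)
        kept.append(r)
    return kept

# Hypothetical records for illustration.
cohort = {"Journal A"}
recs = [
    Record("1", "Journal A", (2005, 3), {"clinical clerkship"}, {"journal article"}, "Text"),
    Record("1", "Journal A", (2005, 3), {"clinical clerkship"}, {"journal article"}, "Text"),
    Record("2", "Journal A", (2005, 3), {"clinical clerkship"}, {"editorial"}, "Text"),
    Record("3", "Journal A", (1999, 6), {"education, medical"}, set(), "Text"),
    Record("4", "Journal A", (2008, 1), {"education, medical"}, set(), ""),
]
print([r.pmid for r in select_records(recs, cohort)])  # ['1']
```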
We used NLM MeSH terms to identify the level of learner targeted by the evaluative studies in our sample. Because the definitions of these terms determine the scope of this study, it is useful to consider how each relevant term is defined in the MEDLINE thesaurus17:
* education, medical, continuing: “Educational programs designed to inform physicians of recent advances in their field.”
* education, medical, graduate: “Educational programs for medical graduates entering a specialty. They include formal specialty training as well as academic work in the clinical and basic medical sciences, and may lead to board certification or an advanced medical degree.”
* education, medical, undergraduate: “The period of medical education in a medical school. In the United States it follows the baccalaureate degree and precedes the granting of the MD.”
* internship and residency: “Programs of training in medicine and medical specialties offered by hospitals for graduates of medicine to meet the requirements established by accrediting authorities.”
* clinical clerkship: “Undergraduate education programs for second-, third-, and fourth-year students in health sciences in which the students receive clinical training and experience in teaching hospitals or affiliated health centers.”
* faculty, medical: “The teaching staff and members of the administrative staff having academic rank in a medical school.”
As NLM indexers examine articles during the indexing process, they assign the most specific MeSH heading(s) available.18
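Because indexers assign the most specific heading available, a record's learner level can be read from whichever of the headings defined above it carries. The Python sketch below is a hypothetical illustration: the level labels are our own shorthand, and a record indexed only with the general heading education, medical is treated as having no specific level.

```python
# Hypothetical shorthand labels for the learner-level headings defined above.
LEARNER_LEVELS = {
    "education, medical, undergraduate": "undergraduate",
    "clinical clerkship": "clerkship (undergraduate)",
    "education, medical, graduate": "graduate",
    "internship and residency": "interns and residents",
    "education, medical, continuing": "continuing",
    "faculty, medical": "faculty",
}

def learner_levels(mesh_terms):
    """Return the sorted learner-level labels implied by a record's MeSH terms."""
    return sorted({LEARNER_LEVELS[t] for t in mesh_terms if t in LEARNER_LEVELS})

print(learner_levels({"education, medical", "internship and residency"}))
# ['interns and residents']
print(learner_levels({"clinical clerkship", "faculty, medical"}))
# ['clerkship (undergraduate)', 'faculty']
```

A record can carry more than one level (for example, a faculty development program delivered during residency), so the function returns a list rather than a single label.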
To identify evaluative studies, we included in our search the MEDLINE study classifications for clinical trials, systematic reviews, qualitative studies, comparative studies, evaluation studies, and program evaluations. We used the MeSH headings19 comparative study, evaluation studies, and program evaluation to “preclassify” retrieved records into those categories. We used publication type tags20 to preclassify clinical trials, and the systematic review subset21 to preclassify systematic reviews (including meta-analyses). We preclassified qualitative studies using Ovid MEDLINE’s “optimized” search filter; subsequently, in an effort to improve classification performance, we used the more stringent “qualitative studies (specificity)” filter available in MEDLINE.22 These filters are predefined, validated search strategies designed to find qualitative studies; their variants maximize the retrieval of relevant records (the sensitive filter), maximize the proportion of retrieved records that are relevant (the specific filter), or balance sensitivity and specificity (the balanced filter).
Any retrieved records that did not fit one or more of these study types were considered unclassified.
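The preclassification rules described above can be summarized as a function that assigns zero or more study-type labels to a record, with an empty result meaning “unclassified.” In this hypothetical Python sketch, the boolean arguments standing in for the systematic review subset and the qualitative search filter are assumptions (in the study these were applied as Ovid MEDLINE searches, not record attributes), and the set of publication types treated as clinical trials is likewise an assumption.

```python
# MeSH headings used to preclassify, mapped to category labels.
MESH_TYPES = {
    "comparative study": "comparative study",
    "evaluation studies": "evaluation study",
    "program evaluation": "program evaluation",
}
# Assumed publication-type tags that mark a clinical trial.
TRIAL_PUB_TYPES = {
    "clinical trial",
    "controlled clinical trial",
    "randomized controlled trial",
}

def preclassify(mesh, pub_types, in_sr_subset=False, hit_qual_filter=False):
    """Return the set of study-type labels for one record.

    in_sr_subset and hit_qual_filter are hypothetical flags recording whether
    the record was retrieved by the systematic review subset or by the
    qualitative search filter. An empty set means the record is unclassified.
    """
    labels = {label for term, label in MESH_TYPES.items() if term in mesh}
    if pub_types & TRIAL_PUB_TYPES:
        labels.add("clinical trial")
    if in_sr_subset:
        labels.add("systematic review")
    if hit_qual_filter:
        labels.add("qualitative study")
    return labels

print(sorted(preclassify({"program evaluation"}, {"randomized controlled trial"})))
# ['clinical trial', 'program evaluation']
print(preclassify({"education, medical"}, {"journal article"}))
# set()
```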
We were interested in determining how many of the evaluative studies used quantitative versus qualitative approaches and whether study types differed across medical education and GIM journals. We assumed that the studies indexed as clinical trials, including RCTs, were quantitative, and that those retrieved by the qualitative search filters were qualitative. We had no a priori assumptions about whether the studies that were preclassified in the other study categories were quantitative or qualitative in nature.
We initially attempted to identify study themes using subheadings applied to the MeSH term education, medical: classification (cl), economics (ec), education (ed), ethics (es), history (hi), instrumentation (is), legislation & jurisprudence (lj), manpower (ma), methods (mt), organization & administration (og), psychology (px), standards (st), statistics & numerical data (sn), supply & distribution (sd), and trends (td). On inspection, we found that few articles were associated with many of these subheadings, so we adopted a simpler taxonomy. We considered articles to have a theme of “clinical or professional competence” if they were indexed with the MeSH heading clinical competence. We considered studies to have a theme of “communication, interprofessional relations, or physician–patient relations” if they were indexed with any of the following MeSH headings: communication, interprofessional relations, physician–patient relations, physician’s role, or interpersonal relations.
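The simplified taxonomy can be sketched as a lookup from MeSH headings to themes. The Python below is a hypothetical illustration; the short theme keys are our own, and a record may have both themes or neither.

```python
COMPETENCE = {"clinical competence"}
COMMUNICATION = {
    "communication",
    "interprofessional relations",
    "physician-patient relations",
    "physician's role",
    "interpersonal relations",
}

def themes(mesh):
    """Return the simplified theme keys implied by a record's MeSH headings.

    "competence" stands for clinical or professional competence;
    "communication" stands for communication, interprofessional relations,
    or physician-patient relations.
    """
    found = set()
    if mesh & COMPETENCE:
        found.add("competence")
    if mesh & COMMUNICATION:
        found.add("communication")
    return found

print(sorted(themes({"clinical competence", "communication", "education, medical"})))
# ['communication', 'competence']
```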
Search strategy development
The search strategy was developed by the first author (M.S.) to identify and preclassify evaluative studies (for the full search strategy, see Supplemental Digital Appendix 1, http://links.lww.com/ACADMED/A117). This strategy was peer reviewed by a second librarian using the Peer Review of Electronic Search Strategies standard.23
Review of classification accuracy
We reviewed a random sample of the retrieved preclassified records to confirm that they actually represented evaluative medical education studies and to determine the accuracy of the preclassifications. Using R software for statistical computing,24 the first author (M.S.) selected a 10% random sample of preclassified records from the medical education journals and a 20% random sample of preclassified records from the smaller GIM journal set. She combined these samples to create a bibliography that included each record’s title, indexing terms, and abstract; it did not include journal or author information.
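The sampling step is stratified simple random sampling without replacement. The study used R; the Python below is an equivalent illustration, not the authors' script, and the seed and record identifiers are hypothetical. Using the counts of preclassified records reported below (2,333 from medical education journals and 336 from GIM journals), 10% and 20% samples round to 233 and 67 records, consistent with the 300-record sample examined for accuracy.

```python
import random

def stratified_sample(strata, fractions, seed=2010):
    """Draw a simple random sample without replacement from each stratum.

    strata: dict mapping stratum name -> list of record IDs
    fractions: dict mapping stratum name -> sampling fraction
    seed: hypothetical fixed seed so the draw can be reproduced
    """
    rng = random.Random(seed)
    sample = {}
    for name, ids in strata.items():
        n = round(fractions[name] * len(ids))
        sample[name] = rng.sample(ids, n)
    return sample

# Hypothetical record IDs; only the stratum sizes come from the text.
strata = {
    "medical education": [f"ME{i}" for i in range(2333)],
    "GIM": [f"GIM{i}" for i in range(336)],
}
sample = stratified_sample(strata, {"medical education": 0.10, "GIM": 0.20})
print({name: len(ids) for name, ids in sample.items()})
# {'medical education': 233, 'GIM': 67}
```

Sampling the smaller GIM stratum at a higher fraction yields enough GIM records to review without inflating the much larger medical education stratum.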
We performed a calibration exercise in which all three authors reviewed 50 of these preclassified records. We compared our opinions about the classifications and resolved any disagreements through discussion until we reached consensus. Each record in the preclassified sample was then reviewed by one of the other two authors (T.H. or A.D.) to confirm that it had been classified correctly.
Articles in the preclassified sample that were not confirmed as evaluative studies were assigned by the reviewing author to one of the following categories: descriptions of medical education initiatives without apparent evaluative components, surveys, narrative reviews of medical education, commentaries on the state of medical education (past, present, or future), studies of reliability or validity of instruments or techniques, or a residual category of “other/could not characterize.”
We obtained frequencies from Ovid MEDLINE search results, GoPubMed statistics, or Reference Manager query results. We calculated the positive predictive value (PPV) of each classification as the proportion of preclassified studies that we confirmed, on examination, to be correctly classified. We performed statistical analyses and sampling using R.24 Bibliometric frequency distributions are typically skewed; for example, a few journals will publish a large number of articles on a certain topic, whereas a large number of journals will publish only a few articles on that topic.25 Therefore, we used medians as the measure of central tendency. We used chi-square analysis to test for statistically significant differences across distributions and considered two-sided P values < .05 statistically significant.
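To make the chi-square approach concrete, the Python sketch below computes a Pearson chi-square test of independence for a journal-type by study-type table. The counts are hypothetical, not the study's data, and the study itself used R. A 2 × 4 table (two journal groups by four study types) has (2 − 1)(4 − 1) = 3 degrees of freedom, matching the df reported for the study-type comparison, so the P value here uses the closed-form upper tail of the chi-square distribution with 3 df to stay dependency-free.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 df.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical counts (NOT the study's data). Columns: qualitative studies,
# program evaluations, clinical trials, systematic reviews.
table = [
    [120, 300, 90, 30],  # medical education journals
    [10, 15, 40, 25],    # GIM journals
]
stat, df = chi_square_independence(table)
p = chi2_sf_df3(stat)  # valid here because a 2 x 4 table has 3 df
print(f"chi-square = {stat:.2f}, df = {df}, P = {p:.2g}")
```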
Our search initially identified 107,038 articles that had been indexed in MEDLINE using the MeSH heading education, medical; of those articles, 36,475 (34.1%) had been published within the 10 years prior to our January 2010 search date. Of the 36,475 articles that met our publication date criterion, 20,541 (56.3%) had abstracts. After we further restricted the search to the 10 journals of interest (see below), we retrieved records for 4,418 articles (12.1%). Articles published in medical education journals accounted for 3,853 (87.2%) of the retrieved records, whereas the other 565 (12.8%) records were for articles published in GIM journals. There were 2,669 (60.4%) preclassified and 1,749 (39.6%) unclassified records.
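The retrieval counts in the paragraph above can be checked by recomputing each percentage; the counts come from the text and the code only does the arithmetic. Note that the 12.1% for records from the 10 study journals is relative to the 36,475 articles in the 10-year window, not to the 20,541 articles with abstracts (4,418/20,541 would be 21.5%).

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)

# (label, count, denominator), taken from the paragraph above.
steps = [
    ("published January 2001 to January 2010", 36_475, 107_038),
    ("with abstracts", 20_541, 36_475),
    ("in the 10 study journals", 4_418, 36_475),
    ("medical education journals", 3_853, 4_418),
    ("GIM journals", 565, 4_418),
    ("preclassified", 2_669, 4_418),
    ("unclassified", 1_749, 4_418),
]
for label, part, whole in steps:
    print(f"{label}: {part:,}/{whole:,} = {pct(part, whole)}%")
```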
Most productive journals
Our GoPubMed search results showed that the five medical education journals that produced the most records indexed as education, medical during January 2001 to January 2010 were Academic Medicine, Medical Education, Medical Teacher, Teaching and Learning in Medicine, and Journal of Continuing Education in the Health Professions (Table 1). These journals had a median 2009 impact factor of 1.33. The five GIM journals that produced the most records indexed as education, medical during the same period were BMJ, Journal of General Internal Medicine, Medical Journal of Australia, Lancet, and JAMA. These journals had a median 2009 impact factor of 13.66.
Evaluative studies appeared to make up 59.5% (336/565) of the GIM journal records and 60.6% (2,333/3,853) of the medical education journal records (Table 2). For purposes of comparison, we included the distribution of papers indexed in MEDLINE as human. (NLM describes research subjects or participants of biomedical articles using check tags in MEDLINE. The term human is applied if the study involves human beings; other check tags include animal, in vitro, male, female, and age groups.26)
Medical education journals appeared to publish higher proportions of qualitative studies and program evaluations than did GIM journals, whereas GIM journals appeared to publish higher proportions of clinical trials and systematic reviews than did medical education journals (χ² = 74.2815, df = 3, P < .001).
RCTs were infrequent in this sample: Only 186 RCTs were identified, accounting for 4.2% of the sample’s 4,418 records (Table 3). Of the 10 journals included in this study, BMJ had the highest concentration of RCTs (12/48; 25%).
Study themes and learner levels
We found that the theme of clinical and professional competence was more common than the theme of communication, interprofessional relations, or physician–patient relations (Table 4). The medical education initiatives described most commonly targeted undergraduate medical students (with a concentration in the medical education journals) or interns and residents (with a concentration in the GIM journals); however, the entire medical education continuum was represented to some degree.
Accuracy of classifications
After examining our random sample of 300 preclassified records, we confirmed 170 (56.7%) as evaluative studies of medical education initiatives. The accuracy of the preclassifications varied by study design (Table 5). We agreed with the MEDLINE classification of all 26 RCTs (PPV = 1.00) but found the clinical trial classification to be less accurate (41/48; 85.4%; PPV = 0.85).
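The PPVs above follow directly from the confirmed and preclassified counts. The short Python sketch below recomputes them from the numbers in the text; it is an arithmetic check, not the study's analysis code.

```python
def ppv(confirmed, preclassified):
    """Positive predictive value: share of preclassified records confirmed on review."""
    if preclassified == 0:
        raise ValueError("no preclassified records")
    return confirmed / preclassified

# Counts reported in the paragraph above.
print(round(ppv(26, 26), 2))    # RCTs -> 1.0
print(round(ppv(41, 48), 2))    # clinical trials -> 0.85
print(round(ppv(170, 300), 3))  # whole sample -> 0.567
```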
Of the 130 records that we did not confirm as evaluative studies, 97 (74.6%) had been preclassified as qualitative studies when we used the “optimized” qualitative search filter in Ovid MEDLINE. When we tested the more stringent “qualitative studies (specificity)” filter22 in MEDLINE, we found it to be much more accurate: Only 12 (12.4%) of these 97 studies were retrieved and preclassified as evaluative studies, raising the PPV for the classification of qualitative studies from 0.50 to 0.75.
Among the 130 preclassified records not confirmed as evaluative studies, we categorized 36 (27.7%) as narrative reviews, 36 (27.7%) as descriptions of medical education initiatives without any apparent evaluative component, 43 (33.1%) as surveys (these often evaluated satisfaction with medical education initiatives), 9 (6.9%) as commentaries on the state of medical education, and 3 (2.3%) as studies of the reliability or validity of instruments or techniques designed to measure the impact of medical education initiatives. We labeled the remaining 3 (2.3%) records as “other/could not characterize.”
In this study, we established a comprehensive bibliometric profile of evaluative medical education studies published in two broad categories of journals over a 10-year period. To add rigor, we systematically assessed a sample of retrieved article records to evaluate the accuracy of their classifications in MEDLINE. We believe this study represents the most robust and rigorous bibliometric description of the medical education literature to date.
One hallmark of a specialty is that it has its own literature and its own journals.27 The integrity of medical education as a specialty is supported by our finding that evaluative medical education studies were concentrated in medical education journals. Other hallmarks of an established discipline of research are the existence of its own knowledge community and its own methods of inquiry.28
Although evaluative medical education studies were concentrated in the specialty’s own journals, the GIM journals published a higher proportion of clinical trials than did the medical education journals, and the medical education journals seemed to place more emphasis on qualitative research than did the GIM journals. This is an important finding for medical education researchers seeking to publish qualitative studies outside their discipline’s specialty journals; they should be aware of these tendencies and be prepared to build a strong case for their work. We were not, however, able to determine whether these patterns were driven by researchers’ preferences or by journal editors’ decisions (e.g., frequently rejecting or accepting a certain type of study). This is an area of ongoing investigation by our research group.
RCTs are the second-most-highly cited type of article, following systematic reviews,29 and have long been viewed as one of the highest levels of evidence for single studies of effectiveness (meta-analyses of RCTs are viewed as the highest). RCTs made up only 4.2% of the records in our study sample, but this proportion is similar to that in the biomedical literature at large: Our MEDLINE search for records indexed as human and published in all journals during the same period found that 4.3% of such articles were indexed as RCTs. We did not assess the quality of the RCTs we found; that may be a topic for future research. Findings from a related project,30 focused on publications in the field of neurology, demonstrated that medical education RCTs in neurology had design flaws that prevented them from attaining the American Academy of Neurology’s highest grade of evidence. Further, many of the RCTs identified in that study30 were designed to evaluate learner-focused outcomes (e.g., knowledge, satisfaction) rather than to measure the impact of interventions on patient care. Baernstein and colleagues’12 study of undergraduate medical education research also found a focus on learner outcomes. Thus, there may be room for more robust RCTs in the field.
Although there were no apparent differences in the publication patterns of the two main themes we examined (clinical or professional competence; and communication, interprofessional relations, or physician–patient relations), these themes occurred in only about half the articles in our sample, so other thematic variations may exist. We did identify notable differences in the educational level targeted by interventions: Medical education journals showed a concentration of studies in the undergraduate sphere, whereas GIM journals showed a concentration of studies on interns and residents.
Our findings regarding the reliability of NLM indexing in MEDLINE are important for individuals searching for medical education studies (e.g., research librarians, systematic review authors). We found that the precision (PPV) of the MeSH terms associated with education, medical was very good, as was indexing of certain study designs. RCTs, clinical trials, evaluation studies, and qualitative studies (when using the specificity filter for the latter) had precision of 0.75 or better. Because we looked only at studies indexed as medical education, we cannot report on the sensitivity, or recall, of the MeSH headings. Systematic reviewers should use multiple methods with a high degree of redundancy to ensure that they identify all relevant studies. They should also consult an experienced research librarian, who may design a search that goes beyond the MeSH terms we used in this study to include the MeSH subheading for education, abbreviated “ed,” either in association with specific MeSH headings or as a free-floating subject heading.31 An exhaustive search would extend to additional databases beyond MEDLINE or PubMed.32 For example, Baernstein and colleagues’12 systematic examination of undergraduate medical education research found most studies by searching PubMed (which includes all the material in MEDLINE) but identified additional studies through searches of the Cochrane Controlled Trial Registry (CENTRAL) and the Campbell Collaboration.
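As an illustration of the kind of redundancy described above, the Python sketch below assembles a PubMed query that combines a MeSH heading, heading/subheading pairs, and the free-floating education subheading. The field tags `[mh]` and `[sh]` and the "heading/subheading"[mh] form are standard PubMed syntax, but the particular headings chosen here are hypothetical examples, not a validated search strategy.

```python
def mesh(term):
    """MeSH heading clause (PubMed [mh] tag)."""
    return f'"{term}"[mh]'

def mesh_with_subheading(term, sub):
    """Heading/subheading pair, e.g., "physicians/education"[mh]."""
    return f'"{term}/{sub}"[mh]'

def subheading(sub):
    """Free-floating subheading clause (PubMed [sh] tag)."""
    return f'"{sub}"[sh]'

def any_of(*clauses):
    """OR the clauses together inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"

# Hypothetical example headings; a librarian would tailor these to the question.
query = any_of(
    mesh("education, medical"),
    mesh_with_subheading("physicians", "education"),
    mesh_with_subheading("internal medicine", "education"),
    subheading("education"),
)
print(query)
```

On its own, the free-floating subheading retrieves any record indexed with the education subheading, so in practice it would likely be combined with topic terms using AND to keep retrieval manageable.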
A limitation of this study’s use of bibliometric methods is that we were able to investigate only journal publications. One could argue that education research, as a social science, is more theoretically based than medical research. As such, education research may be more likely than medical research to be published in other formats (e.g., books, book chapters, dissertations) and still contribute significantly to the knowledge base of the field.33 We believe, however, that in the field of medical education, most evaluative studies will be published in journals rather than in other formats and, thus, that our bibliometric approach presents a valid picture of published research activity in the field.
Another limitation is that, in keeping with bibliometric methods, we did not examine individual article records, except those in the random sample we examined to verify the accuracy of indexing. Rather, we relied on indexing in MEDLINE for classifications. Answering questions about the effectiveness of particular types of medical education interventions (e.g., distance education) would require a study that employed full systematic review methods. We decided that such an approach would provide little additional meaningful information in examining the field of medical education as a whole, as we have attempted to do in this study.
A final limitation of our study is that we restricted our search to evaluative studies of medical education interventions, omitting commentaries, editorials, and narrative reviews. In medical education, these types of articles are often written by thought leaders and innovators in the field and, as such, have the potential to influence present and future medical education research directions.
Our analysis demonstrated that medical education has its own literature and journals and, therefore, meets at least one of the criteria for specialties.27 However, we found that a higher proportion of medical education studies with the most robust research designs appeared to be published in GIM journals than in the specialty’s journals. Researchers who wish to explore the medical education literature can place confidence in the accuracy of NLM indexing, but they should be aware that searches using just the MeSH heading education, medical will not identify all studies evaluating medical education interventions.
Acknowledgments: The authors wish to thank Becky Skidmore for peer reviewing the search strategy they used to identify and classify medical education articles; Maureen O’Connor, research assistant, for assistance with manuscript preparation and submission; and the anonymous reviewers for valuable comments on the manuscript.
Other disclosures: None.
Ethical approval: Not applicable.