The Medical Education Research Study Quality Instrument (MERSQI) was introduced in 20071 and has gained acceptance alongside other checklist tools used to evaluate and plan various types of research and reviews. Although the MERSQI is not the only evaluation tool for educational research, it has gained wider adoption than the alternative Newcastle-Ottawa scale.2 The MERSQI consists of six domains (study design, sampling, type of data, validity of evaluation instrument, data analysis, outcomes), each of which carries a maximum score of 3. Five of the domains carry a minimum score of 1. Thus, MERSQI scores can range from 5 to 18.
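The stated range follows directly from the domain floors and ceilings. The sketch below (Python, purely illustrative) assumes that the validity-of-evaluation-instrument domain is the one that can score 0 while the other five have a floor of 1:

```python
# Illustrative sketch of MERSQI score bounds, assuming the
# "validity of evaluation instrument" domain can score 0 while the
# other five domains have a floor of 1; every domain has a ceiling of 3.
DOMAINS = [
    "study design", "sampling", "type of data",
    "validity of evaluation instrument", "data analysis", "outcomes",
]
domain_min = {d: 1 for d in DOMAINS}
domain_min["validity of evaluation instrument"] = 0
domain_max = {d: 3 for d in DOMAINS}

min_total = sum(domain_min.values())  # 5
max_total = sum(domain_max.values())  # 18
```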
Since its introduction, the MERSQI has been validated3 and used to evaluate the quality of educational research published in other leading journals.4–7 Higher MERSQI scores have been shown to be associated with a higher acceptance rate to competitive peer-reviewed journals3,8 and an increased likelihood of external funding.1
In obstetrics and gynecology, the venue most associated with educational scholarship is the annual combined meeting sponsored by the Council on Resident Education in Obstetrics and Gynecology (CREOG) and the Association of Professors of Gynecology and Obstetrics (APGO). For the past 2 years, the majority of the abstracts presented there have been published in a special supplement to the journal Obstetrics & Gynecology.9,10 This study was undertaken to describe the quality of educational scholarship presented at a large national conference of obstetrics and gynecology educators.
MATERIALS AND METHODS
The abstracts of the CREOG–APGO annual meetings from 2015 and 2016, published as supplements to Obstetrics & Gynecology,9,10 were reviewed, classified, and scored by a single reviewer (R.P.S.) using the MERSQI scoring system as originally described.1 Each year, a small number of full-text articles expanding on the information presented in oral or poster form are selected for publication through the journal's normal peer-review process. Full-text articles were not also published as abstracts and therefore were counted only once. For consistency, for full-text articles, only the information provided in the abstract was used for scoring.
For this uncontrolled observational study of publicly available (published) information, institutional review board approval was not required. There are no issues for an institutional review board to consider regarding this publicly available material, and it is unclear what reviewing body would have jurisdiction over such broadly available information.
Scores were compiled for each MERSQI domain and totaled, and the percentage of the maximum possible score across the scorable domains was calculated to allow comparisons for abstracts in which not all MERSQI domains could be scored. Domain scoring was performed using the descriptors used in the original MERSQI publication (see Table 1 in Reed et al1 for the scoring system). The only domain in which any degree of judgment was required was that of “validity of evaluation instrument.” This domain awards 1 point each for disclosing the internal structure, content, and relationship to other variables for the tools used. Because this level of detail is sometimes absent in abstracts, some level of inference was used in assessing this domain. When in doubt, the point(s) were not awarded. For domains that were inapplicable, or where no determination of the domain's information could be obtained from the abstract, no score was entered.
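As a concrete illustration of the percent-of-maximum calculation described above, the following sketch totals a partially scorable abstract over only the domains that could be scored; the domain names and score values are hypothetical, not drawn from the study data:

```python
# Illustrative sketch: total a partially scorable abstract and express the
# total as a percentage of the maximum achievable over the scorable domains.
# The example scores are hypothetical, not taken from the study.
def mersqi_percent(scores):
    """scores: dict mapping domain name -> score, omitting unscorable domains."""
    total = sum(scores.values())
    max_possible = 3 * len(scores)  # each scored domain has a ceiling of 3
    return total, 100 * total / max_possible

# Hypothetical abstract for which "validity of evaluation instrument"
# could not be scored, leaving five scorable domains (max 15 points).
total, pct = mersqi_percent({
    "study design": 1,
    "sampling": 2,
    "type of data": 3,
    "data analysis": 2,
    "outcomes": 1.5,
})
```

Expressing each abstract as a percentage in this way puts partially scorable and fully scored abstracts on the same footing for comparison.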
Each abstract was also classified by type of presentation (oral presentation, poster), origin (academic or community institution), focus of the study (needs assessment, tool development, tool evaluation, tool refinement, research with a validated tool, or literature review), and setting (undergraduate [UME], resident [GME], faculty [CME], or other). Full-text articles, award winners, and those originating from the APGO Academic Scholars and Leaders program and the Surgical Education Scholars program were also noted.
Comparisons were made between types of submission (oral presentations compared with posters), the origin of the report (academic compared with community), setting (undergraduate, resident, faculty), and the focus of the study (needs assessment compared with tool evaluation, tool development compared with tool evaluation). Award-winning presentations and full manuscripts were compared with the remaining abstracts, as were abstracts originating from the APGO faculty development programs (APGO Academic Scholars and Leaders, Surgical Education Scholars). Abstracts from 2015 were also compared with those published in 2016. We hypothesized that projects that resulted in published manuscripts, won awards at the conference, were selected for oral presentation, originated from academic institutions, emerged from APGO’s faculty development programs, or evaluated a tool or curriculum would score higher than their comparators.
To assess the reliability of scoring and classification, a 10% sample of abstracts was selected by random number generator, with half taken from each year's supplement. These were scored in a blinded manner by a second reviewer (L.A.L.). The MERSQI domain scores assigned by each reviewer were compared and a correlation coefficient calculated. The distribution of MERSQI scores was determined to be effectively normal (second-degree polynomial, R2=0.79), allowing one- and two-tailed two-sample Student t tests with unequal variance (heteroscedastic) to be performed with a significance threshold of P≤.05. One-tailed tests were used for unidirectional hypotheses. An assessment of effect size (the mean difference in relation to the pooled SD; Cohen's d) was calculated wherever a statistically significant difference was found. For all comparisons, except where noted, only abstracts with all six MERSQI domains scored were used.
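The comparison machinery described above — a two-sample unequal-variance t statistic and Cohen's d computed against the pooled SD — can be sketched as follows. The sample values are invented for demonstration, and the P value (which in practice comes from the t distribution) is omitted:

```python
import statistics as st

def welch_t(a, b):
    """Two-sample t statistic with unequal variances (heteroscedastic)."""
    va, vb = st.variance(a), st.variance(b)
    return (st.mean(a) - st.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

def cohens_d(a, b):
    """Effect size: mean difference divided by the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * st.variance(a)
                  + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return (st.mean(a) - st.mean(b)) / pooled_var ** 0.5

# Hypothetical MERSQI totals for two groups of abstracts (illustrative only).
group_a = [10, 11, 9, 12, 10.5]
group_b = [9, 8.5, 9.5, 8, 9]
t = welch_t(group_a, group_b)
d = cohens_d(group_a, group_b)
```

In practice a statistical package converts the t statistic to a P value using the Welch–Satterthwaite degrees of freedom; the sketch shows only the arithmetic underlying the comparisons reported below.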
RESULTS
A total of 186 abstracts and articles were available for MERSQI scoring. Of these, 178 abstracts could be scored in all six MERSQI domains. The correlation of the scores assigned by the two reviewers for the 10% random sample was 0.69 (covariance 0.55, difference 5.6% of maximum score), with virtually all of the variation originating from the most subjective domain, validity of evaluation instrument. The average MERSQI score for the 178 fully scorable abstracts was 9.0 (±1.9, 50.3% of possible), with scores ranging from 5 to 13.5 (median 9). The distribution of the abstracts across the classifications used is shown in Table 1.
No significant differences were found by year of publication. There were 101 posters and 77 oral presentations with full MERSQI scores, but there was no significant difference in the MERSQI scores between the two groups (posters 9.0, oral presentations 9.1). Abstracts from full-text articles scored more than 1 point higher than other abstracts (10.2 compared with 9.0, P<.001, Cohen's d=0.72). Significant differences of smaller magnitude were found in comparisons between abstracts originating from academic compared with community settings and between needs assessment and tool development (Cohen's d=0.34). No statistically significant differences were found between undergraduate and resident settings, needs assessment and evaluation studies, or between the APGO Academic Scholars and Leaders and Surgical Education Scholars projects, separately or together. Averages, SD, and t test results for these comparisons are summarized in Table 1.
Projects originating from “community” sources scored statistically higher than those of “academic” origins, although the effect size was small (Cohen's d=0.21). Exploratory analyses identified the source of this difference. The “community” projects more often received points for using objective outcome measures (19/29 [65.5%] of “community” abstracts compared with 61/147 [41.5%] of “academic”). This resulted in an average domain subscore of 2.3 (±1.0) for the “community” abstracts compared with 1.8 (±1.0) for the “academic” submissions. There were negligible differences in the subscores for the other five domains.
DISCUSSION
Clinicians have adopted evidence-based practices, but as medical educators, “we have not adopted a similar stance on ‘evidence-based education’ (EBE), instead continuing to use apprenticeship, didactic and other teaching models from the past. Evidence of failings for these methods and the strengths of newer learning paradigms have been ignored.”11 The MERSQI scores found in our sample of published abstracts identify room for improvement. A systematic review including 26 studies that used the MERSQI to appraise the quality of published educational research found a median overall score of 11.3 (range 8.9–15.1).2 Other published articles using the MERSQI have generally considered scores of 12 or greater to indicate “quality” research, publication, and funding.1,3,8 Only 18 abstracts in our sample (10%) scored at or above this threshold.
The average MERSQI score in our study of meeting abstracts (9.05 [±1.90]) fits within the range of findings from similar studies in other specialties. An analysis of 100 manuscripts submitted for publication in the education issue of an internal medicine journal found significantly higher MERSQI scores for accepted manuscripts than rejected manuscripts (10.7 [±2.5] compared with 9.0 [±2.4], P=.003).3 Our findings showed a similar pattern of scores comparing abstracts leading to full-text publications in the supplement to Obstetrics & Gynecology with other meeting abstracts (10.2 [±1.8] compared with 9.0 [±1.9], P<.001).
Although our study included a large number of educational projects, we advise caution in extrapolating beyond the current sample. First, not all educational scholarship projects result in abstract submissions to the CREOG–APGO annual meeting. We did not investigate the quality of educational studies published in full-text articles in obstetrics and gynecology journals or educational projects presented at other meetings. Second, we did not include abstract submissions that were not selected for inclusion in the CREOG–APGO meeting. In 2015, 294 abstracts were submitted, and 342 were submitted for the 2016 meeting. Lastly, of those selected for presentation, not all authors completed the necessary steps to have their abstracts published (eg, correct formatting, disclosure statements). Because the MERSQI tool was not used in abstract screening and selection, we can only speculate that submissions lost at each of these selection steps represent poorer quality work than the successful abstracts.
Although a 10% sampling of abstracts demonstrated reproducible scoring, scores assigned in this (abstract) setting may be lower than would result if a full-text exposition of the research were available. For the 12 full-text articles published in these supplements, that was not the case, although the number of instances is too small to justify any sweeping statement. There is, however, precedent for applying the MERSQI to abstracts,7 where it appeared to give reliable results. Nonetheless, it may be unfair to judge the true quality of the abstract authors' efforts in the absence of the richness that a full text could afford.
The strengths of this study include its large sample size and its novelty in being the first to apply the MERSQI quality assessment tool for educational research studies to our specialty (a MEDLINE search for “MERSQI” yields only 32 results since the scale's 2007 introduction, none dealing with obstetrics or gynecology). The consistency of our results (values and variation) with those from other disciplines3–8 supports the use of the MERSQI as a benchmarking tool to inform mentoring and faculty development efforts in educational scholarship.
We believe that the MERSQI scoring tool is reproducible and has been sufficiently validated in many settings that it can, and should, now become a standard tool for the planning of educational research projects and the evaluation of scholarly output from such studies. The Medical Education Research Study Quality Instrument has high potential to help investigators design and conduct better studies through careful attention to study design, sampling, type of data, validity of the evaluation instrument, data analysis, and outcomes. A few simple changes in the design of a project can materially improve the quality, value, and potential effect of the work, leading to increased opportunities for dissemination in peer-reviewed publications. We believe that a wider awareness of tools such as the MERSQI can assist educators, peer reviewers, meeting planners, faculty development leaders, and consumers of educational research in their mission to improve the quality of teaching and learning in obstetrics and gynecology.
REFERENCES
1. Reed DA, Cook DA, Beckman TJ, Levine RB, Kern DE, Wright SM. Association between funding and quality of published medical education research. JAMA 2007;298:1002–9.
2. Cook DA, Reed DA. Appraising the quality of medical education research methods: the Medical Education Research Study Quality Instrument and the Newcastle-Ottawa Scale–Education. Acad Med 2015;90:1067–76.
3. Reed DA, Beckman TJ, Wright SM, Levine RB, Kern DE, Cook DA. Predictive validity evidence for medical education research study quality instrument scores: quality of submissions to JGIM's Medical Education Special Issue. J Gen Intern Med 2008;23:903–7.
4. Reed DA, Beckman TJ, Wright SM. An assessment of the methodologic quality of medical education research studies published in The American Journal of Surgery. Am J Surg 2009;198:442–4.
5. Windish DM, Reed DA, Boonyasai RT, Chakraborti C, Bass EB. Methodological rigor of quality improvement curricula for physician trainees: a systematic review and recommendations for change. Acad Med 2009;84:1677–92.
6. Cook DA, Levinson AJ, Garside S. Method and reporting quality in health professions education research: a systematic review. Med Educ 2011;45:227–38.
7. Eaton JE, Reed DA, Aboff BM, Call SA, Chelminski PR, Thanarajasingam U, et al. Update in internal medicine residency education: a review of the literature in 2010 and 2011. J Grad Med Educ 2013;5:203–10.
8. Sawatsky AP, Beckman TJ, Edakkanambeth Varayil J, Mandrekar JN, Reed DA, Wang AT. Association between study quality and publication rates of medical education abstracts presented at the Society of General Internal Medicine Annual Meeting. J Gen Intern Med 2015;30:1172–7.
9. Research presented at the Council on Resident Education in Obstetrics and Gynecology and the Association of Professors of Gynecology and Obstetrics annual meeting: March 4–7, 2015, San Antonio, Texas. Obstet Gynecol 2015;126(suppl 2):1–60S.
10. Research presented at the Council on Resident Education in Obstetrics and Gynecology and the Association of Professors of Gynecology and Obstetrics annual meeting: March 2–5, 2016, New Orleans, Louisiana. Obstet Gynecol 2016;128(suppl 2):1–64S.