Journal editors rely on peer review to maintain high quality and standards in the papers they accept for publication, and researchers and educators rely on peer-reviewed journals as sources of high-quality research in their fields. Reviewers typically assess the quality of manuscripts according to two main criteria: “contribution to the field” and the “adequacy of the research design.”1 There is a growing body of research on journal peer review. For example, JAMA has dedicated three complete issues in the past decade (March 9, 1990; July 13, 1994; and July 15, 1998) to peer review studies and essays, and the Council of Biology Editors (now the Council of Science Editors) also published a book of papers in 1991 from the First International Congress on Peer Review in Biomedical Publishing.2 However, few studies exist that analyze the content of reviewers' comments when reviewers are recommending rejection or acceptance of a manuscript.
Gilbert and Chubin1, p. 108 conducted such an analysis for a sample of reviewers' comments on manuscripts that had been rejected from Social Studies of Science, an interdisciplinary specialty journal. They found that the most frequent reason reviewers offered for rejecting a manuscript was “poor argumentation,” that is, failing to make a convincing case. Other reasons frequently given included poor writing, ignorance of the literature, lack of novelty, and misunderstanding or misapplying the data or the literature. While reviewers do write comments, the level of agreement among reviewers remains highly variable. Assessing studies from different areas of science, Chubin and Hackett reported poor agreement among reviewers, with inter-rater reliability in the 0.25 range.1 The research confirms what editors have known: Reviewers for any given manuscript focus on different issues.
Research into understanding the problems that peer reviewers identify in research reports has barely begun. Thus, the goal of this study was to better understand the nature of the strengths and weaknesses in medical education reports by analyzing the ratings and comments made by external reviewers who recommended either rejection or acceptance. A descriptive content analysis was performed on reviewers' comments. The results should inform editors, reviewers, and authors of the frequent and important reasons reviewers offered for rejecting and accepting manuscripts. These findings should also alert researchers to major flaws to avoid when conducting research.
One data set was used for this study: reviews of research manuscripts submitted in 1997 and 1998 for the Research in Medical Education (RIME) conference sponsored annually by the Association of American Medical Colleges (AAMC). The RIME manuscripts are peer reviewed by medical educators worldwide, and those that are accepted are subsequently published in a supplement to the October issue of Academic Medicine. The manuscripts are masked and each is sent to four or five reviewers, who write anonymous comments to the authors. The reviewers use a review form to evaluate each manuscript in eight areas: Problem Statement and Background, Research Design, Sampling, Instrumentation and Data Collection, Results, Conclusion, Writing, and Importance. Each area is rated on a five-point scale (excellent, good, fair, unsatisfactory, and not applicable). The reviewers are also asked to provide a global rating using a four-point scale (definitely include; acceptable, probably include; questionable, probably exclude; definitely exclude) and to write detailed comments on the merits or shortcomings of the manuscript. Historically, the acceptance rate for RIME papers is about 50%.
Comments on a total of 151 manuscripts were used in this study: 82 manuscripts submitted in 1997 and 69 of the 72 manuscripts submitted in 1998 (three were withdrawn). The contents of reviews for all the manuscripts that received “questionable, probably exclude” or “definitely exclude” overall ratings from at least one of the reviewers were analyzed to identify the nature of the flaws. Conversely, the contents of reviews for all the manuscripts that received unanimous approval (“definitely include” or “acceptable, probably include”) were analyzed to identify the positive aspects of accepted papers.
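The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `analysis_group` and the idea of passing ratings as a list of strings are hypothetical, and only the four global rating labels come from the review form.

```python
from typing import List

# The four global ratings on the review form.
ACCEPT = {"definitely include", "acceptable, probably include"}
REJECT = {"questionable, probably exclude", "definitely exclude"}


def analysis_group(global_ratings: List[str]) -> str:
    """Return which content analysis a manuscript's reviews feed into.

    Hypothetical helper: "negative" if at least one reviewer recommended
    exclusion, "positive" if every reviewer recommended inclusion.
    """
    if not global_ratings:
        raise ValueError("a manuscript needs at least one global rating")
    unknown = [r for r in global_ratings if r not in ACCEPT | REJECT]
    if unknown:
        raise ValueError(f"unrecognized rating(s): {unknown}")
    if any(r in REJECT for r in global_ratings):
        return "negative"
    return "positive"
```

Under this rule, a single “questionable, probably exclude” rating is enough to place a manuscript in the negative-comment analysis, whatever the other reviewers recommended.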
Lists of the reasons the reviewers gave for both positive and negative comments were developed as the comments were analyzed. A broad categorization scheme was used to tally the reasons based on ten major categories: Problem Statement (including background and literature review), Relevance, Research Design, Sample and Sampling, Instrumentation and Data Collection, Results, Discussion and Conclusion, Title, Abstract, and Writing and Presentation. Thus, the product of the content analysis was two lists, one containing the reasons given by the reviewers who recommended that manuscripts be rejected (the negative comments) and one containing the reasons for acceptance (the positive comments). For manuscripts that were recommended for rejection, only the comments of reviewers recommending rejection were analyzed.
The analysis was performed solely by the author. To avoid categorization bias, the ratings and comments were analyzed in a staggered fashion according to years (i.e., 20 from 1997 followed by 20 from 1998, etc.). Reasons (comments) were tallied only once per reviewer. Whenever a comment (e.g., “questionable randomization”) could belong to more than one category of reason (e.g., research design or sampling), the comment was assigned to the category best dictated by the context in which it appeared.
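The tallying rule (each reason counted only once per reviewer) can be illustrated with a minimal Python sketch. The coding itself was done by hand; the tuple layout, the `review_id` labels, and the function name `tally_reasons` are assumptions made for illustration, and the example rows are invented, not study data.

```python
from collections import Counter
from typing import Iterable, Tuple

# One coded comment: (review_id, category, reason). All three fields are
# hypothetical labels for illustration.
CodedComment = Tuple[str, str, str]


def tally_reasons(coded_comments: Iterable[CodedComment]) -> Counter:
    """Count each (category, reason) pair at most once per review."""
    seen = set()
    tally: Counter = Counter()
    for review_id, category, reason in coded_comments:
        key = (review_id, category, reason)
        if key in seen:
            continue  # same reviewer repeated the same reason: skip it
        seen.add(key)
        tally[(category, reason)] += 1
    return tally


# Invented example rows, not data from the study.
example = [
    ("review-1", "Sample and sampling", "Sample size too small or biased"),
    ("review-1", "Sample and sampling", "Sample size too small or biased"),
    ("review-2", "Sample and sampling", "Sample size too small or biased"),
    ("review-2", "Research design", "Questionable randomization"),
]
counts = tally_reasons(example)
```

Because a comment is assigned to a single category before tallying, an ambiguous comment such as “questionable randomization” contributes to one category only.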
Institutional review board (IRB) approval was requested and granted.
On average, in the two periods combined, 4.1 reviewers evaluated each manuscript (SD = 0.97, range 2–6). The figures were 3.9 (SD = 1.1) and 4.2 (SD = 1.08) for 1997 and 1998, respectively. Reviewers were unanimous in recommending acceptance of 28 of the 151 papers (19% overall; 23% and 13% in 1997 and 1998, respectively). At least one reviewer recommended rejection of the remaining 123 manuscripts (81% overall; 77% and 87% in 1997 and 1998, respectively). In the end, the RIME committee accepted 83 papers for publication in Academic Medicine (55% overall; 56% and 54% in 1997 and 1998, respectively); 55 of the 123 manuscripts that initially received at least one recommendation for rejection were revised by the authors and finally accepted for publication. On average, slightly over half of the reviewers (2.3 of 4.1 = 56%) recommended rejection of a manuscript. Of the 123 manuscripts receiving at least one recommendation for rejection, 15% were recommended for rejection by all the reviewers (unanimous decision), 34% by a majority of reviewers, 11% by half of the reviewers, 10% by a minority, and 30% by a single reviewer. Of the reviewers recommending rejection, 38% overall (29% in 1997 and 52% in 1998) did not rate any of the eight review categories provided on the review form as “unsatisfactory,” thus demonstrating the importance of analyzing the comments.
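The overall percentages reported above follow directly from the manuscript counts. The short Python sketch below reproduces that arithmetic; the variable names and the `pct` helper are illustrative assumptions, and only the counts are taken from the text.

```python
def pct(part: float, whole: float) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_manuscripts = 82 + 69   # 1997 submissions + 1998 submissions analyzed
unanimous_accept = 28         # all reviewers recommended acceptance
at_least_one_reject = total_manuscripts - unanimous_accept
finally_accepted = 83         # accepted by the RIME committee
accepted_after_revision = 55  # accepted despite at least one rejection vote

share_unanimous = pct(unanimous_accept, total_manuscripts)
share_rejected_once = pct(at_least_one_reject, total_manuscripts)
share_accepted = pct(finally_accepted, total_manuscripts)
share_reviewers_rejecting = pct(2.3, 4.1)

# Split of the manuscripts with at least one rejection recommendation, by
# how many reviewers recommended rejection (percentages from the text).
rejection_split = {
    "all reviewers": 15,
    "majority": 34,
    "half": 11,
    "minority": 10,
    "single reviewer": 30,
}

print(share_unanimous, share_rejected_once, share_accepted)
print(share_reviewers_rejecting, sum(rejection_split.values()))
```

The difference between the 83 accepted papers and the 55 accepted after revision equals the 28 unanimously approved manuscripts, which is consistent with all of those having been accepted.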
Overall, 1,053 negative comments were made, and each reviewer wrote an average of 8.1 (SD = 5.7, range = 1–30) reasons why the manuscript was questionable or unacceptable. During the content analysis, it was not possible to distinguish major reasons or “fatal flaws” from minor reasons; some of the negative comments were definitely less important than were others and were stated mostly in an educational spirit to help the authors. When a majority of reviewers recommended rejection, the number of negative comments overall doubled (from approximately six negative comments to 12) compared with when fewer than half of the reviewers or a single reviewer recommended rejection.
The numbers and percentages of reasons given by the reviewers for rejecting manuscripts are presented in Table 1 according to broad categories and by years. Slightly more than two thirds of the negative comments written by the reviewers (70.1%) fell under 20 reasons (see Table 2). The complete list of reasons and negative comments is presented in Appendix A.
Twenty-eight manuscripts were judged acceptable by all the reviewers, that is, they received only “definitely include” or “acceptable, probably include” ratings: 39% of these manuscripts received “definitely include” ratings from a majority of reviewers, and 4% from half of the reviewers; 57% of the manuscripts received “acceptable, probably include” ratings. Three fourths of the positive comments written by the reviewers (76%) were contained in nine reasons (see Table 3). The complete list of positive comments is presented in Appendix B.
That nearly two fifths of the reviewers in this study recommended rejection of manuscripts but provided no unsatisfactory ratings on the review form's checklist certainly reinforces editors' requests for reviewers to provide written comments in addition to numerical ratings. Without such comments, neither editors nor authors can know why a manuscript has been recommended for rejection.
The overall patterns of positive and negative comments were quite similar across the two years studied. However, the diversity of the comments made by the reviewers suggested, once again, that they had focused on different aspects of the manuscripts or weighted their objections differently. Consequently, editors should select reviewers in such a way as to strike a balance between content expertise, methodologic expertise, and educational relevance.
Some deficiencies in manuscripts can be fixed before they are accepted for publication, especially when RIME committee members offer their direct help and guidance to authors. However, while some deficiencies can be fixed within a one-to-two-month turn-around time, for example, by rewriting or reanalyzing some data, other deficiencies, such as lack of importance of research topics or inappropriateness of study designs, are likely to be considered “fatal.” The results from the present study point to six major recommendations to researchers and authors: pay attention to relevance (theoretical or practical), select optimal study designs, select optimal instruments, select optimal statistics, interpret the results honestly, and present well-written manuscripts.
The reasons given by the reviewers in this study for rejecting manuscripts confirmed Gilbert and Chubin's list,1, p. 109 but are even more detailed. Also, the reasons for rejecting manuscripts in this study were not simply mirror images of the reasons given for accepting manuscripts. Researchers and authors need to pay attention both to qualities of good studies and good writing (e.g., relevance and well-crafted manuscripts) and to shortcomings of poor studies and poorly written manuscripts (e.g., inappropriate statistics and overinterpretation of the results). A number of strengths identified by the reviewers emphasized the importance of researchers' acknowledging the limitations of their studies (e.g., possible selection biases, lack of power, or low reliability) rather than ignoring these deficiencies. An “honest” approach to design and results in scientific writing, as noted by some reviewers, is likely to increase one's chances of being published. Consequently, researchers need to make a conscious effort to identify possible biases and confounding variables, both during the design phase of the study and once the results are in. Researchers should ask themselves, “What are the competing hypotheses3 or alternative explanations and to what extent can they be controlled or explained?”
Many reviewers raised the issue of quality of writing (good and bad), suggesting that submitting well-crafted manuscripts is vital. Good writing is an important asset in getting one's manuscript accepted, while poor writing is likely to annoy reviewers and decrease the author's chance of getting recommended for publication.
The detailed lists of strengths and shortcomings of medical education manuscripts reported in this study, along with other resources (such as Huth's book on medical writing,4 Bordage's paper on considerations in preparing a manuscript,5 and Parsell and Bligh's guide to writing for journal publication6), can be useful to editors and educators in training or providing advice to researchers and writers. The lists can also help reviewers focus their evaluation of manuscripts on frequent and important shortcomings.
In conclusion, the interdependence of science and the art of writing in producing good manuscripts brings to mind two quotes from Boileau7 that are as true today as they were over three centuries ago when they were written about the art of poetry: “What is clearly understood is well expressed and the words to say it come easily”; and “Twenty times on the stocks put your work, polishing it unceasingly and repolishing it.” Scientific writing demands both conducting good science and writing good manuscripts.
Appendix A. Reasons written by external reviewers (in decreasing order within each category) when recommending rejection of a medical education manuscript submitted to the Research in Medical Education proceedings, 1997 and 1998. There were 1,053 negative comments, 557 in 1997 and 496 in 1998.
Problem statement (184 total negative comments; 105 in 1997 and 79 in 1998)
- Insufficient, confusing, or incomplete description of the problem (41; 25 and 16)
- Inadequate, incomplete, inaccurate, or out-dated review of the literature (33; 14 and 19)
- Intervention (independent variable) insufficiently described or confusing (21; 10 and 11)
- Lack of a conceptual or theoretical framework (19; 13 and 6)
- Research hypothesis not stated or inappropriate (14; 6 and 8)
- Lack of focus, too broad (13; 8 and 5)
- Variables (independent or dependent) not identified or inappropriately labeled (9; 7 and 2)
- Stated purpose never pursued (7; 4 and 3)
- Absence of a problem statement or research question (unable to deduce) (6; 4 and 2)
- Outcome (dependent) variable insufficiently described (6; 4 and 2)
- Inappropriate outcome (dependent) variable (5; 4 and 1)
- Unfounded, unsubstantiated statements (3; 0 and 3)
- Misleading problem statement (3; 2 and 1)
- Unit of measurement not specified (2; 2 and 0)
- Focusing on wrong problem (1; 1 and 0)
- Outdated data (1; 1 and 0)
Relevance, importance (55 total negative comments; 28 in 1997 and 27 in 1998)
- Unimportant, irrelevant topic; adds nothing new (22; 10 and 12)
- Practical implications not established (13; 6 and 7)
- Importance not established or reported (12; 8 and 4)
- Topic too narrow or simplistic (8; 4 and 4)
Research design (62 total negative comments; 27 in 1997 and 35 in 1998)
- Potential confounding variables not addressed (18; 8 and 10)
- No research presented (15; 5 and 10)
- Inappropriate or weak design (13; 10 and 3)
- Comparison group not clearly identified (5; 2 and 3)
- Questionable control group (5; 1 and 4)
- Insufficient or inappropriate timing or strength of intervention (5; 1 and 4)
- Questionable randomization (1; 0 and 1)
Sample and sampling (103 total negative comments; 55 in 1997 and 48 in 1998)
- Sample size too small or biased (59; 29 and 30)
- Subjects insufficiently described (20; 12 and 8)
- Sampling method inappropriate or insufficiently described (15; 9 and 6)
- Population not identified (6; 2 and 4)
- Inappropriate sample (1; 1 and 0)
- Sample too heterogeneous (1; 1 and 0)
- Unequal groups (1; 1 and 0)
Instrumentation and data collection (145 total negative comments; 69 in 1997 and 76 in 1998)
- Inappropriate, suboptimal, or insufficiently described instrument (77; 40 and 37)
- Insufficient or unreported reliability (22; 9 and 13)
- Untested (non-validated) instrument (13; 7 and 6)
- Procedure or time of administration not stated (12; 4 and 8)
- Measurement scale insufficiently described (11; 5 and 6)
- Questionable or inappropriate items on the instrument (4; 1 and 3)
- Scoring method insufficiently described (3; 1 and 2)
- Example needed to understand (judge) the nature of the variable (2; 1 and 1)
- Respondents not anonymous (1; 1 and 0)
Results (214 total negative comments; 105 in 1997 and 109 in 1998)
- Statistics (118; 68 and 50)
  - Analysis insufficiently described (30; 16 and 14)
  - Inappropriate analysis done (26; 13 and 13)
  - Insufficient, suboptimal, or incomplete analysis (25; 18 and 7)
  - Analysis not specified (21; 10 and 11)
  - Incomplete analysis done (8; 7 and 1)
  - Too few subjects for analyses done (6; 2 and 4)
  - P values not reported (2; 2 and 0)
- Inconsistencies or inaccurate data (36; 13 and 23)
- Insufficient data presented (28; 9 and 19)
- Tables and figures (26; 12 and 14)
  - Insufficient data presented (8; 5 and 3)
  - More needed (8; 3 and 5)
  - Too many or redundant with text (6; 1 and 5)
  - Inappropriate format (2; 2 and 0)
  - Too complicated (2; 1 and 1)
- Data appear made up, unbelievable (4; 2 and 2)
- Data interpretation in results section (2; 1 and 1)
Discussion and conclusion (147 total negative comments; 87 in 1997 and 60 in 1998)
- Overinterpretation of results (92; 57 and 35) (including conclusions not supported by data, insufficient evidence, going beyond the data, sample, or outcomes measured, implying causation with observational studies, ignoring confounding variables or limitations)
- Underinterpretation of results (18; 8 and 10)
- Contradictory or conflicting assertions (10; 5 and 5)
- Confusing, out-of-context interpretations (8; 4 and 4)
- Lack of theoretical framework to interpret results (5; 4 and 1)
- Key points, main results don't stand out (4; 3 and 1)
- Deceptive, erroneous interpretation (4; 2 and 2)
- No guidance for future studies (3; 2 and 1)
- Ambiguity between current and past results (3; 2 and 1)
Title (27 total negative comments; 24 in 1997 and 3 in 1998)
- Not representative of the paper (17; 14 and 3)
- Too negative (10; 10 and 0)
Abstract (18 total negative comments; 6 in 1997 and 12 in 1998)
- Incomplete, insufficient information reported (17; 6 and 11)
- Inconsistent with text (1; 0 and 1)
Writing, presentation (98 total negative comments; 51 in 1997 and 47 in 1998)
- Difficult to read, to follow, to understand, confusing; too much jargon (41; 19 and 22)
- Too long (14; 8 and 6)
- Wrong or inaccurate terms (10; 3 and 7)
- Information in the wrong section, poor organization (10; 7 and 3)
- Unedited, hasty writing, typographical errors (8; 4 and 4)
- Grammatical errors (5; 4 and 1)
- Inappropriate language (4; 4 and 0)
- Abbreviations not spelled out (4; 1 and 3)
- Irrelevant anecdotes (2; 1 and 1)
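As a cross-check on the list above, the category totals can be summed and converted to shares of all negative comments. The Python sketch below uses only the per-category counts stated in the category headings; the dictionary layout and variable names are illustrative assumptions.

```python
# Negative comments per category as (1997, 1998), from the category headings.
negative_by_category = {
    "Problem statement": (105, 79),
    "Relevance, importance": (28, 27),
    "Research design": (27, 35),
    "Sample and sampling": (55, 48),
    "Instrumentation and data collection": (69, 76),
    "Results": (105, 109),
    "Discussion and conclusion": (87, 60),
    "Title": (24, 3),
    "Abstract": (6, 12),
    "Writing, presentation": (51, 47),
}

totals = {name: y97 + y98 for name, (y97, y98) in negative_by_category.items()}
total_1997 = sum(y97 for y97, _ in negative_by_category.values())
total_1998 = sum(y98 for _, y98 in negative_by_category.values())
grand_total = sum(totals.values())

# Share of all negative comments per category, largest first.
for name, n in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {n} ({100 * n / grand_total:.1f}%)")
```

The sums agree with the totals stated in the appendix heading (1,053 overall; 557 in 1997 and 496 in 1998), and Results is the category with the most negative comments.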
Appendix B. Reasons stated (in decreasing order within each category) by external reviewers when recommending unanimous acceptance of a medical education manuscript (252 total positive comments, 165 in 1997 and 87 in 1998)
Problem statement (36 total positive comments; 20 in 1997 and 16 in 1998)
- Thoughtful, focused, up-to-date review of the literature; grounded, thorough (17; 9 and 8)
- Problem well stated, formulated; excellent background (9; 6 and 3)
- Well conceived (4; 2 and 2)
- Based on sound theoretical, conceptual, or educational framework (3; 1 and 2)
- Clear rationale (2; 1 and 1)
- Clear hypotheses (1; 1 and 0)
Relevance, importance (68 total positive comments; 47 in 1997 and 21 in 1998)
- Important, timely, current, relevant, critical, appealing, prevalent problem (51; 35 and 16)
- Practical, useful implications (11; 7 and 4)
- Contributes to theory building, advancement in the field (4; 3 and 1)
- Understudied topic (2; 2 and 0)
Research design (27 total positive comments; 18 in 1997 and 9 in 1998)
- Well designed; appropriate, rigorous, comprehensive design; novel mix of designs (26; 17 and 9)
- Well described (1; 1 and 0)
Sample and sampling (17 total positive comments; 14 in 1997 and 3 in 1998)
- Sample size sufficiently large (11; 9 and 2)
- Limitations of the sample acknowledged; selection or sample bias verified (4; 4 and 0)
- High response rate (2; 1 and 1)
Instrumentation and data collection (8 total positive comments; 5 in 1997 and 3 in 1998)
- Validity and (or) reliability data reported (4; 3 and 1)
- Innovative scoring method (2; 1 and 1)
- Limitations of the instrument acknowledged (1; 1 and 0)
- Instrument well described (1; 0 and 1)
Results (19 total positive comments; 14 in 1997 and 5 in 1998)
- Novel, unique approach to data analysis; integration of multiple statistical methods (9; 6 and 3)
- Well thought out, appropriate analyses (3; 3 and 0)
- Easily understandable, well presented (3; 2 and 1)
- Clear and easy-to-understand tables and figures; useful, adds to comprehension (3; 2 and 1)
- Sufficient power (1; 1 and 0)
Discussion and conclusion (26 total positive comments; 17 in 1997 and 9 in 1998)
- Interpretation took into account the limitations of the study; self criticism; counter-evidence, alternative explanations presented; reflects scientific honesty (11; 9 and 2)
- Future directions discussed (4; 3 and 1)
- Conclusions flow from results; consistent with results (3; 2 and 1)
- Confirms or extends results from previous studies (3; 2 and 1)
- Practical implications discussed (2; 1 and 1)
- Argumentation well developed, compelling (2; 0 and 2)
- Importance of nonsignificant results (1; 0 and 1)
Title (no comment made)
Abstract (3 total positive comments; 0 in 1997 and 3 in 1998)
- Easy to understand (1; 0 and 1)
- Succinct (1; 0 and 1)
- Statistical data reported (1; 0 and 1)
Writing, presentation (48 total positive comments; 31 in 1997 and 17 in 1998)
- Well written; clear, concise yet sufficient details, straightforward, easy to follow, logical (46; 30 and 16)
- Well organized (1; 1 and 0)
- Good use of examples (1; 0 and 1)
1. Chubin DE, Hackett EJ. Chapter 4. Peer review and the printed word. In: Chubin DE, Hackett EJ (eds). Peerless Science: Peer Review and U.S. Science Policy. Albany, NY: State University of New York Press, 1990:83–122.
2. Council of Biology Editors. Peer Review in Scientific Publishing. Papers from the First International Congress on Peer Review in Biomedical Publishing. Chicago, IL: Council of Biology Editors, 1991.
3. Chamberlin TC. The method of multiple working hypotheses. Sci Monthly. 1944;59:357–62.
4. Huth EJ. Writing and Publishing in Medicine. 3rd ed. Baltimore, MD: Williams & Wilkins, 1999.
5. Bordage G. Considerations in preparing a publication paper. Teach Learn Med. 1989;1:47–52.
6. Parsell G, Bligh J. AMEE Guide No. 17: Writing for journal publication. Med Teach. 1999;21:457–68.
7. Boileau N. L'Art Poétique. Paris, France: Classiques Larousse, 1674.