Consensus group methods are used extensively in several fields of inquiry including business; public policy; science and technology; and health care research in medicine, nursing, and health services.1–3 Despite the popularity of these methods, little critical attention has been paid to the ways in which they are used in medical education research. Without further analysis, we do not know whether these methods are being used appropriately.
Consensus methods are defined as a systematic means to measure and develop consensus.2 They are particularly useful when empirical evidence is lacking, limited, or contradictory. Consensus methods are based on the premise that an accurate and reliable assessment can be best achieved by consulting a panel of experts and accepting the group consensus.1,2 In medical education, there are several important areas of inquiry that are plagued by high levels of uncertainty and a limited evidence base. Consequently, consensus methods are particularly relevant to medical educators because of their presumed capacity to extract the profession’s collective knowledge, which is often tacit and difficult to verbalize and formalize.1,2
Two commonly used consensus methods are the Delphi method and the nominal group technique (NGT). A third method, RAND, developed by the RAND Corporation and the David Geffen School of Medicine at UCLA, is a hybrid of the Delphi and NGT methods.4 Table 1 highlights some of the differences between these three approaches.
The Delphi method involves six stages: (1) identifying a research problem, (2) completing a literature search, (3) developing a questionnaire of statements, (4) conducting anonymous iterative mail or e-mail questionnaire rounds, (5) providing individual and/or group feedback between rounds, and (6) summarizing the findings. This process is repeated until the best possible level of consensus is reached or a predetermined number of rounds are completed. Participants never meet or interact directly.1,2 Benefits of the Delphi method are its capacity to include a large number of participants who are geographically dispersed, its relatively minimal support structure needs (thus making it relatively inexpensive), and the avoidance of undue dominance by particular individuals through anonymity. Conversely, the many modifications that have been made to the Delphi method have led to considerable confusion surrounding its application and outcomes.5,6
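The iterative logic of stages 4 through 6 can be sketched in a few lines of code. The following Python fragment is purely illustrative: the `percent_agreement` helper, the 1-to-5 rating scale, and the 80% threshold are our own assumptions and are not drawn from any published Delphi protocol.

```python
# Hypothetical sketch of the Delphi stopping rule: anonymous rating
# rounds repeat until a preset agreement threshold is met or a
# predetermined maximum number of rounds is reached.

def percent_agreement(ratings, keep=4):
    """Share of panelists rating a statement at or above `keep` (1-5 scale)."""
    return sum(r >= keep for r in ratings) / len(ratings)

def delphi_rounds(all_rounds, threshold=0.8, max_rounds=3):
    """Return (round_reached, agreement) once consensus is met, else the last round."""
    for i, ratings in enumerate(all_rounds[:max_rounds], start=1):
        agreement = percent_agreement(ratings)
        if agreement >= threshold:
            return i, agreement
    return i, agreement

# Round 1 is inconclusive; after feedback, round 2 reaches 80% agreement.
rounds = [
    [5, 4, 3, 2, 4],   # round 1: 60% rate the statement 4 or 5
    [5, 4, 4, 3, 4],   # round 2: 80% rate the statement 4 or 5
]
print(delphi_rounds(rounds))  # (2, 0.8)
```

In a real Delphi study, statistical feedback of the group's ratings would be returned to participants between rounds; the sketch captures only the iteration and stopping rule.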
The NGT is a structured face-to-face interaction usually involving 5 to 12 participants. It has the following steps: (1) formulation and presentation of the nominal question, (2) private idea generation, (3) round-robin feedback from group members to record each idea, (4) structured group discussion with a skilled moderator, (5) subsequent private ranking, and (6) anonymous feedback to the entire group. Additional group discussion and private voting may take place if necessary to reach consensus.1,2,7 The main advantages of the NGT are the potential to discuss and debate topics lacking consensus and the opportunity for more robust idea generation. The disadvantages are the smaller number of participants and that dominant participants may unduly influence the group.
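To make the private-ranking and feedback steps concrete, here is a minimal, hypothetical sketch. The idea names and the sum-of-ranks aggregation rule are illustrative assumptions; actual NGT studies may aggregate votes differently.

```python
# Hypothetical illustration of the NGT private-ranking step: each
# participant privately ranks the generated ideas (1 = top priority),
# and the group result orders ideas by their rank totals, lowest first.
from collections import defaultdict

def aggregate_rankings(rankings):
    """rankings: list of dicts mapping idea -> private rank (1 = best)."""
    totals = defaultdict(int)
    for ranking in rankings:
        for idea, rank in ranking.items():
            totals[idea] += rank
    return sorted(totals, key=totals.get)  # lowest total (most preferred) first

panel = [
    {"feedback": 1, "simulation": 2, "portfolios": 3},
    {"feedback": 2, "simulation": 1, "portfolios": 3},
    {"feedback": 1, "simulation": 3, "portfolios": 2},
]
print(aggregate_rankings(panel))  # ['feedback', 'simulation', 'portfolios']
```

The anonymized totals would then be fed back to the whole group (step 6), after which further discussion and re-voting could occur.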
The RAND approach is a hybrid that starts like the Delphi method but subsequently involves a face-to-face meeting.4,7 Advantages include both the potential to survey a geographically dispersed group and the opportunity for clarification and discussion, recognizing that, as with the NGT, at this stage the number of participants is limited and dominant participants may sway the group.
Although there are variations in consensus methods, they are all considered structured interactions and share several foundational principles that distinguish them from informal consensus meetings. These foundational principles include anonymity, iteration, controlled feedback, statistical group response, and structured interaction.1,8 Anonymity prevents dominant individuals from swaying the group. Iteration, controlled feedback, and statistical group response enable individuals to see the responses of other group members in relation to their choice and to re-rank the items if they choose. These principles are important for facilitating consensus or highlighting disagreement by enabling individuals to see the opinions of the other participants and to revisit their original decisions.
Despite the extensive use of consensus group methods, these research approaches are poorly standardized and inconsistently used and described in several fields.5–7,9–11 Many authors have noted poor agreement in the definitions of the methods with an ever-expanding list of modifications; for example, the Delphi method now has several modifications such as the classical Delphi, policy Delphi, decision Delphi, Modified Delphi, and iterative Delphi, to name a few.7,12 Additionally, critics have described a surprising lack of detail in published research regarding the processes used for generating items, reducing the number of items, or deciding on the type of feedback to share with participants.5 Several recent reviews and reports further highlight significant deficiencies in the methodology and reporting of the Delphi method in multiple fields of study.10,11,13–17 These reviews observed that studies often fail to describe the selection and description of participants, information provided to the participants at the start of the process, response rates for all rounds, formal feedback of group ratings, levels of anonymity enforced, participant attrition rates across iterative rounds, outcomes after each round, and the end-point decisions (i.e., how it was determined that consensus was achieved).
In sum, there is poor standardization and insufficient detail in consensus methods reporting. While these approaches are used in medical education research, the extent and quality of their use is unknown.18 The overall objective of this study was to describe the use of consensus methods in medical education research and to assess the reporting quality of these methods and results.
To achieve these research objectives, we relied on scoping review methods. We used the framework described by Arksey and O’Malley19 and included recommendations from Levac and colleagues.20 Thus, we followed a four-step process.
Step 1: Identifying the research question
As suggested by Levac and colleagues,20 we combined a broad research question with a more precisely articulated scope of inquiry and envisioned the content and format of the intended outcome at the beginning of the study to help determine the purpose. Previous work by Boulkedid and colleagues10 and by Sinha and colleagues11 enabled such envisioning. Our study had two broad purposes: (1) to describe the use of consensus methods in medical education research, and (2) to assess the reporting quality of these methods and results. More specifically, we addressed the following four questions: (1) How extensively are consensus methods used in medical education research? (2) What types of consensus methods are used in this literature? (3) What are the purposes of the research that uses consensus methods? (4) Is there standardization in the application of consensus methods?
Step 2: Identifying relevant articles and article selection
We decided at the outset of this investigation to include in our review any article related to consensus methods in medical education. A medical librarian (L.U.) completed all the literature searches for the project. An initial pilot keyword search was completed in August 2013 in the Medline and Embase databases, focusing on medical education and the following keywords—“Delphi,” “RAND,” “nominal group,” and “consensus group methods”—for the years 2003 to 2013. These keywords were selected based on a review of the published literature1,7,10,11,13 and yielded 149 abstracts.
A bibliometric analysis was conducted next to determine when consensus methods gained prominence as a research method in this literature. The analysis was conducted using the Medline, Embase, and PsycInfo databases. The body of medical education literature that described consensus methods did not appear until 1955. The analysis indicated that the search terms for consensus methods that we used in our pilot keyword search appeared in fewer than 10 articles per year until 2000. More than 60% of the relevant results were published between 2009 and 2013. The numbers of articles per year were 27 (2009), 25 (2010), 29 (2011), 43 (2012), and 22 (2013). We felt that the most recent literature would be considered more relevant to medical educators who are interested in consensus methods research and that a review of the literature from 2009 to the present would provide a sufficient sampling of articles to address our research questions.
We executed our third search in the Medline, Embase, PsycInfo, PubMed, Scopus, and ERIC databases for the years 2009 to January 2014, when the search was conducted, using a combination of keywords and controlled vocabulary terms. The principal investigator (S.H.M.) and a research assistant under her supervision reviewed all titles and abstracts. To be included in this study, the article must have been written in English; must have been related to education in any health profession; and must have noted in the title or abstract the use of consensus group method(s), including the Delphi method, any modification of the Delphi method, the NGT, or any “consensus” methodology. At this stage, we included all health professions. Any ambiguous abstracts were included for full-text review.
Five team members (S.H.M., L.V., T.J.W., C.G., L.U.) and a research assistant each reviewed five full-text articles and, through an iterative process, clarified the inclusion/exclusion criteria. The team decided to only include full-text articles of completed research and to exclude commentaries, editorials, and consensus conference proceedings. Articles also had to have an education focus. An updated literature search was conducted in June 2016.
Step 3: Charting the data
A first version of a standardized data extraction form was developed based on the literature.1,6,7,18 Several team members (S.H.M., L.V., T.J.W., C.G., L.U.) and a research assistant each reviewed five different articles for a total of 30 articles and piloted the use of the extraction form. The entire team met and, through an iterative process, refined the definitions of each type of data to be extracted and created a second version of the extraction form. At this stage, they also clarified and finalized the inclusion/exclusion criteria. Thirteen articles were then reviewed by two team members using the second version of the extraction form. Another team meeting was held during which complete agreement was achieved on all data extraction items, except on how to define the number of rounds reported. This disagreement highlighted the extensive variations in consensus group methods, and the decision was made to count rounds in the “true consensus group” portion of the study. For example, consensus methods were often used in combination with other methods, such as informal meetings or focus groups. We therefore only counted the number of rounds for the consensus method itself. In this meeting, the team completed the third and final version of the data extraction form.
The final extraction form consisted of two parts. The first part gathered article demographic information, such as the consensus group method used, type of journal, purpose of the study, specialty involved, level of training targeted, and area of focus (national or international). The second part of the extraction form included items that would indicate the rigor of the study. The following questions were included: (1) Was a literature review conducted? (2) Was background information provided to the participants? (3) Was the consensus method used for item generation, ranking, or both? (4) How many participants were included? (5) Was mail/e-mail polling or face-to-face questioning used? (6) Were private decisions collected? (7) Was formal feedback provided? If so, was the feedback described? (8) How many rounds were conducted? (9) Was the number of rounds determined a priori? (10) Was there a predetermined definition of consensus? If so, what was it? (11) Was consensus forced? For each of these questions, the reviewers looked not only for the presence or absence of these items and whether the questions were addressed but also whether they were explicitly stated and described in sufficient detail.
Using this data extraction tool, each team member completed a full review of 25 to 30 articles from the first search (2009–2014). If there were any ambiguous items, the article was reviewed by the principal investigator (S.H.M.), who made the final decision. The remainder of the articles were reviewed by one team member (S.H.M., L.V., T.J.W., T.F., C.G.).
After the updated search in June 2016, eight articles were each assigned to two different team members for review. Two team meetings were held to discuss these articles. The remaining articles were each coded by one team member (S.H.M., K.M., T.J.W., C.G., T.F., C.W.). Any ambiguous items were discussed, and consensus was achieved. At this stage, we specifically focused on medical education (the education of physicians at any level) and excluded articles dealing with nursing, dentistry, veterinary, or allied health education, as an article on using consensus methods in nursing education had been published recently.14
Step 4: Collating, summarizing, and reporting results
We followed the three distinct data analysis steps suggested by Levac and colleagues.20 First, we analyzed the data using a quantitative or numerical approach (computed using SPSS Statistics 24, IBM, Armonk, New York). We report these results in table form. The entire team then reviewed the numerical summary to apply meaning to the results. The expertise of the team provided several viewpoints for considering the meaning of the results—a quantitative researcher with a background in measurement, a qualitative researcher with extensive experience in qualitative methods, a researcher with a PhD in nursing, and several clinician educators actively involved in medical education research. While scoping review methods traditionally include a thematic analysis of the research findings using qualitative content analysis, our study’s primary objective was to analyze the methods used and reported in the articles and not the findings of the articles. As a result, we did not conduct a thematic analysis of the reported findings.
Our literature search results and article selection process are illustrated in Figure 1. After a review of 334 full-text articles, 257 articles were included in our study. See Supplemental Digital Appendix 1 at https://links.lww.com/ACADMED/A461 for a complete bibliography of these articles.
Table 2 lists the demographic information extracted from the 257 articles included in our review. In these articles, the authors used a wide variety of terms to describe the consensus method they employed. The Modified Delphi method (105/257; 40.8%), Delphi method (91/257; 35.4%), and NGT (23/257; 8.9%) were the most commonly named consensus methods. However, many articles used a combination of methods. Most of the articles (148/257; 57.6%) were published in medical journals (e.g., Surgery), but more than a third (88/257; 34.2%) were published in medical education journals (e.g., Academic Medicine). The most common purposes for using a consensus group method were new curriculum development or reform (68/257; 26.5%), assessment tool development (55/257; 21.4%), and defining competencies (43/257; 16.7%). Consensus group methods were used to address local (72/257; 28.0%), national (104/257; 40.5%), and international (56/257; 21.8%) issues. In 25 articles (9.7%), it was unclear whether the focus was local, national, or international. Studies were conducted in a wide variety of specialties (see Table 2).
Standardization in the application of consensus methods
We present the full list of approaches used in Table 3. Below, we highlight some particularly noteworthy findings.
Literature review, background information for participants, study purpose.
Of the 257 articles we reviewed, 180 (70.0%) described a literature review being conducted in preparation for the questionnaire. Only 70 (27.2%) articles described what background information was provided to the participants at the beginning of the study. The purpose of the consensus method was stated in the vast majority of articles (230/257; 89.5%), with 55 of those (23.9%) indicating it was for idea generation, 81 (35.2%) for ranking, and 94 (40.9%) for both idea generation and ranking.
The types of participants who made up the consensus groups (see Table 2) were primarily physicians (101/257; 39.3%), followed by interprofessional groups (34/257; 13.2%) such as physicians and allied health professionals, and groups of experts (34/257; 13.2%). We noted that the type of expertise required of participants was usually not clearly described. Table 3 lists the number of articles that reported the number of participants and response rates across rounds of consensus development. Of the 257 articles we reviewed, 215 (83.7%) listed the number of participants invited at the beginning of the study. However, the number of studies listing the response rate for the first round dropped to 170 (66.1%). Furthermore, only 129 (50.2%) articles reported the number of participants in the second round of consensus development.
Consensus method features (polling, anonymity, feedback, number of rounds).
Polling, or group interaction, was described in 180 (70.0%) of the 257 articles we reviewed. It was completed by mail or e-mail (117/180; 65.0%), face-to-face (34/180; 18.9%), or through a combination of these two approaches (29/180; 16.1%). Anonymity, or the collection of private decisions, was described in 103 (40.1%) articles; the remaining 154 (59.9%) articles did not describe or did not collect private decisions. We reviewed approaches to feedback and participants’ ability to change their rankings or add items. Only 97 (37.7%) articles noted that formal feedback of group ratings was shared with participants.
Because iteration is considered a defining feature of consensus methods, 2 or more rounds of data collection are required. The number of rounds in each study varied, with a range of 0 to 14. In total, only 197 (76.7%) of the 257 articles we reviewed described having 2 or more rounds. The remaining 60 (23.3%) declared using a consensus method but had only a single data collection session. Of the studies that described 2 or more rounds of data collection, most had 2 rounds (111/257; 43.2%), although several had 3 rounds (66/257; 25.7%) or 4 rounds (13/257; 5.1%). The remainder had 5 rounds (4/257), 7 rounds (2/257), and 14 rounds (1/257). In 47 (18.3%) articles, the number of rounds was predetermined, but in 210 (81.7%) articles, the number of rounds was either not predetermined or not described.
In 111 (43.2%) of the 257 articles we reviewed, the definition of consensus was predetermined. When a definition of consensus was provided, it varied widely: in some articles, consensus was defined as more than 20% agreement, and in others as 90% to 100% agreement. Of the articles we reviewed, 100 (38.9%) reported whether or not consensus was forced, with 27 (10.5%) indicating that it was and 73 (28.4%) indicating that it was not. In 157 (61.1%) articles, whether consensus was forced was not described.
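The practical consequence of this variation is easy to demonstrate. In the hypothetical sketch below (the panel size and endorsement count are invented for illustration), the same panel responses count as consensus under one reported definition but not under another:

```python
# Hypothetical sketch: whether "consensus" is declared depends entirely
# on the chosen threshold. The same 72% endorsement rate passes a 20%
# definition and fails a 90% one.

def reaches_consensus(agree, total, threshold):
    """True if the proportion of agreeing panelists meets the threshold."""
    return agree / total >= threshold

agree, total = 18, 25  # 72% of panelists endorse an item
print(reaches_consensus(agree, total, 0.20))  # True
print(reaches_consensus(agree, total, 0.90))  # False
```

This is why an a priori, explicitly reported definition of consensus matters: without one, readers cannot judge what the declared outcome means.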
To our knowledge, our study is the first to explore consensus group methods in medical education research. We found a substantial number of articles (n = 257) published between 2009 and 2016 that used consensus methods. The Delphi and Modified Delphi methods accounted for just over 75% of the consensus methods reported in these articles. In addition to the 40.8% of articles that used the term Modified Delphi, many others described using a Delphi method in combination with another consensus group method or using a Modified Delphi method with another method. Some articles also used both the Delphi and Modified Delphi terms to describe their method, often using different terms in the abstract and method sections of the same article.21–24 Notably, nearly one-quarter of the articles did not describe using iteration in their data collection; in other words, in these studies, a single survey was sent to participants. We argue that such a study design should not be considered a consensus group method, as iteration is a key feature of consensus methods.16 Consistent with other research, our findings also indicate that the consensus methods terminology used in the medical education literature varies greatly.6
Nearly 40% of the articles we reviewed included only physicians, although several included groups from across the health professions. We did not assess the appropriateness of these choices. However, what was concerning was our inability to determine who the participants were in several articles.
Our findings suggest that the reporting quality and standardization of consensus methods in medical education research varies greatly. The following areas appeared particularly problematic and were often left out or poorly described in the articles we reviewed: conducting a literature review to inform the consensus method; providing background information to participants; reporting the number of participants after each round; describing the level of anonymity used in the study; providing participants with feedback of group ratings; and articulating the definition of consensus used in the study.
Approximately 70% of the articles we reviewed reported completing a literature review. While the other 30% may have engaged in a literature review without describing it, we suggest that this important step must be described. Similar findings have been reported previously.10,14
Equally important is the provision of background information to participants at the beginning of the consensus-building process. This step may be less relevant if the participants are truly experts; still, background information may influence participants, so a clear description of what information was provided and in what format is important.2,11 Only 27.2% of the articles we reviewed described this information.
We also examined the reporting of response rates for each round of data collection. Only 83.7% of the articles we analyzed reported the number of participants invited at the beginning of the study, 66.1% reported the number of participants for round 1 of data collection, and 50.2% reported the number of participants for round 2. Other analyses of consensus methods research found similar poor reporting of this feature, with 7% to 39% of studies reporting response rates for all rounds of data collection.10,11 We believe that this information could affect any interpretation of the study results; thus, it warrants inclusion.
Anonymity or private decision making is also considered essential for high-quality consensus group methods. Unfortunately, fewer than half of the articles we vetted explicitly reported that participants’ anonymity was maintained or provided enough information in the method section to make this clear. While the authors may have assumed that readers would understand that anonymity was part of their study design, we suggest that they state this, given the variability in approaches that have been labeled as modified consensus methods. Another important feature of consensus group methods is feedback to participants, including statistical group responses and qualitative information. This information allows participants to re-rank items based on the responses of others. We found that feedback was reported in approximately one-third of the articles we reviewed. Other researchers also have highlighted that anonymity and feedback to participants are not often reported in descriptions of consensus methods.10,11,16
Finally, the most concerning issue we identified was that consensus was often not defined a priori. Only 43.2% of the articles we reviewed reported their definition of consensus at the start of the study. This issue has been highlighted in other analyses of consensus methods, with only 28% to 52% reporting their definition of consensus.10,13,14
Although it was beyond the scope of this review, it is noteworthy that minimal research has been done on consensus methods themselves. An article on the theoretical underpinnings of group decision making was recently published, and previous work has focused on factors affecting the outcomes of consensus group methods.25,26
Our study has several limitations. First, we did not search all electronic databases and years, so some relevant studies may have been missed. Also, we modified scoping review methods for our analysis; however, we believe that our approach yielded a defensible answer to our research questions.27
Our findings regarding consensus group methods in the medical education literature echo results from other disciplines and highlight the considerable variability across studies. Medical education studies do not consistently provide sufficient detail regarding the methods used, leading to a lack of scientific credibility. If consensus methods are to inform best practices, they must be rigorously conducted. The lack of consensus on consensus methods makes it imperative that researchers provide clear and detailed reporting of the methods they used and that they justify these choices.
To that end, the findings from this review already have informed the development of consensus group methods recommendations.28 Ultimately, however, more research is required on the methods themselves, including on the theoretical underpinnings and on how variations in the consensus process affect consensus outcomes.
The authors would like to thank the Department of Innovation in Medical Education at the University of Ottawa for ongoing research assistant support. Drs. Gonsalves and Humphrey-Murto acknowledge salary support from the Department of Medicine at the University of Ottawa. The authors also would like to thank Dr. Aliki Thomas for feedback on scoping review methods.
1. Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–380.
2. Murphy MK, Black NA, Lamping DL, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess. 1998;2:i–iv, 1.
3. Tammela O. Applications of consensus methods in the improvement of care of paediatric patients: A step forward from a “good guess.” Acta Paediatr. 2013;102:111–115.
4. Fitch K, Bernstein SJ, Aguilar MD, et al. The RAND/UCLA Appropriateness Method User’s Manual. Santa Monica, CA: RAND Corporation; 2001. http://www.rand.org/pubs/monograph_reports/MR1269.html. Accessed April 24, 2017.
5. Crisp J, Pelletier D, Duffield C, Adams A, Nagy S. The Delphi method? Nurs Res. 1997;46:116–118.
6. Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs. 2000;32:1008–1015.
7. Campbell SM, Cantrill JA. Consensus methods in prescribing research. J Clin Pharm Ther. 2001;26:5–14.
8. Vernon W. The Delphi technique: A review. Int J Ther Rehabil. 2009;16:69–76.
9. Goodman CM. The Delphi technique: A critique. J Adv Nurs. 1987;12:729–734.
10. Boulkedid R, Abdoul H, Loustau M, Sibony O, Alberti C. Using and reporting the Delphi method for selecting healthcare quality indicators: A systematic review. PLoS One. 2011;6:e20476.
11. Sinha IP, Smyth RL, Williamson PR. Using the Delphi technique to determine which outcomes to measure in clinical trials: Recommendations for the future based on a systematic review of existing studies. PLoS Med. 2011;8:e1000393.
12. Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. Int J Nurs Stud. 2001;38:195–200.
13. Diamond IR, Grant RC, Feldman BM, et al. Defining consensus: A systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol. 2014;67:401–409.
14. Foth T, Efstathiou N, Vanderspank-Wright B, et al. The use of Delphi and nominal group technique in nursing education: A review. Int J Nurs Stud. 2016;60:112–120.
15. Waggoner J, Carline JD, Durning SJ. Is there a consensus on consensus methodology? Descriptions and recommendations for future consensus research. Acad Med. 2016;91:663–668.
16. de Loë RC, Melnychuk N, Murray D, Plummer R. Advancing the state of policy Delphi practice: A systematic review evaluating methodological evolution, innovation, and opportunities. Technol Forecast Soc. 2016;104:78–88.
17. Humphrey-Murto S, Varpio L, Wood TJ, Gonsalves C, Ufholz L-A, Foth T. The use of the Delphi and other consensus group methods in medical education. Acad Med. 2016;91:S11.
18. de Villiers MR, de Villiers PJ, Kent AP. The Delphi technique in health sciences education research. Med Teach. 2005;27:639–643.
19. Arksey H, O’Malley L. Scoping studies: Towards a methodological framework. Int J Soc Res Meth. 2005;8:19–32.
20. Levac D, Colquhoun H, O’Brien KK. Scoping studies: Advancing the methodology. Implement Sci. 2010;5:69.
21. Maagaard M, Oestergaard J, Johansen M, et al. Vacuum extraction: Development and test of a procedure-specific rating scale. Acta Obstet Gynecol Scand. 2012;91:1453–1459.
22. Koehler RJ, Amsdell S, Arendt EA, et al. The Arthroscopic Surgical Skill Evaluation Tool (ASSET). Am J Sports Med. 2013;41:1229–1237.
23. Aminian G, O’Toole JM. Undergraduate prosthetics and orthotics programme objectives: A baseline for international comparison and curricular development. Prosthet Orthot Int. 2011;35:445–450.
24. Carr SE, Celenza A, Lake F. Designing and implementing a skills program using a clinically integrated, multi-professional approach: Using evaluation to drive curriculum change. Med Educ Online. 2009;14:14.
25. Hauer KE, Cate OT, Boscardin CK, et al. Ensuring resident competence: A narrative review of the literature on group decision making to inform the work of clinical competency committees. J Grad Med Educ. 2016;8:156–164.
26. Hutchings A, Raine R. A systematic review of factors affecting the judgments produced by formal consensus development methods in health care. J Health Serv Res Policy. 2006;11:172–179.
27. Cook DA. Tips for a great review article: Crossing methodological boundaries. Med Educ. 2016;50:384–387.
28. Humphrey-Murto S, Varpio L, Gonsalves C, Wood TJ. Using consensus group methods such as Delphi and nominal group in medical education research. Med Teach. 2017;39:14–19.