Creating a Peer Review Process for Faculty-Developed Next Generation NCLEX Items

Hensel, Desirée PhD, RN, PCNS-BC, CNE, CHSE; Billings, Diane M. EdD, RN, FAAN, ANEF

Nurse Educator 48(2):65-70, March/April 2023. DOI: 10.1097/NNE.0000000000001322

With changes to the National Council of State Boards of Nursing (NCSBN) National Council Licensure Examination (NCLEX), nursing faculty are seeking best methods to prepare students for the Next Generation NCLEX (NGN).1,2 The licensing examination will remain a legally defensible examination characterized by development practices that align items to a test plan, use standardized item writing procedures, include layers of review, require pretesting, and involve psychometric analysis.2–5 Potential questions used in this examination go through item, editorial, sensitivity, differential, and nursing regulatory reviews before becoming operational items.4,5

The NCLEX item writers are registered nurse and practical nurse faculty teaching clinical nursing content. The NCLEX test item review is a modification of a peer review process where other nurses, who provide direct patient care in a clinical setting and work with new nurses, determine whether items are suitable to test entry-level practice, are current, and are accurate.4,5 Sensitivity and differential reviews look for unintentional bias such as stereotypes and linguistic issues, which may affect measurement.5 Editorial reviews examine grammar, clarity, spelling, and punctuation.5 The nursing regulatory review focuses primarily on the appropriateness of an item to test entry-level practice.5

Peer Review

Peer review has long been the standard for scholarly writing and is thought to be empowering and to foster professional growth.6 Peer review involves the evaluation of work (teaching, research, test item writing) by people with expertise similar to that of the person whose work is being reviewed. Peer review is considered a best practice when writing test items,7 but the practice is underutilized by nursing faculty.8,9

Nursing faculty who are writing their own NGN items, or choosing questions from vendor test banks to include on classroom tests, may not be able to undertake the same extensive review process used by the NCSBN. However, it is possible to follow item writing procedures similar to those used by the NCSBN and to conduct peer review. The purpose of this article is to report the findings from a pilot study of a peer review process used with a group of faculty who were writing NGN items for the Maryland NextGen Test Bank Project.

Methods

As part of a grant from the Maryland Higher Education Commission Nurse Support Program, the Maryland Nursing Workforce Center sponsored the Maryland NextGen Test Bank Project to develop a state-wide pool of NGN items to use for teaching clinical judgment and formative assessment. The items created in this project were accessible through a nonsecure website and thus not designed to be used on graded course examinations. Faculty, referred to as champions, were recommended by their deans to write items for the project and to work with other faculty at their respective schools to develop item writing skills. The project was reviewed by the sponsor's institutional review board and was considered exempt.

Item writing training took place in two 2-hour, web-based sessions. The champions selected 2 topics to develop as case studies from a balanced list of options representing the 10 categories of health alterations on the NCLEX program reports.10 Champions were also instructed to develop a related stand-alone bowtie or trend item for each topic. A template, developed for the project, was used to write all items. The template synthesized item writing information from the NCSBN,2–4,11 with information from other sources12–14 to create a form that included objectives, references, rationales, an electronic medical record (EMR) table with common laboratory values, and NGN style options for questions. Each clinical judgment step included 3 to 5 options for creating an NGN style drop-down, drag-and-drop, highlighting, matrix, multiple-response, or multiple-choice question. The template included options for writing bowtie and trend items.

A 5-hour, face-to-face peer review session to review the case studies was held 4 weeks after the virtual item writing training and after champions wrote test questions that received several rounds of feedback from the project team. Because each peer review was anticipated to take up to 60 minutes to complete, champions were assigned to review 3 cases written by their peers, with topics matched as closely to their expertise as possible. A double-blinded review process was implemented in which only the project leaders knew the identity of the reviewers and case authors.

Clinical Judgment Item Peer Review Form

Champions completed the peer reviews using the Clinical Judgment Item Peer Review Form created for the project (the Peer Review Form is available to readers in Supplemental Digital Content 1, available at: https://links.lww.com/NE/B219). The peer review form, designed to be used with case studies or stand-alone items, addressed several of the functions expected during NCLEX item and editorial reviews.4,5 The review form included 31 items rated on a simple 3-point scale: Yes, Uncertain, or No. Seven items were related to the scenario and EMR construction. Twenty-four items addressed the 6 NCSBN Clinical Judgment Measurement Model steps: recognize cues, analyze cues, prioritize hypotheses, generate solutions, take action, and evaluate outcomes.2 Each step was reviewed to determine whether the question tested the expected clinical judgment task, was formatted correctly, was accurate, and included a complete rationale. Each review item included the option to add comments. A final 5-point Likert scale question measured how likely it was that the reviewer or their program would use the case study to teach clinical judgment.

The review form incorporated a separate section with 7 questions for reviewing bowtie or trend items. As with the case studies, reviewers rated how likely it was that they or their program would use the stand-alone item. A final option was provided for any additional comments.

Preparing Faculty to Conduct Peer Review

The face-to-face session began with a 25-minute training on how to conduct the reviews and rate the item using the review form (see Supplemental Digital Content 2, available at: https://links.lww.com/NE/B220, Training Program). Champions, now acting as reviewers, were instructed to first read the entire case through from beginning to end and make notes about areas of concern. After an initial reading, the champions were to read the case again looking for details including whether the case study aligned with the stated summary and objectives; whether it was realistic, logical, and appropriate to test entry-level clinical judgment; and whether the EMR was complete and correctly formatted. Instructions included paying particular attention to the EMR length and use of abbreviations.

Next, the champions were instructed to read each question to determine whether the content tested the appropriate clinical judgment step, the item was formatted correctly, the content was accurate, and the rationale was complete. Red flags were discussed for each review area, such as recognize cues questions with options that could not be found in the EMR or generate solutions questions that presented interventions already documented as completed in the EMR. Champions were instructed to provide comments for any item that received a No or Uncertain rating. The final step in the review process was to rate how likely it was that the case or stand-alone item would be used in the champion's program.

Final reviews were uploaded through a URL to electronic survey software. Reviews were downloaded after the session and sent to the original authors to guide revisions. Summary data, including frequencies and percentages, were analyzed using IBM SPSS Statistics for Windows (Version 26.0; IBM Corp, Armonk, New York). The analysis included determining how long the review process took; whether the reviewers could use the form to find format, content, and writing flaws; and whether they could use the form to provide useful feedback.
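For readers who wish to reproduce this kind of descriptive summary outside of SPSS, the sketch below shows one way the rating frequencies and percentages could be tabulated. It is a minimal illustration, not the project's analysis syntax; the file name reviews.csv and its column layout are assumptions made for the example.

```python
# A minimal sketch (not the authors' SPSS analysis) of the descriptive summary
# described above. It assumes the exported review ratings are in a CSV file
# named "reviews.csv" with one row per completed review and one column per
# review element holding the values "Yes", "Uncertain", or "No".
import pandas as pd

reviews = pd.read_csv("reviews.csv")

rating_order = ["No", "Uncertain", "Yes"]
summary_rows = []
for element in reviews.columns:
    ratings = reviews[element].dropna()  # N varies because some items were skipped
    counts = ratings.value_counts().reindex(rating_order, fill_value=0)
    percents = (counts / len(ratings) * 100).round(1)
    summary_rows.append(
        {"Review element": element, "N": len(ratings),
         **{f"{r}, n (%)": f"{counts[r]} ({percents[r]})" for r in rating_order}}
    )

summary = pd.DataFrame(summary_rows)
print(summary.to_string(index=False))  # frequencies and percentages, as reported in Tables 1 and 2
```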

Results

Forty cases were submitted for review by 21 project champions. Thirty-five cases included a related stand-alone item (25 bowties and 10 trends). Eighteen champions from 13 different programs attended the face-to-face session. The attendees completed 55 reviews of the 40 cases and related stand-alone items. Twenty-five cases had a single review, and 15 cases had 2 reviews. The majority of reviews (n = 52) were completed within the 3-hour time frame allotted at the workshop, suggesting that most reviews could be completed within an hour. The remaining reviews were submitted following the workshop.

Review Element Ratings

Rating frequencies were examined to determine which item development aspects were more challenging for writers. Table 1 shows that more than 80% of the time champion reviewers found the case studies written by their peers to be appropriate to test clinical judgment, logical, and realistic. However, the case study EMRs were rated as being complete and formatted correctly only 53% of the time.

Table 1. Case Construction Item Ratings

Abbreviated Review Element | N | No, n (%) | Uncertain, n (%) | Yes, n (%)
Case summary | 55 | 17 (30.9) | 7 (12.7) | 31 (56.4)
Objectives | 54 | 11 (20) | 10 (18.2) | 33 (60)
Appropriate to test clinical judgment | 55 | 2 (3.6) | 5 (9.1) | 48 (87.3)
Scenario realistic | 55 | 2 (3.6) | 3 (5.5) | 50 (90.9)
Scenario unfolds logically | 55 | 5 (9.1) | 6 (10.9) | 44 (80)
EMR complete and formatted correctly | 54 | 18 (32.7) | 7 (12.7) | 29 (52.7)
References | 54 | 8 (14.5) | 1 (1.8) | 45 (81.8)

Abbreviation: EMR, electronic medical record.

Table 2 shows the frequency of review ratings for each of the 6 clinical judgment steps. Positive ratings of Yes were given 69% to 85% of the time for testing the appropriate clinical judgment step. Positive scores were given more than 80% of the time for items testing recognizing cues, generating solutions, and taking action, suggesting that those steps were easiest to write. Prioritizing hypotheses questions received positive ratings only 69% of the time, suggesting that writing questions for that step was more difficult. Item formatting was more challenging for writers, with positive ratings given only 51% to 73% of the time. Other positive ratings ranged from 61% to 82% for accuracy and from 71% to 76% for rationale completeness.

Table 2. Clinical Judgment Question Ratings

Abbreviated Review Element | N | No, n (%) | Uncertain, n (%) | Yes, n (%)
Q1. Recognize cues
Q1 Tests recognizing cues | 54 | 4 (7.4) | 4 (7.4) | 46 (85.2)
Q1 Correctly formatted with no writing flaws | 55 | 20 (36.4) | 7 (12.7) | 28 (50.9)
Q1 Correct options are accurate | 54 | 10 (18.5) | 11 (20.4) | 33 (61.1)
Q1 Rationale is complete | 52 | 11 (21.2) | 3 (5.8) | 38 (73.1)
Q2. Analyze cues
Q2 Tests analyzing cues | 54 | 5 (9.3) | 8 (14.5) | 41 (75.9)
Q2 Correctly formatted with no writing flaws | 53 | 13 (24.5) | 3 (5.7) | 37 (69.8)
Q2 Correct options are accurate | 54 | 13 (24.1) | 4 (7.4) | 37 (68.5)
Q2 Rationale is complete | 54 | 9 (16.4) | 4 (7.4) | 41 (75.9)
Q3. Prioritize hypotheses
Q3 Tests prioritizing hypotheses | 55 | 11 (20) | 6 (10.9) | 38 (69.1)
Q3 Correctly formatted with no writing flaws | 53 | 9 (17) | 6 (11.3) | 38 (71.7)
Q3 Correct options are accurate | 53 | 9 (17) | 7 (13.2) | 37 (69.8)
Q3 Rationale is complete | 52 | 9 (17.3) | 4 (7.7) | 39 (75)
Q4. Generate solutions
Q4 Tests generating solutions | 55 | 9 (16.4) | 2 (3.6) | 44 (80)
Q4 Correctly formatted with no writing flaws | 55 | 12 (21.8) | 8 (14.5) | 35 (63.6)
Q4 Correct options are accurate | 54 | 10 (18.5) | 8 (14.8) | 36 (67.7)
Q4 Rationale is complete | 55 | 12 (21.8) | 4 (7.3) | 39 (70.9)
Q5. Take action
Q5 Tests taking action | 53 | 5 (9.4) | 5 (9.4) | 43 (81.1)
Q5 Correctly formatted with no writing flaws | 53 | 15 (28.3) | 2 (3.8) | 36 (67.9)
Q5 Correct options are accurate | 54 | 9 (16.7) | 7 (12.7) | 38 (70.4)
Q5 Rationale is complete | 54 | 10 (18.5) | 5 (9.1) | 39 (72.2)
Q6. Evaluate outcomes
Q6 Tests evaluating outcomes | 55 | 7 (12.7) | 6 (10.9) | 42 (76.4)
Q6 Correctly formatted with no writing flaws | 55 | 11 (20) | 4 (7.3) | 40 (72.7)
Q6 Correct options are accurate | 55 | 4 (7.3) | 6 (10.9) | 45 (81.8)
Q6 Rationale is complete | 55 | 12 (21.8) | 4 (7.3) | 39 (70.9)
Positive ratings on the stand-alone items ranged from 61% to 88% for the bowties and from 57% to 100% for the trends. The lowest ratings for both item types were given for completeness of the rationale. For the trend items, there was 100% agreement that the items did in fact include time-stamped data. Positive ratings for EMR completeness and formatting were given approximately 79% of the time for both stand-alone item types.

Reviewer Comments for the Item Writers

Comments were examined to determine whether the champion reviewers followed instructions for giving feedback. Most comments provided for items with Uncertain or No ratings were civil and specific. Helpful comments typically addressed the item's ability to test clinical judgment, content errors, formatting issues, and suggestions for more plausible distractors. Comments showed that the reviewers found areas of inconsistency: “Neuromuscular checks are ordered every 15 minutes, but EMR documents every 30 minutes.” They made suggestions for more plausible options, “I would change seizure disorder to something more related to cardiac. Maybe atrial fibrillation,” and ways to increase judgment, “Instead of telling the test takers that the client presents with stroke, could you just have the client present with s/s and not have the stroke diagnosis?” Comments also showed that reviewers were paying close attention to question accuracy: “fasting glucose and A1C are not used to monitor gestational diabetes.”

Overall Review Process Findings

The Figure shows that the champion reviewers thought that they personally or their programs were likely (30%) or very likely (58%) to use most of the reviewed items. Comments reflected that items that were overly specific, were unrealistic, or reflected inaccurate care management were associated with very unlikely or unlikely ratings.

Figure. Likelihood of using item in program.

Formative feedback from the reviewers to the project team during the session indicated that the review process was the most helpful part of learning to write NGN items. Anecdotal evidence showed that champions were able to use information from the training and the feedback from the reviews to improve their case studies and stand-alone items. This was exemplified in a follow-up email reflecting on the review process, “This has been such an eye-opening experience, and I really enjoyed the peer review process. I've considered the feedback from the reviews you sent and incorporated most of the feedback into these final copies.”

Discussion

Usability of the Peer Review Form and Process

Although items used on NCLEX go through extensive review before they are used in this high-stakes examination,4,5 peer review of test items is one of the least used best test construction practices.8,9 Lack of policies requiring peer review may be one factor.8,15,16 Experience indicates that other barriers may include time pressures, fear of criticism, and lack of training. We found that, with a short training session, the champions were able to use the review form to conduct timely and helpful peer reviews of a wide variety of NGN items. Entering data into an electronic system facilitated retrieval and helped protect reviewer and author identity, which may decrease fears of criticism. Champions came from 13 different schools. Thus, we conclude that it is feasible to implement a peer review process using this form, or a modified version of it, at other schools of nursing.

Currently, faculty rely heavily on the use of multiple-choice questions in course examinations.17 There are at least 3 new skills faculty must learn to write NGN case studies: developing a scenario with an EMR, writing items that address the 6 clinical judgment steps in order, and constructing questions using new item formats.2,16 Although champions attended an item writing session and used a template with item stems, peer reviews showed that approximately 22% of the time, writers failed to clearly address the appropriate clinical judgment step; approximately 34% of items had formatting issues; and approximately 47% of the case study EMRs had completion or formatting problems. Based on these findings, revisions were made to the project template and writing instructions. Updated materials were shared with the champions to help train faculty at their institutions.

A recent study found that lack of item writing policies was common in nursing programs, and most nursing faculty felt unprepared to write NGN items.15 Lack of faculty preparedness to write NGN questions has significant implications for students if low-quality questions are used on summative examinations. We advocate that developing policies and procedures for peer review for NGN items is essential to help ensure that questions are appropriate and fair to test entry-level nursing judgment. Such policies should also include conducting an item analysis for any item used for summative testing to help ensure that the items are appropriately difficult.16,18

All NGN items will use an EMR to convey clinical data necessary for decision making.2 Although developing an EMR is a critical aspect of writing NGN cases, some of the lowest review scores were related to developing the EMR. The EMR in an NGN case study unfolds, adding but not subtracting data, whereas the EMR for stand-alone items is static.2 Case studies with 6 items will take test takers approximately 10 minutes to complete.19 For this project, champions were given guidelines about building the EMR from the template, the maximum length, and the use of abbreviations. Still, comments reflected that several case study EMRs were too complex, did not unfold logically, and used abbreviations that may not have been universally taught across programs. Comments and ratings suggest that building an EMR for stand-alone items was an easier process. We conclude that training faculty on how to develop an EMR is just as important in helping faculty learn to write NGN questions as understanding the new item types.

Writing NGN items is time-consuming, and in this project, champions were encouraged to consider item writing as a scholarly effort. The champions also were trained to work with the faculty at their programs to improve their writing skills. Peer review, a foundation for scholarly work, can be seen as an important developmental process that benefits the item writer and the reviewer.6 However, incivility may prevent achieving desired peer review goals.6 As a prevention strategy, the project leaders shared their personal experiences with peer review and encouraged reviewers to be constructive and specific. Analysis of the comments showed that they were civil and included actionable feedback. Review of the revised cases submitted after the face-to-face session showed that the NextGen champions were able to use the knowledge gained from the process and the actual reviews received to strengthen their writing skills and improve the items they wrote.

Practice Implications for Creating a Peer Review Process

By using a peer review form, such as the one developed for this project, faculty can institute their own peer review process for each course that includes examinations with NGN items, whether those items are prepared by faculty or provided by publishers. The first step in conducting a peer review of test items is to establish a process that will work for a given program and that will ensure a safe environment for giving and receiving reviews. Peer review that is blinded (the item writer and the reviewer are not known to each other) is ideal. Blind peer review may not be practical in small faculty groups, but such groups can designate a person to coordinate the review by distributing the review instrument and items to be reviewed, collecting the reviews, and then forwarding them to the item writer. The next step is to identify potential peer reviewers. Potential reviewers are faculty with content and item writing expertise who are teaching the course in which the examination item will be used. Clinical faculty or adjunct faculty who teach clinical courses and understand course content and the clinical judgment process tested on the licensing examination can be particularly helpful in ensuring content accuracy and that each step of the clinical judgment process is followed. Using a review form and orienting faculty to the peer review process guide reviewers in completing helpful reviews. The peer review process concludes when the item writer receives the reviews and revises the examination item.

Once the items have been revised and reviewed by the course faculty, faculty should ensure that the items align with the specific course learning outcomes and content tested by the item and that students have had an opportunity to learn the content in the context of a clinical judgment framework. Ideally, these items would first be used by students for practice and undergo an item analysis before they are included on examinations used to assign course grades. If item analysis is not available for NextGen questions, faculty can consider using information that can be obtained from scoring a practice test.

Faculty should develop or revise protocols or policies for including NGN test items on their faculty-developed tests. Items used on graded tests should follow best practices for developing and using examinations: using test questions that align with course outcomes and content that has been taught in class; using test questions that are valid (test common nursing practice) and reliable (test consistently); using NGN test items with students in “practice mode” before using them on graded tests; using a test blueprint and sharing it with students; and conducting a postexamination review with students in which faculty spend sufficient time giving students feedback on how to answer NGN questions.

Limitations

The NGN items in this project were created from a template that all champions used. Beginning with this common knowledge most likely facilitated the review process. The group's clinical expertise was not evenly distributed, and some champions were asked to review cases that were not directly related to their expertise. This may have affected the quality of some reviews. Champions used the same tool to review different cases, but there were not enough champions with similar expertise reviewing the same cases to perform psychometric analysis of the peer review tool, such as interrater reliability. However, our personal experiences with peer reviews, both received and conducted, indicate that while reviews may have common themes, they are never identical. Different reviewer perspectives tend to make the review process richer. Future research is needed on optimal faculty development and the role of peer review in writing NGN items.

Conclusions

The purpose of using NGN items in teaching and testing is to provide students with an opportunity to practice the 3 types of questions (case studies, bowties, and trends) and to test students' ability to make clinical judgments. Faculty must use valid and reliable items if they are using NextGen questions on their course tests, and peer review is one essential way to improve examination validity. Peer review can also improve the quality of questions used for teaching and formative assessment, which has the potential to improve learning. We urge faculty to establish a peer review process at their school similar to the one tested in this study and to follow best test item development and administration practices when using NGN items on tests that will result in a grade.

References

1. Poorman SG, Mastorovich ML. Constructing next generation national council licensure examination (NCLEX) (NGN) style questions: help for faculty. Teach Learn Nurs. 2020;15(1):86–91. doi:10.1016/j.teln.2019.08.008
2. National Council of State Boards of Nursing (NCSBN). Next generation NCLEX news. Accessed July 3, 2022. https://www.ncsbn.org/ngn-resources.htm
3. NCSBN. Test plans. Published 2020. Accessed July 3, 2022. https://www.ncsbn.org/testplans.htm
4. Betts J, Muntean W, Kim D, Jorion N, Dickison P. Building a method for writing clinical judgment items for entry-level nursing examinations. J Applied Testing Technology. 2019;20(S2):21–36.
5. Brenton A. The NCLEX® item development process. NCLEX virtual conference. 2022.
6. Trotter TL. Using the peer review process to educate and empower emerging nurse scholars. J Prof Nurs. 2021;37(2):488–492. doi:10.1016/j.profnurs.2020.10.009
7. Tarrant M, Ware J. A framework for improving the quality of multiple-choice assessments. Nurse Educ. 2012;37(3):98–104. doi:10.1097/NNE.0b013e31825041d0
8. Bristol TJ, Nelson JW, Sherrill KJ, Wangerin VS. Current state of test development, administration, and analysis: a study of faculty practices. Nurse Educ. 2018;43(2):68–72. doi:10.1097/NNE.0000000000000425
9. Killingsworth E, Kimble LP, Sudia T. What goes into a decision? How nursing faculty decide which best practices to use for classroom testing. Nurs Educ Perspect. 2015;36(4):220–225. doi:10.5480/14-1492
10. Mountain Measurement Inc. NCLEX® program reports. Published 2022. Accessed July 3, 2022. https://transom.mountainmeasurement.com/nclex2/reports/about
11. NCSBN. Publisher's summit. 2020.
12. NBME. Item writing guide. Published 2021. Accessed July 3, 2022. https://www.nbme.org/item-writing-guide
13. National League for Nursing. Simulation innovation resource center. Published 2019. Accessed July 3, 2022. https://www.nln.org/education/education/sirc/sirc/sirc
14. Wolters Kluwer. Lippincott advisor for education. Published 2019. Accessed July 3, 2022. https://www.wolterskluwer.com/en/solutions/lippincott-solutions/lippincott-advisor
15. Moran V, Wade H, Moore L, Israel H, Bultas M. Preparedness to write items for nursing education examinations: a national survey of nurse educators. Nurse Educ. 2022;47(2):63–68. doi:10.1097/NNE.0000000000001102
16. Hensel D. Fair testing and incorporating next generation NCLEX items into course examinations. Nurse Educ. 2022:352–353. doi:10.1097/NNE.0000000000001288
17. Birkhead S, Kelman G, Zittel B, Jatulis L. The prevalence of multiple-choice testing in registered nurse licensure-qualifying nursing education programs in New York State. Nurs Educ Perspect. 2018;39(3):139–144. doi:10.1097/01.NEP.0000000000000280
18. Hensel D, Cifrino S. Item analysis and next-generation NCLEX. Nurse Educ. 2022;47(5):308–310. doi:10.1097/NNE.0000000000001223
19. Schwartz J. Preparing for the next generation national council licensure examination. Published 2022. Accessed July 10, 2022. https://www.ncsbn.org/16708.htm
Keywords:

faculty development; Next Generation NCLEX; nursing education; peer review; test item writing

© 2022 Wolters Kluwer Health, Inc. All rights reserved.