Modern frameworks for assessment design stipulate that the valid interpretation and use of test scores depend on educators being able to connect the dots between the tasks examinees perform on a test and the inferences and decisions educators make based on the results of that test. The closer the connection is between the task and the assertions being made about examinees’ ability, the stronger the argument is that the test provides evidence of the valid representation of that ability. Knowledge is increasingly available at physicians’ fingertips. As such, if a test is intended to measure what is valued in medical practice and if what is valued is the application of that knowledge during essential processes, testing tasks should include those processes a physician will be expected to perform rather than the recall of knowledge.
To reflect these priorities, some credentialing and licensure bodies create authentic clinical vignettes that address diagnostic or therapeutic reasoning across the breadth of content areas to which the credential or license is applicable. The process of generating and then iteratively reviewing, revising, and pilot testing these vignettes is time-consuming, expensive, and not guaranteed to address all of the tasks and conditions making up a specific medical domain. For medical schools and residency programs, which often lack the resources to produce an adequate pool of complex test items, the challenge is even more daunting. In all venues of medical education and assessment, the practice of assessing a physician’s understanding of clinical processes must become more efficient and effective.
The breadth of important tasks and conditions that make up any medical domain being assessed is vast, and using a typical one-at-a-time approach to test item writing without fully establishing the range of the domain leads to inevitable gaps in what is being assessed. If, instead of this one-at-a-time approach, we had a repository of structures representing the full range of content deemed important to assess, we could draw items from that collection as needed. This approach would be more efficient and result in tests that are more representative of the essential roles of the physician. In this article, we combine clinical reasoning and principled assessment design to introduce an approach called clinical process modeling, which we argue will improve the process for developing clinical reasoning assessments.
About Clinical Reasoning
Clinical reasoning involves the understanding and application of patient data; it includes the steps physicians take toward making decisions as well as the decisions themselves (e.g., diagnosis and therapy).1 As such, clinical reasoning is central to what physicians do in practice. From the Institute of Medicine report2 on improving diagnosis, we know that physicians, educators, accreditation bodies, and policymakers all continue to grapple with incorporating clinical reasoning into medical practice, education, and assessment. Furthermore, these groups also struggle with establishing best practices for teaching clinical reasoning.3 These challenges emerge, in part, because of the inherent difficulty in observing examinees’ underlying thought processes. For instance, even attempts at observation through concurrent verbal reports can suffer from bias as the process of thinking aloud has the potential to change a physician’s thoughts while performing a task.
Clinical reasoning research suggests that developing expert performance, whether formally or experientially, involves forming scripts that detail the range of symptoms and findings a physician should expect for a given condition; inherent in such scripts are the essential steps for arriving at a diagnosis.4 At any given step, the physician should identify the information that is necessary for making a good decision, properly weigh the facts, then make the appropriate decision. For many conditions (e.g., syncope, dysphagia), flowcharts following a patient from initial presentation to diagnosis via the clinical reasoning process aid in making decisions. These same flowcharts can help assess a physician’s ability to apply the analytic reasoning involved in making those decisions.
About Principled Assessment Design
Best practices in test development include the application of principled assessment design. Although there are several specific approaches to principled assessment design (e.g., evidence-centered design5 and assessment engineering [AE]6), at their core, they all share a common framework. Each approach includes steps for defining the object of measurement, identifying the types of evidence that would support drawing conclusions about the examinee’s ability in relation to that object, and applying the tasks for procuring that evidence. While any test, regardless of how it is produced, measures something and includes the tasks that are performed ostensibly to provide evidence toward that measurement, principled assessment design ensures that these parts, along with their relationships, are thoroughly and thoughtfully established before item development and test assembly.
AE6 is one approach to principled assessment design. As its name suggests, AE treats test development as an engineering process and is a highly structured and formalized approach to the design and implementation of cognitive assessment. Within the AE framework, assessment design begins with producing the claims one wishes to make about examinees based on their test performance. For instance, a test of general cardiology might indicate that an examinee with a passing score is a proficient general cardiologist. But that assertion is dependent on a thorough definition of what a proficient general cardiologist is. That definition would depend on many more precise claims, such as “a proficient general cardiologist should be able to differentiate between common types of myocardial infarction” and “a proficient general cardiologist should be able to provide appropriate therapy for typical presentations of congestive heart failure.” Common types and typical presentations of conditions may require enumeration, and for a construct as complex as a medical specialization, a good deal of iteration and refinement may be required to precisely define the construct claims.
Cognitive task models6 are then developed to provide evidence for each proficiency claim. These models serve as specifications for the test items that will eventually follow, identifying and integrating the essential components that affect the cognitive complexity of the underlying tasks, including the declarative knowledge components, the relationships between those components, the required cognitive skills, and the relevant content and contexts. With the essential components identified, moving from the cognitive task model to the test item is done via task templates, which specify how an item will be presented, what content can be manipulated, and how the item will be scored.
Clinical process modeling proposes to take the analytic reasoning processes within clinical reasoning and apply them to identify the reasoning proficiencies that make up a specific type of expert performance. If the individual pieces (e.g., conditions, symptoms, procedures) that make up a given area of medical expertise (e.g., cardiovascular disease, radiology, physician assistance) can be enumerated, each can, in turn, have its underlying clinical reasoning process broken down into proficiencies to provide a complete picture of that area of expert performance. While clinical reasoning suggests an approach for gaining this understanding, principled assessment design provides a framework to model it for assessment purposes.
In this article, we discuss the method for building a clinical process model and for applying that model to produce test items. We also consider additional applications for clinical process modeling.
Building and Using a Clinical Process Model
Developing a clinical process model
A clinical process model takes the form of a decision tree that maps the steps and pathways that correct diagnosis and therapy can take for a given presentation. Representing a construct as expansive as clinical reasoning within any domain of medical expertise requires clinical process models for the many and varied presentations that physicians commonly see in their practice. We developed one such model focusing on the topic of low back pain, a common presentation that internal medicine physicians see. With input from a group of four general internists selected for their expertise in internal medicine and clinical reasoning (including S.J.D.) and using a trusted reference on the management process for low back pain,7 we worked iteratively, through face-to-face discussions and conference calls, to develop a model that applied clinical reasoning to move from the common presentation of low back pain to the point at which either the internist hands off the case to another provider (e.g., referred to a specialist) or the patient has stabilized (e.g., recovered).
We began by walking through an uncomplicated presentation of low back pain, from the initial presentation to a prompt resolution. This conversation took the form of a simple interview with one of the internists:
Describe a straightforward initial presentation of low back pain.
A 45-year-old male comes into the office with a two-week history of low back pain after helping an office coworker move furniture. The pain worsens with physical activity.
What is the first thing you do when working with this patient?
Determine if there are any emergent issues in this patient that would demand immediate attention.
Assuming there are none in this simple presentation, what would you do next?
Identify any red flags that would suggest the possibility of something more severe than mechanical low back pain.
Again, assuming there are none, what is your next step?
The interview continued in this way until mechanical low back pain was diagnosed and the appropriate therapy was administered.
We confirmed the reasoning process for this basic presentation using the low back pain reference7 and mapped it into a single path (see Figure 1). We then examined the path to identify the components that, if given different values, would send the case down different paths toward proper diagnosis and/or management. Each new path formed a new branch in the process model. Through a review of the low back pain reference and two 1-hour face-to-face discussions of the models among the participating internists, the branches of the process model were developed, expanded, and revised until consensus was reached. The two authors of this article then spent another 2 hours discussing the internists’ reviews of the model. Diagramming the model was by far the most time-consuming part of the process, taking several hours but requiring no further involvement from the participating internists.
The full model is available as Supplemental Digital Appendix 1 at https://links.lww.com/ACADMED/A662. It is possible to continuously add detail to this decision tree. Completing it where we did was a practical and possibly temporary decision rather than an acknowledgment that all possibilities had been incorporated. Also, the model in Supplemental Digital Appendix 1 is only one example of the decision pathways for a low back pain presentation, and one of the features of this approach is that it can be modified to reflect different viewpoints as well as changes in practice over time. Additionally, the branches of the model can be generated either by a physician or through the use of a trusted, evidence-based reference so long as all results are confirmed by physician review and/or expert reference materials.
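For readers who think in terms of data structures, the decision tree described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model in Supplemental Digital Appendix 1: the `Box` class, its field names, and the box labels are hypothetical, and the fragment covers only the first steps named in the interview.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """One step (box) in a clinical process model. Hypothetical structure."""
    label: str
    # Findings that must hold for a case to reach this box.
    findings: list[str] = field(default_factory=list)
    # Branches that lead onward from this box.
    children: list["Box"] = field(default_factory=list)

    def add(self, child: "Box") -> "Box":
        """Attach a new branch and return it so that paths can be chained."""
        self.children.append(child)
        return child


def count_paths(box: Box) -> int:
    """Count the distinct pathways from this box to an end point."""
    if not box.children:
        return 1
    return sum(count_paths(child) for child in box.children)


# Heavily simplified, illustrative fragment of a low back pain model.
root = Box("Initial presentation: low back pain")
emergent = root.add(Box("Check for emergent issues"))
emergent.add(Box("Emergent issue present: immediate attention"))
red_flags = emergent.add(Box("Check for red flags", ["No emergent issues"]))
red_flags.add(Box("Red flags present: refine hypotheses"))
red_flags.add(Box("Diagnose mechanical low back pain", ["No red flags"]))

print(count_paths(root))  # 3 pathways in this fragment
```

Because each branch is simply appended to its parent, adding detail or revising a pathway to reflect a different viewpoint changes only the affected boxes and leaves the rest of the tree untouched.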
Developing test items
One of the benefits of the clinical process model is that, in addition to laying out the key proficiencies to be assessed, it also provides a tool for building test items to assess those proficiencies. Each box in the clinical process model represents a step that the medical expert could or should take, making the boxes potentially reflective of desirable proficiencies. All of the boxes that precede the targeted box provide essential background information for building an assessment scenario up to that point.
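The relationship between a targeted box and the boxes that precede it can be made concrete with a short Python sketch. It assumes, purely for illustration, that the model is stored as nested `(label, children)` tuples; the function name `path_to` and the box labels are hypothetical and are not taken from the authors' figures.

```python
def path_to(tree, target, trail=()):
    """Return the box labels from the root to `target`, or None if absent."""
    label, children = tree
    trail = trail + (label,)
    if label == target:
        return list(trail)
    for child in children:
        found = path_to(child, target, trail)
        if found is not None:
            return found
    return None


# Illustrative fragment only; each node is (label, list of child nodes).
tree = ("Initial presentation of low back pain", [
    ("Check for emergent issues", [
        ("Address emergent issue", []),
        ("Check for red flags", [
            ("Refine hypotheses", []),
            ("Diagnose mechanical low back pain", []),
        ]),
    ]),
])

path = path_to(tree, "Refine hypotheses")
# Everything before the targeted box is background for the scenario.
background, targeted = path[:-1], path[-1]
print(background)
print(targeted)
```

Here `background` holds the steps the scenario must already have covered, and `targeted` is the step that the proficiency claim addresses.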
We used our clinical process model to produce test items that take the form of multiple-choice questions targeting important decisions within a patient-based scenario, the most prevalent item format on credentialing and licensure tests. The model also can be used to produce test items in other formats. For example, as part of a complete test development process, clinical process models can be used to identify competencies that are difficult to assess with multiple-choice questions and to target those competencies with more appropriate formats.
Below we detail the process we used to produce a sample test item. We identified the proficiency claim for which we wanted to provide evidence, generated the lead line, and determined the correct answer and plausible distractors. Only then did we develop a case description or item stem containing the information that supported the previously identified question and answer.
Identify the proficiency for which you want to provide evidence.
We chose a box in the decision tree with an important step that a physician must take and framed it as a proficiency claim. The chosen box reflects a task we would expect a credentialed internist to be capable of performing for the chosen content and within the developed context.
The box with the bold outline in Figure 2 shows a point in the decision tree at which an internist would have information that suggests that the patient could have cancer. We want to test whether internists can refine their hypotheses appropriately given the symptoms and account for them when considering the treatment of this patient.
Testing point: The board-certified physician should be able to refine hypotheses appropriately in a patient with low back pain and symptoms suggestive of potential cancer in the spine (primary or metastatic disease).
Generate the lead line.
We then used the testing point to generate an appropriate lead line for the corresponding multiple-choice question. We wanted to make sure that our lead line reflected the “refining hypotheses” task we identified earlier. The lead line does not need to use the specific language of the testing point, but it should reflect that the examinee would give weight to one condition over the others.
Lead line: What condition should you be most concerned about given this patient’s presentation?
Determine the correct answer.
The next box in the decision tree, based on the correct path being followed, contains the correct answer to the lead line. In our example, the next box we chose focused on cancer. We could easily have changed the path to lead to any of the boxes above or below the cancer box in the model.
Identify plausible distractors.
Nearby incorrect pathways lead to boxes containing plausible distractors that examinees would find compelling if they followed an inaccurate reasoning script down the wrong path. To identify plausible distractors, we looked to the nearby alternative boxes (see Figure 2), focusing on those with common clinical features. For example, spinal fracture has a similar age range association. Ankylosing spondylitis has night pain in common with cancer. Aortic aneurysm presents similarly but also has important distinguishing symptoms. These diagnoses became our plausible distractors.
Distractors: Spinal fracture, ankylosing spondylitis, aortic aneurysm
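The selection of distractors from nearby boxes can be sketched as a simple overlap check. The Python below is an illustration under assumptions, not the authors' procedure: the function `pick_options` is hypothetical, the feature tags are coarse labels paraphrased from the paragraph above rather than clinical criteria, and the last entry is a placeholder added only to show a box being filtered out.

```python
def pick_options(candidates, correct):
    """Keep sibling boxes that share at least one feature with the key."""
    key_features = candidates[correct]
    distractors = [
        label
        for label, features in candidates.items()
        if label != correct and features & key_features
    ]
    return correct, distractors


# Coarse, illustrative tags; a real model would need expert-reviewed features.
candidates = {
    "Cancer in the spine": {"age range", "night pain", "similar presentation"},
    "Spinal fracture": {"age range"},
    "Ankylosing spondylitis": {"night pain"},
    "Aortic aneurysm": {"similar presentation"},
    "Placeholder box with no shared features": set(),
}

correct, distractors = pick_options(candidates, "Cancer in the spine")
print(correct)
print(distractors)
```

Any list produced this way would still need review by a physician, since shared tags only suggest that a distractor is plausible.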
Generate the item stem content.
We followed the decision tree from the initial presentation to the box at which the decision being tested is made, incorporating the variables established along the way that make the correct answer clearly correct and those that make the distractors plausible but clearly incorrect. The item stem includes all the information up to and including the point at which we want the examinee to make a decision (see Figure 2). We did not put the item stem into a narrative format yet; instead, we made a list of the variable values that will be presented to the examinee.
• Age: 54
• Sex: Male
• Care facility: Clinic
• Primary complaint: Low back pain
• Onset time: 2 weeks ago
• Triggering incident: Moving furniture
• Other reported information: Pain worsens with physical activity
Columns 2 and 3: No additional information
• Night pain
• Pain at rest
• Recent weight loss
• No history of trauma
• No personal or family history of cancer
This list provides the information we need to craft a good test item.
Format the material into a test item.
We used the information we derived in the previous steps to write the test item, first putting the stem content into a narrative form and then presenting the lead line and answer options. The result is shown in Box 1.
Box 1. Example of a Test Item Developed Using a Clinical Process Model
A 54-year-old male comes into the clinic with a two-week history of low back pain that began shortly after helping a coworker move furniture in their shared office. The pain worsens with physical activity. Intake shows no history of trauma or cancer, personally or in the family. Pain does not subside at rest, and the patient reports difficulty sleeping due to pain at night. Intake shows the patient has lost eight pounds since his last visit three months ago. The patient reports no special dietary or exercise regimen.
What condition should you be most concerned about given this patient’s presentation?
- A. Spinal fracture
- B. Aortic aneurysm
- C. Cancer in the spine
- D. Ankylosing spondylitis
- E. Mechanical low back pain
Correct answer: C. Cancer in the spine
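The final formatting step lends itself to a small template. The following Python sketch is one possible way to assemble an item from the pieces gathered in the earlier steps; the function `render_item`, the variable names, and the one-sentence stem template are assumptions made for illustration, and the stem is deliberately shorter than the full narrative in Box 1.

```python
from string import ascii_uppercase


def render_item(stem, lead_line, options, correct):
    """Return the item text with lettered options, plus the answer key."""
    lines = [stem, "", lead_line]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    key = ascii_uppercase[options.index(correct)]
    return "\n".join(lines), key


# Variable values taken from the stem content list; the template is hypothetical.
variables = {
    "age": 54,
    "sex": "male",
    "facility": "clinic",
    "onset": "two-week",
    "complaint": "low back pain",
}
stem = (
    "A {age}-year-old {sex} comes into the {facility} "
    "with a {onset} history of {complaint}."
).format(**variables)

options = [
    "Spinal fracture",
    "Aortic aneurysm",
    "Cancer in the spine",
    "Ankylosing spondylitis",
    "Mechanical low back pain",
]
text, key = render_item(
    stem,
    "What condition should you be most concerned about given this patient's presentation?",
    options,
    "Cancer in the spine",
)
print(text)
print("Correct answer:", key)
```

Keeping the variable values separate from the narrative template means that a change along the path, such as a different age or an added finding, produces a new item without rewriting the stem by hand.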
Above, we described the development of a test item that is one of several possible variations that could result from a physician making slight changes in her or his path through our low back pain decision tree. Moving forward or backward or choosing a very different path would lead to items that address other important judgments and decisions that a physician would be expected to make. Any resulting test items would need to be evaluated by an expert for adequate adherence to best practices, appropriate sensitivity and specificity for diagnostic testing, and overall importance for assessment purposes.
When incorporating items built using a clinical process model into a test, educators should take care to adhere to the test specifications. New assessments built using clinical process models need to incorporate enough models to represent the full breadth of a defined construct. Clinical process models should capture the full complexity of the process being assessed; a process that cannot be captured fully is probably not one that can be comfortably or confidently assessed. Items produced from clinical process models should be subjected to pilot testing to establish the quality of the items and the models themselves.
While we illustrate the use of a clinical process model for test item development in this article, the value of this model extends to informing the entire assessment process. A clinical process model can identify even those proficiency claims that are not well suited to the multiple-choice format. Test development then can include alternate item types, like script concordance tests, essay prompts, or short open-response items, or new, innovative item types designed to provide evidence of less easily testable concepts. For example, an essay prompt could ask the examinee to describe the diagnostic process for a specific clinical presentation, and the clinical process model would serve as the rubric for scoring the essay rather than as the material for the clinical vignette. The same process used to develop a multiple-choice question could be used to develop a short open-response item; educators could just skip the step at which they identify plausible distractors.
Additionally, because knowledge organization is an important component of several contemporary clinical reasoning theories,3 we suggest that educators assess not only the points raised in each individual box in the decision tree but also the examinee’s ability to move through the tree. For instance, clinical process models could be used to produce items that simulate medical scenarios and require examinees to apply the clinical reasoning process from start to finish.
As each clinical process model identifies the proficiency claims for a specific medical domain, producing such a model for all the content that is relevant to a field of medical expertise should more fully illuminate the underlying constructs that define expertise in that domain. In such an approach, each proficiency claim should be articulated as part of a principled assessment framework. With constructs defined in this way, it becomes a matter of calculation to determine the number and distribution of items that represent the breadth and depth of expertise necessary to draw appropriate conclusions about examinees.
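One way to picture the calculation mentioned above is a proportional allocation of items across models. The Python sketch below assumes, hypothetically, that each model's share of the test should match its share of the proficiency claims; the claim counts and the test length are invented for illustration, and other weighting schemes (for example, by clinical importance) would be equally defensible.

```python
def allocate_items(claim_counts, test_length):
    """Split `test_length` items across models in proportion to claim counts.

    Uses the largest-remainder method so the allocation sums exactly
    to the requested test length.
    """
    total = sum(claim_counts.values())
    quotas = {
        name: count * test_length / total
        for name, count in claim_counts.items()
    }
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = test_length - sum(allocation.values())
    # Hand out the remaining items to the largest fractional remainders.
    by_remainder = sorted(
        quotas, key=lambda name: quotas[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical numbers of proficiency claims per clinical process model.
claim_counts = {"low back pain": 12, "syncope": 8, "dysphagia": 5}
allocation = allocate_items(claim_counts, test_length=20)
print(allocation)
```

With these made-up counts, the raw quotas are 9.6, 6.4, and 4.0 items, so the single leftover item goes to the low back pain model.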
The benefits of using a clinical process model in assessment build neatly on one another. With a fully described construct and a clear picture of what a representative test should look like, educators can use a data-driven approach (e.g., embedded standard setting8) to base the threshold for passing on how well an examinee’s performance lines up with a content-based definition of expert performance. Doing so would move away from traditional standard-setting methods (e.g., Angoff method,9 Bookmark approach10) that require subject matter experts to make judgments about how well theoretical examinees will perform on a selection of items and toward approaches that are more consistent with the tenets of principled assessment design.
The test item development method we described above allows for rapid, on-demand, and even automated item generation, which can reduce the need to protect test items from exposure. Item and test information then can be shared more freely than it can now. With the model we described here, educators can alert examinees in more detail to what to expect on the test, including providing multiple, completely representative practice tests without depleting a limited item pool. After test administration, as the test items are no longer precious commodities to be used and reused for as long as possible, educators can share the exact items with examinees to offer more thorough performance feedback. Clinical process models for producing additional items can be housed in a publicly accessible archive and used by all educators.
The value of clinical process models is not limited to assessment. They also can be used as teaching tools, with learners using existing models to understand and apply clinical reasoning as it relates to specific conditions. Learners could construct their own process models to embed the clinical reasoning process into their understanding of how medicine is practiced.
Clinical process models have applications outside of academia as well. The same model that is used to assess and teach learners could serve to inform the practice of experts who wish to confirm their reasoning. The model could be used as a practical guide to ensure that the expert is taking the appropriate patient care steps. In addition, patients could look at a clinical process model that has been adapted to an appropriate level of complexity to understand what to expect from an initial presentation. It could enable them to better communicate their condition and concerns to their physician.
As educators develop additional clinical process models and validate them through the production of test items with high-quality content, format, and psychometric properties, they will be able to produce full tests using this process, automatically generate items, and produce test forms on demand. By establishing how clinical process models that address different conditions are related, educators can link them together to define the entire construct being tested. Once these construct enhancements have been implemented, improvements in scoring methods can follow. Understanding the relationships between constructs will allow educators to share content for assessment across areas of specialization.
Clinical process modeling has potential as a tool for assessing the complex process of clinical reasoning in a way that is compatible with principled assessment design practices. Although the development of a clinical process model can be resource-intensive, by providing a means for understanding and modeling important medical concepts, such models have the potential to greatly increase the efficiency of test item development and to inform and enhance the entire test development process. They also have the potential to inform learning and medical practice more broadly.
The authors wish to thank the American Board of Internal Medicine for their support for this work.