The concept of a threshold has long been important in medicine and other activities that involve decision making. The contemporary origin of the concept of thresholds in medicine can be traced to a series of articles by Pauker and Kassirer1,2 in the late 1970s and early 1980s. The thresholds commonly used in medicine today are the “threshold to test” and the “threshold to treat” as originally described by Pauker and Kassirer2 in the two-hypothesis case. In their model, if the likelihood of disease in a particular patient is less than the threshold to test, the condition is effectively ruled out and no further testing should be obtained. If the likelihood of disease is greater than the threshold to treat, the condition is ruled in and treatment should commence. Finally, if neither threshold is exceeded, further testing is warranted; this process repeats until a threshold has been exceeded. In contrast to these active thresholds, which are derived from decision analysis, diagnostic thresholds represent a passive categorization of patients by way of inclusion or exclusion. The diagnostic inclusion threshold and the therapeutic “threshold to treat” will usually coincide, as will the exclusion threshold and “threshold to test.” No matter how they are defined, the goal of thresholds is to convert the continuous spectrum of medical uncertainty into a manageable discrete model of classifications and actions.
Background: Thresholds in Diagnostic Decision Making
In de novo diagnosis, the level of uncertainty is the greatest because no information has yet been collected. A probability, if it can be defined, will lie somewhere between the inclusion and exclusion thresholds discussed above. This is the phase in which hypotheses are formulated and tested; treatment, if given, is empiric (derived from the historical practice of empiricism). If and when a threshold is exceeded after the calculation of posttest probabilities, the relevant hypothesis (usually “disease” or “no disease”) becomes the presumptive diagnosis. At this point, a second phase, of confirmatory testing, might begin. Any testing thereafter serves to increase the confidence in the diagnosis or to cast doubt, even to the point of excluding a tentatively accepted diagnosis3; treatment at this point would be considered presumptive. A theoretical further threshold of definitive diagnosis is more accurately considered a boundary condition. Because “definitive” implies a 100% likelihood of an outcome, most diagnosis and treatment in medicine are in fact presumptive.
Given that such thresholds exist, the goal of diagnosis is to move in an orderly fashion from uncertainty toward definitive diagnosis and management. We briefly review explicit thresholds, followed by a more extensive discussion of implicit thresholds, and how miscalibration and manipulation of these thresholds can lead to errors in medical decision making.
There are relatively few situations in medicine in which explicit, numerically defined thresholds are used in diagnosis. Although the number of such situations is small, they arise frequently in practice, and they can be divided into the following categories: (1) diagnoses determined by an abnormal test result, (2) diagnoses defined by a constellation of abnormal test results (i.e., syndromes), and (3) diagnoses defined from the observation and concatenation of data.
The most frequent explicit threshold clinicians encounter is the abnormal test result. The threshold of “abnormal” is not equivalent to a diagnostic threshold except in three specific situations. The first is when a disease is defined by reproducible abnormal test results, such as hypertension or diabetes mellitus. The second is when the abnormal test result is the “disease,” as is the case for hyperkalemia, hyponatremia, and “transaminitis.” The third is when the test is definitive, a criterion standard with perfect discriminating power for the presence of an illness.4 This test would identify all patients having a disease in a given population (100% sensitivity), without producing positive results in patients who do not have the disease (100% specificity). In reality, few laboratory tests or diagnostic procedures labeled as criterion or “gold” standards actually meet such strict criteria. The majority of laboratory tests are thus selected on the basis of their superior diagnostic sensitivity, in the case of screening tests, or specificity, in the case of confirmatory tests.
To classify a test result as abnormal, the limits of “normal” must be defined. For most diagnostic tests, the reference range is defined, by convention, as the central 95% of observed values in a given healthy population (this does not apply to diagnostic tests, such as physical exam maneuvers, whose results are not normally distributed).5 When a test result is abnormal, the standard explicit approach is to calculate the posttest likelihood of disease by using likelihood ratios or some other method of Bayesian refinement.6 The pitfall of equating an abnormal test result with the definite presence of disease must be studiously avoided.
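The Bayesian refinement step is mechanical: convert the pretest probability to odds, multiply by the test's likelihood ratio, and convert back. A minimal sketch, with illustrative numbers not drawn from any specific test:

```python
def posttest_probability(pretest_p, likelihood_ratio):
    """Update a pretest probability with one test result via its likelihood ratio."""
    pretest_odds = pretest_p / (1 - pretest_p)   # probability -> odds
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)   # odds -> probability

# Hypothetical example: 30% pretest probability, positive result with LR+ = 10.
p = posttest_probability(0.30, 10)
print(f"posttest probability: {p:.2f}")  # roughly 0.81
```

Note that a likelihood ratio of 1 leaves the probability unchanged, which is why tests with likelihood ratios near 1 contribute little gain in certainty.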
Syndromes deserve brief, but special mention. A syndrome, by definition, is a set of test results that occur together.7 Notably, the exact pathophysiology underlying a syndrome has not yet been determined. If the underlying mechanisms of a syndrome are discovered, it is often difficult to move the vocabulary of physicians from “syndrome” to “disease.” For example, Bartter syndrome is now known to be a family of channelopathies affecting the thick ascending limb of the nephron, yet the diseases are still commonly referred to as a syndrome.8 Tautologically, once a syndrome has been redefined as a disease, the constellation of test results should no longer be the determining criterion.
A diagnosis can be defined through the analysis of data, most commonly through algorithms or formal decision analysis. Much like the concept of syndrome, these are methods for associating test results with outcomes.9 The goal is to separate a single dataset into discrete categories not necessarily associated with a phenotypic condition. If these algorithms are used to guide diagnosis or management, they are considered guidelines. Examples include the Framingham Risk Score to determine the need for primary coronary artery disease prophylaxis10 and the Pneumonia Severity Index to determine the need for hospitalization in patients with community-acquired pneumonia (age, not pneumonia symptoms, is the primary driving force of this algorithm).11
Regardless of the technique used to construct algorithms, some general principles guide their use: (1) the quality and strength of the available data should be explicitly evaluated,12 (2) an algorithm should be validated with independent data or through resampling techniques such as cross-validation,13 and (3) an algorithm should be recognized as suggestive, but autonomy remains the province of clinicians and their patients.6 The third condition implies that robust algorithms can be, and are, contravened by clinician and patient alike. For example, it is currently recommended that patients with a “high” Thrombolysis in Myocardial Infarction risk score undergo early invasive percutaneous coronary intervention (PCI).14 The patient's right to refuse is well established,15 and the clinician may also defer PCI for reasons that are patient-, institution-, or situation-specific. Of note, it has become increasingly frequent in malpractice litigation for guidelines to be cited as the standard of care.16 In addition, recording adherence to practice guidelines has found a growing niche in the “pay for performance” arena,17 the recent controversy involving time to antibiotics notwithstanding.18
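The second principle, validation with independent data or resampling, can be made concrete with a small sketch. The data, the toy “risk score” rule, and the cutoff heuristic below are all hypothetical; the point is only the mechanics of holding out each fold in turn and scoring on data the rule never saw.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 deterministically, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, score, k=5):
    """Generic k-fold cross-validation: fit on k-1 folds, score the held-out fold."""
    folds = kfold_indices(len(xs), k)
    fold_scores = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        fold_scores.append(score(model,
                                 [xs[j] for j in test_idx],
                                 [ys[j] for j in test_idx]))
    return sum(fold_scores) / len(fold_scores)

def fit_cutoff(xs, ys):
    """Toy rule: place the cutoff halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(cutoff, xs, ys):
    return sum((x >= cutoff) == y for x, y in zip(xs, ys)) / len(xs)

# Synthetic, well-separated "risk scores": low values healthy, high values diseased.
scores = [0.8, 1.1, 0.9, 1.2, 1.0, 2.9, 3.1, 3.0, 3.2, 2.8]
disease = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validate(scores, disease, fit_cutoff, accuracy, k=5))
```

Real clinical prediction rules are, of course, derived and validated on far larger cohorts, but the held-out-fold discipline is the same.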
Most day-to-day medical decisions are made implicitly.19 Because decisions involve actions or categorizations, they are discrete and, therefore, must use thresholds. These implicit thresholds belong to the individual clinician. Unlike explicit thresholds, those that are implicit are not amenable to numeracy and are generally referred to as high or low.6 Low thresholds are sensitive but lack specificity, and they result in many false-positives. Likewise, high thresholds minimize false-positives at a cost to sensitivity. This division alone, however, lacks resolution. For example, the thresholds to treat suspected meningitis and asymptomatic hypertension are both low, but the reasoning guiding these decisions is fundamentally different. It is our suggestion that the “height” of a threshold be contrasted with the “gravity” of a diagnosis. Gravity as used in this sense is a composite measure of the severity of a condition, the rapidity of its onset and course, the net benefits (or harms) of the treatments available, and the net benefits (or harms) of not treating.
A low-threshold, low-gravity diagnosis would be one that does not have immediate clinical consequences and is easily treated with relatively safe medications. Chronic diseases such as diabetes and hypertension are archetypal examples. Conversely, the immediately life-threatening diagnosis for which good treatments are readily available defines the low-threshold, high-gravity situation. Meningococcal meningitis, which can be rapidly fatal if untreated but responds quite well to antibiotics, is a very good example.
Life-threatening or highly morbid conditions which are associated with dangerous treatments define the high-threshold, high-gravity situation. For example, cancer, which does not (usually) have a rapid onset, does have life-threatening consequences and highly toxic treatments. The diagnosis of cancer therefore generally requires biopsy (a highly invasive, but highly specific test) before treatment is initiated. Diseases for which there is no decisive treatment, such as amyotrophic lateral sclerosis, will tend to have their diagnostic thresholds dictated by the harms of ineffective treatment.
Is there any situation that has low gravity but a high threshold for inclusion? It would seem unlikely that much testing effort would go into a diagnosis of little importance. The one diagnosis that requires a high burden of proof but has no intrinsic gravity is the null hypothesis.
Examining the implications of miscalibration
In this article, we offer an analysis of common clinical scenarios to illustrate components of implicit thresholds in diagnostic decision making. Previous research has shown that clinicians have difficulty accurately assessing pretest likelihoods and that they also tend to use test results in a non-Bayesian manner.20,21 To simplify this discussion, our assessment of pretest and posttest likelihoods is assumed to be accurate and Bayesian.
In our efforts to evaluate implicit thresholds, we first consider them to be static, or fixed, in the mind of an individual clinician. That is, for any particular patient, the clinician will have a set of implicit thresholds that do not change as additional information becomes available. If one of the clinician's implicit thresholds is well away from the average, or consensus threshold, it should be considered to be miscalibrated. By examining miscalibrations of the inclusion and exclusion thresholds, we are able to demonstrate some common clinical errors.
If implicit thresholds are, in fact, mutable (in effect, equivalent to using test results in a non-Bayesian manner), the situation becomes considerably more complicated. We evaluate this possibility by considering what might occur if an individual's implicit thresholds were manipulated during the process of diagnosis, and during the process of confirmatory testing. We also consider manipulations that might occur in the special case in which no threshold can be exceeded despite exhaustive testing. The discussion of manipulation will be simplified by only allowing singular variations of thresholds, and by assuming that they are calibrated correctly before the acquisition of data. All perturbations are qualitative, because of the difficulty of introducing numeracy to a discussion of implicit thresholds.
For simplicity, we use examples for the two-hypothesis case: disease versus no disease. The certainty of a diagnosis can range from 0% to 100%, and the sum of the probabilities of the two hypotheses must equal unity. A test is any information that changes the likelihood of a hypothesis (for instance, elements of the history, physical examination, or laboratory data). The change from pretest to posttest likelihood is the gain in certainty.22 The distance from the pretest probability to a threshold is inversely proportional to the index of suspicion.23 The degree to which a threshold is exceeded is one possible measure of confidence in the diagnosis.3 The following vignettes depicting common clinical scenarios provide a framework for discussing implicit thresholds in diagnostic decision making.
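The discrete structure that the two-hypothesis framework imposes can be sketched as a simple classification rule; the threshold values below are illustrative assumptions, not consensus figures for any condition.

```python
def classify(posttest_p, exclusion_threshold, inclusion_threshold):
    """Map a posttest probability onto the three discrete outcomes of the
    two-hypothesis threshold model."""
    if posttest_p <= exclusion_threshold:
        return "rule out: stop testing"
    if posttest_p >= inclusion_threshold:
        return "rule in: begin treatment"
    return "indeterminate: continue testing"

# Hypothetical thresholds for a mid-gravity diagnosis.
EXCLUDE, INCLUDE = 0.05, 0.85
for p in (0.02, 0.40, 0.93):
    print(f"{p:.2f} -> {classify(p, EXCLUDE, INCLUDE)}")
```

Miscalibration, in these terms, is simply a clinician holding EXCLUDE or INCLUDE values well away from consensus; manipulation is moving them after the data arrive.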
Clinical Vignettes for Consideration
Miscalibration of implicit thresholds
Consider the following vignette: A 20-year-old college student presents to the emergency room with a headache, sore neck, high fever, and diffuse nondescript rash. This constellation of symptoms is worrisome for bacterial meningitis, so how might miscalibration affect the clinician?
First, if the inclusion threshold is too low, the clinician might diagnose meningitis without an adequate physical examination or confirmatory testing (i.e., lumbar puncture). This can lead to premature institution of high-dose antibiotics, with the associated risks, such as missing a pertinent allergy in a cursory review of systems.
Second, if the inclusion threshold is too high, the clinician will continue testing—and withholding treatment—well beyond the point at which his or her colleagues would have initiated treatment. In this case the clinician might defer treatment until all laboratory test results return, even though this practice is known to be deleterious when bacterial meningitis is suspected on the basis of history alone.24
Third, if the exclusion threshold is set too high, the clinician might exclude meningitis if the Kernig and Brudzinski signs are negative. Given that these signs are prone to false-negatives, most clinicians would still proceed with invasive testing.24 The implications of a false-negative in this situation are clear.
Finally, if the exclusion threshold is set too low, the clinician will continue testing well beyond the point at which his or her colleagues would have accepted that the diagnosis has been excluded. In our example, this might manifest as repeated lumbar punctures and extended, unnecessary hospitalization.
Miscalibrated static thresholds may have clinical, economic, and even legal implications for the individual clinician. Furthermore, the nature of miscalibration is somewhat predictable for a given level of training. Much of the effort of medical education is to bring the individual clinician in line with the consensus thresholds used in diagnosis and management. Whether appropriate or not, fixed implicit thresholds are internally consistent for the individual clinician.
Manipulation of implicit thresholds
We now turn to the possibility that implicit thresholds can be changed during the process of diagnosis. In general, if a person's implicit thresholds are altered as a function of the data being presented, he or she is acting in an internally inconsistent manner. Such alterations almost certainly occur, as they are descriptively homologous to some of the commonly observed cognitive heuristics and biases that affect humans. In this section, we use several examples to examine the possible cognitive behavior that might result from perturbations of the previously described static thresholds.
Before diagnosis: The test result is suggestive.
First, thresholds might be manipulated when test results are suggestive of a diagnosis, but the posttest likelihood does not exceed a presumptive threshold (Figure 1A). The appropriate interpretation of this situation is that the index of suspicion is increased, but further testing is necessary before including or excluding a diagnosis. As an illustration, consider the following scenario: A 58-year-old woman has just returned from a long transcontinental flight and has now developed mild shortness of breath and tachycardia. These test results (age, gender, recent travel, dyspnea, vital sign abnormalities) are suggestive of pulmonary embolism, but not sufficient to exceed the generally accepted inclusion threshold; more testing is needed. This is a relatively high-gravity, high-threshold diagnosis, given the risks of anticoagulation; likewise, the exclusion threshold is high, given the harms of not treating a pulmonary embolus. For the diagnosis of pulmonary embolism, the clinician can select from several tests that differ in performance characteristics as well as in degree of invasiveness.25 Now consider how a change in one of the thresholds might influence the clinical approach.
First, if the inclusion threshold is increased at this point, the clinician has “raised the bar,” making it more difficult to reach the diagnosis with further testing (Figure 1B). This is essentially maladaptive skepticism, which is a type of anchoring. In the example, this behavior might be manifest in glossing over the physical exam, which might have revealed a swollen leg, and ordering a CT pulmonary angiogram (CTPA). In this particular situation, a reasonable noninvasive test (venous Doppler ultrasound) is replaced by an invasive test with associated risks.
Second, if the inclusion threshold is instead decreased, the clinician has increased his or her susceptibility to the diagnosis under consideration (Figure 1C). Although this physician is still open to exclusion of the diagnosis, given negative test results, he or she is perilously close to premature closure (vide infra). It is even possible that this clinician will select subsequent tests with poor performance characteristics in an attempt to “shortcut” to the new, lower inclusion threshold.
Third, if the inclusion threshold is decreased so much that the diagnosis is presumptively accepted (Figure 1D), the clinician has succumbed to the well-known phenomenon of premature closure, also known as representativeness.20,26 Metaphorically, the clinician has heard hoofbeats and assumes that they belong to a horse, neglecting to consider other diagnostic possibilities. The dangers of premature closure include unnecessary treatment as well as missing other, potentially more serious diagnoses. In this scenario, the physician would begin anticoagulation treatment without further testing. Given that there are many other explanations for dyspnea and tachycardia in a 58-year-old woman, and given the risks of therapeutic anticoagulation, this course of action is ill advised.
Now, as a fourth possibility, consider what might happen if the exclusion threshold is instead varied. As with the change in the inclusion threshold, the exclusion threshold can be raised, making it harder for further testing to suggest alternate diagnoses (Figure 1E). This type of change is equivalent to “closing off,” which is another type of anchoring. In the example, if a CTPA were negative, the interpretation might change from decisively excluding pulmonary embolism to requesting further testing (e.g., a pulmonary angiogram, a highly invasive procedure).
Fifth, if the exclusion threshold is lowered, the clinician becomes overly open to alternatives (Figure 1F). A negative test with poor negative performance characteristics, such as venous Doppler ultrasound, might lead the clinician to conclude that pulmonary embolism has been definitively ruled out, instead of appropriately obtaining more testing.
The final possible change is one in which the exclusion threshold is lowered to such a degree that the diagnosis is rejected (Figure 1G). This behavior seems almost paradoxical, because the information points to one hypothesis but the other hypothesis is accepted. This type of premature closure is the obverse of the representativeness discussed in the third possibility—This clinician hears hoofbeats and concludes that they must belong to a zebra.
The frequency with which each of these manipulations occurs is difficult to determine and is likely to be problem-specific. “Lowering the bar,” as discussed in the second manipulation, is likely to be especially common, as it is quite tempting to begin to jump to conclusions when confronted with suggestive information that reinforces a hypothesis. Accepting the opposing hypothesis, as suggested in the final manipulation, is an example of the availability heuristic20 and is a common pitfall among early learners as well as those afflicted by the recency effect or the Von Restorff effect (information that stands out as different is more likely to be remembered).27,28
Before diagnosis: The test result is diagnostic.
Now consider the logical extension of the previous scenario: The available test results are sufficient to cross the presumptive inclusion threshold (Figure 1H). For example, a 28-year-old woman with a long history of heavy menstrual cycles presents to the office complaining of fatigue. Her hemoglobin is 8.5 g/dL, with a mean corpuscular volume of 70 fL. Her ferritin level is measured to be 9.0 ng/mL. This constellation of test results (age, gender, heavy menses, microcytic anemia, and low ferritin) is sufficient to establish the presumptive diagnosis of iron-deficiency anemia due to menometrorrhagia, and appropriate treatment should proceed.29
In this scenario, only one type of manipulation is applicable. The inclusion threshold is increased such that the posttest probability no longer exceeds the threshold (Figure 1I). All other changes would affect confirmatory testing and are discussed below. If this change is made, the physician is guilty of undervaluing the data and anchoring to uncertainty. This common reaction leads to unnecessary further testing, such as bone marrow biopsy or endoscopy. These potential interventions are costly and unnecessary, and they carry a high risk of iatrogenic complications.
After diagnosis: Confirmatory testing suggests a diagnostic error.
Once a diagnosis has been presumptively made, the clinician is confronted with a choice: accept the diagnosis as is, or pursue further testing to confirm it. Confirmatory testing is often necessary, as many diagnoses are inherently unstable, and the results could either throw the diagnosis into doubt or suggest an entirely different diagnosis. Conversely, a reassuring confirmatory test can increase the confidence that a conclusion was not reached by chance.
If a confirmatory test reinforces the hypothesis, the possible manipulations of thresholds parallel those discussed above. If the test instead suggests that the hypothesis is wrong (Figure 2A), the following clinical vignette illustrates how the thresholds might change: A 35-year-old man presents to the clinic with several days of high fevers, a productive cough, and dehydration. He denies any chronic medical conditions, and he does not smoke. Physical exam reveals rales and bronchial breath sounds at the right base, and a chest radiograph shows a dense right-lower-lobe infiltrate. The presumptive diagnosis of community-acquired pneumonia would be made by most clinicians.30 Whether or not treatment is initiated, suppose that a confirmatory test—a sputum culture—is obtained.
If the sputum culture does not grow any pathogenic organisms, the confirmatory test suggests that the diagnosis might be incorrect. The sputum culture has very poor negative predictive characteristics,31 and pneumonia should remain as the most likely diagnosis. Now consider a change in thresholds at the time of receiving these extra data.
First, the inclusion threshold could be lowered (Figure 2B). This would result in an increased unwillingness to consider an alternative diagnosis, if further confirmatory testing were to be performed. This is the more classical definition of anchoring—A diagnosis has been established, and the physician would like to maintain this status quo despite evidence suggesting otherwise.20 By expanding the “safe harbor” surrounding the clinician's current level of certainty, he or she is more likely to stay with an unstable diagnosis. This clinician may be more likely to stop confirmatory testing at this point.
Second, if the inclusion threshold is raised at this point, the clinician is displaying excess skepticism (Figure 2C). While this would not affect the ultimate outcome (treating for pneumonia) if testing were to stop with the sputum culture, this sort of change in threshold is likely to be followed by further confirmatory testing. In this example, the clinician might subsequently order a CT scan of the thorax to make sure the patient really has pneumonia. This is unnecessary and expensive, and it exposes the patient to a higher radiation burden.
Finally, if the inclusion threshold is raised to the point at which the diagnosis is no longer included, the clinician incorrectly doubts the diagnosis (Figure 2D). This clinician would be obligated to return to diagnostic testing, which might be termed an “erroneous return to ambiguity.” Although the result may be similar to raising the inclusion threshold as far as obtaining further unnecessary tests, this clinician is also likely to stop treatment if it has already begun. This situation is quite common, especially in more intensive health care settings, and temporary lapses in treatment can have dire consequences.
After diagnosis: Confirmatory testing throws the diagnosis into doubt.
Now suppose that the sputum culture grows something unexpected—Pseudomonas spp. The presence of Pseudomonas is strongly suggestive of another etiology behind the symptoms, such as bronchiectasis or undiagnosed cystic fibrosis. There is not enough evidence to confirm an alternative diagnosis, but the certainty in a diagnosis of community-acquired pneumonia should decrease to the point at which the presumptive diagnostic threshold is no longer exceeded (Figure 2E). Now consider the previously discussed dynamic threshold changes.
First, any increase in the inclusion threshold, or any decrease insufficient to keep the diagnosis within the presumptive inclusion area, is equivalent to the situations previously discussed in the subsection “Before diagnosis: The test result is suggestive.”
Second, the inclusion threshold could be lowered such that the updated probability would continue to lie within the presumptive inclusion area (Figure 2F). This is, again, anchoring. Additionally, this clinician has committed a type I error, by accepting a false-positive as defined by the clinician's implicit thresholds.32 Such a scenario would be manifest by continued treatment in the face of glaring evidence that another alternative must be considered. In the running example, this would include inappropriate use of antibiotics as well as possible deprivation of alternative, life-saving treatments. Given the temptation to maintain the status quo, this may be one of the most common dynamic shifts in threshold observed in practice.
Finally, what if the clinician alters the exclusion threshold, such that not only is the first diagnosis cast into doubt, but the alternative diagnosis is accepted (Figure 2G)? This is similar to the type II error of incorrectly accepting a null hypothesis (although in this case, the alternative is pseudomonal infection, not the null).32 This clinician would start antipseudomonals, exposing the patient to possibly inappropriate and dangerous treatment.
The case where no diagnosis can be made
What if all possible tests have been evaluated, yet the final posttest likelihood still lies somewhere between the presumptive inclusion and exclusion thresholds (Figure 3A)? The honest clinician will inform the patient that despite reasonable testing, the underlying condition is not known. This nebulous uncertainty is usually both frightening and frustrating for patient and practitioner alike.33
A tempting reaction to this high degree of uncertainty is to establish a diagnosis, often by moving the nearest threshold just enough so that it is crossed. This type of threshold manipulation is an application of the representativeness bias.20 If the final “gap” between the posttest probability and a threshold is small, this may be acceptable (a small leap of faith, so to speak). Individual practitioners are likely to have different tolerances for this type of action. A conservative clinician may defer treatment even if the gap to the inclusion threshold is relatively small, whereas a clinician with a more lenient approach may be willing to treat with a moderate degree of residual uncertainty. If the gap is relatively large, yet the clinician still elects to establish a diagnosis, several types of behavior may become manifest: (1) The avoidant clinician will manipulate his or her exclusion threshold to rule out the condition (Figure 3B), (2) the impulsive clinician will manipulate his or her inclusion threshold to rule in the condition (Figure 3C), or (3) the novice and the clinician afflicted by the recency effect will rely heavily on the availability bias (i.e., the clinician will manipulate the threshold that leads to the most common or the most recent categorization).
A more durable response would be to develop a new clinical test, so that more information becomes available (Figure 3D). This approach is exemplified by the numerous genomic analyses that have emerged in recent years. For example, breast cancer has been shown to be a heterogeneous disease with four major subtypes, each of which is associated with particular susceptibilities and prognosis.34
The final response when confronted with unresolved uncertainty is to generate a completely new hypothesis (Figure 3E). An anecdote from one of the authors illustrates this point. In 1976, this author evaluated a young man whose chief complaint was hematuria. The patient was paraplegic, and initially the complaint was thought to relate to self-catheterization. However, cystoscopy revealed an exophytic bladder mass, and the pathology was interpreted as atypical transitional cell carcinoma (TCC). Shortly thereafter, the patient developed approximately 50 violaceous skin lesions, and skin biopsy demonstrated similar histology. No other evidence of metastasis was found, so the patient was diagnosed with atypical TCC of the bladder with atypical metastasis to the skin. The case was presented at dermatology grand rounds, and a visiting professor from New York immediately questioned the pathologic diagnosis, thinking that this actually resembled a rare disease, Kaposi sarcoma. Only in retrospect has it now become obvious that this was probably one of the first cases of AIDS in San Francisco, preceding the original case series by five years and the isolation of the virus by seven.35,36 To revisit the zebra metaphor, this case was a flying zebra—unusual, unsuspected, and without any characteristic hoofbeats. It was only through the generation of a completely new hypothesis that the disease now known as AIDS came to be recognized.
Recognizing and Reducing Threshold Manipulation
Our analysis has introduced a new explanatory model for some of the cognitive biases that can arise during the process of diagnosis. As with any model, extensive simplifications were necessary. Clearly, if a clinician alters his or her internal thresholds, the clinician is likely to alter more than one at a time. The process of differential diagnosis involves considering many alternate diagnoses (each associated with its own distinct thresholds), not just the two-hypothesis case used in the examples above.
Most previous discussions of cognitive error in diagnosis have focused on (1) incorrect estimation of the pretest likelihood and (2) non-Bayesian updating of likelihood, with either augmentation or diminishment of posttest estimates.6,19,20,26,37 Although these types of errors certainly occur, they cannot completely explain some concepts, such as the Von Restorff “hoofbeats equals zebras” phenomenon. By focusing the discussion on the threshold concept, we have illustrated most of the commonly observed heuristics and biases that occur in the process of medical decision making.38 Cognitive errors may account for upward of 75% of all diagnostic errors, so a more thorough understanding of how they arise is highly desirable.26 Because the majority of day-to-day clinical decisions involve implicit approaches, knowledge and avoidance of miscalibration or manipulation of implicit thresholds are extremely important to clinicians. Miscalibration can be addressed by enhanced education at all stages of training so that individual clinicians' thresholds are closely aligned with consensus thresholds.
Reducing the manipulation of thresholds is a considerably more complicated problem. Some manipulation may be partially avoidable through the process of meta-cognition.39 The process of shared decision making can allow other clinicians and patients to observe the practitioner's thought processes and may reveal inconsistencies. The current practice of learning at the student, intern, and resident levels in medicine emphasizes individual acquisition and processing of clinical data; interactions with attending physicians are often reduced to cursory “card flips.”40 Increased supervision and team communication, especially at the early stages of training, are likely to reduce threshold manipulation,41,42 although collaborative decision making is also susceptible to error.43
Automated decision aids may have a role in guiding clinicians away from threshold manipulation, but such aids are not currently widely available in practice. One potential system that would be amenable to further evaluation would be a hybridized decision aid requiring the user to provide a best guess of the numeric values of his or her implicit thresholds. Because pretest probabilities and test characteristics are available for many conditions, such a system could provide numeric feedback during data collection. One author (J.L.W.) has previously proposed such a system using the example of pulmonary embolism.44 Further study of the psychological basis of implicit thresholds, as well as methods to translate an individual's thresholds into accessible, explicit thresholds, should be pursued. Such research is likely to provide important insights into prevention of diagnostic error, formation of expert diagnostic approaches, and optimal utilization of health care resources.
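The core loop of such a hybrid aid could be sketched as follows: elicit the clinician's numeric thresholds once, then update the probability after each datum and report as soon as a threshold is crossed. All names and numbers below are hypothetical assumptions for illustration, not the system proposed in reference 44.

```python
def run_decision_aid(pretest_p, exclude_t, include_t, findings):
    """Sequentially update the probability with each finding's likelihood
    ratio and report as soon as one of the clinician's stated thresholds
    is crossed.

    findings: iterable of (label, likelihood_ratio) pairs.
    """
    odds = pretest_p / (1 - pretest_p)
    for label, lr in findings:
        odds *= lr
        p = odds / (1 + odds)
        print(f"after {label}: p = {p:.3f}")  # numeric feedback to the user
        if p >= include_t:
            return "inclusion threshold crossed"
        if p <= exclude_t:
            return "exclusion threshold crossed"
    return "no threshold crossed"

# Hypothetical session: clinician states thresholds of 5% and 85%,
# then enters three findings with assumed likelihood ratios.
result = run_decision_aid(
    pretest_p=0.20, exclude_t=0.05, include_t=0.85,
    findings=[("finding A", 4), ("finding B", 3), ("finding C", 5)])
print(result)
```

Because the clinician's thresholds are fixed before any data arrive, a system like this makes post hoc threshold manipulation visible: the aid's verdict changes only with new data, never with a quietly moved bar.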
Dr. Warner would like to thank Robert Hecht-Nielsen, PhD, for sharing his knowledge on expert systems and adaptive diagnostics, Jeffrey Kohlwes, MD, for his ongoing support and mentorship, and Joseph Pliskin, PhD, for his mentorship through the Society for Medical Decision Making. The authors would like to thank the reviewers for their comments and suggestions.