From making a diagnosis, to creating a treatment plan, to deciding on a disposition for the hospitalized patient who reports suicidal ideation, psychiatrists routinely make predictions for an individual patient using clinical judgment about the presenting case and the data that support it. Psychiatry is approaching an era of unprecedented knowledge for conceptualizing and treating mental illness, owing in part to rapid developments in the neuro- and cognitive sciences and to technological advances that can translate discoveries into real-world evidence for clinical decision support. However, clinicians have long known that, beyond the evidence pooled from populations, there is unique information about the individual patient that can influence a clinician’s prediction at a given moment. A complex model has always been necessary to understand the mechanics of the human mind.
Paul E. Meehl was an American clinical psychologist, philosopher of science, and one of the earliest thinkers to contribute to the theoretical grounds for predicting clinical outcomes using quantitative methods.1 Specifically, his 1954 book Clinical versus Statistical Prediction famously examines the limitations and unique leverage of clinician judgment in predicting human behavior. This column traces the origins of his seminal argument that a data-driven approach should function together with clinician insight in selecting models of behavioral prediction. His legacy is particularly timely for today’s psychiatric researchers and clinicians, who are increasingly faced with the challenging task of translating rapidly expanding data about the human brain and behavior into sound clinical practice guidelines.
Meehl was born on 3 January 1920 in Minneapolis, Minnesota, in a liberal Methodist home. When he was 16, his mother died after her internist misdiagnosed a spinal tumor as Meniere’s disease. In his autobiography, Meehl wrote that “This episode of gross medical bungling permanently immunized me from the childlike faith in physicians’ omniscience . . . and helped me to avoid dogmatism about my own diagnostic inferences.”2 In 1945, he completed his doctorate in psychology at the University of Minnesota under Starke R. Hathaway, who was influential in the development of the Minnesota Multiphasic Personality Inventory (MMPI), a standardized psychometric test of adult personality and psychopathology. Meehl’s subsequent postdoctoral research investigated the suppressor variable known as the K factor, a quantitative proxy of a subject’s test-taking attitude, such as defensiveness, and its statistical correlations with the psychological traits of the person taking the MMPI.3 This marked his early interest in using an objective index to optimize the reliability of behavioral prediction based on self-reports.
CLINICAL VERSUS STATISTICAL PREDICTIONS
Meehl’s legacy in delineating the boundary between hypothetical constructs made by humans and structured, quantitative data generated by statistical methods was best exemplified in Clinical versus Statistical Prediction. In this controversial monograph, he argued that “actuarial methods,” i.e., the mathematical combination of data, outperform “clinical methods” in predicting behavior. He opened his thesis by disagreeing with one of his prominent contemporaries, T. R. Sarbin, who wrote that the clinician always, albeit implicitly and inefficiently, operates as an empiricist, gathering impressions of past behavior that directly allow prediction of future behavior.4 To this, Meehl replied that one should “distinguish carefully between how you get there and how you check the trustworthiness of your judgment.”5 Writing that the problem of bias arises when clinicians “assign a confidence, weight, or probability to the sentence which speaks of [the person’s behavior],” Meehl argued that the empirical lens through which clinicians interpret human behavior is inherently limited and does not achieve the precision essential to high-quality prediction.
For example, he refuted the conclusion of a 1941 study that a psychometric scale was inferior to qualitative case histories as a predictor of life choices made by Harvard College graduates.6 The issue was not the scale’s prediction rate of only 36.9%, the second lowest among the compared modalities of prediction and hence the evidence for its inferiority. The problem was that the methodology involved no empirical comparison between actuarial and nonactuarial methods, but instead between different kinds of data from “the [same] judges [who] combined the information in whatever manner seemed subjectively most appropriate, in the absence of any exact knowledge concerning the statistical relationships between [the judgment] and the to-be-predicted behavior.” “The ideal design,” he wrote, “is one in which the same basic set of facts is subjected on the one hand to the skilled analysis of a trained clinician, and on the other hand to mechanical operations for comparison,” where the bias of subjectivity in combining data is minimized.5 For example, entering the frequencies of a qualitative observation into the cells of a table and then mathematically manipulating those numbers to arrive at a prediction of the ground truth would be equivalent to the actuarial method. Because the life choices in the original paper were always predicted by judges, the result was that “actually, all the predictions were made clinically.” Reviewing twenty studies that used the appropriate comparative analysis, he concluded that the actuarial method indeed yielded higher predictive accuracy. A 2000 meta-analysis comparing the methods used in 136 studies on the prediction of human health and behavior replicated this conclusion.7
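The frequency-table procedure Meehl describes can be sketched in a few lines of code. This is only an illustration with hypothetical data, not a reconstruction of any study he reviewed: each case is reduced to a profile of recorded facts, outcome frequencies are tallied per profile, and the prediction is the modal outcome for that profile, with no subjective weighting of the facts.

```python
# A minimal sketch of an "actuarial" prediction table: tally outcome
# frequencies per predictor profile, then predict the modal outcome.
# The profiles and outcomes below are hypothetical, for illustration only.
from collections import Counter, defaultdict

def fit_actuarial_table(cases):
    """Count outcome frequencies for each predictor profile (a tuple of facts)."""
    table = defaultdict(Counter)
    for profile, outcome in cases:
        table[profile][outcome] += 1
    return table

def actuarial_predict(table, profile):
    """Predict the most frequent outcome previously recorded for this profile."""
    return table[profile].most_common(1)[0][0]

# Hypothetical past cases: (qualitative facts encoded as a tuple, observed outcome)
history = [
    (("high test score", "stable history"), "success"),
    (("high test score", "stable history"), "success"),
    (("high test score", "unstable history"), "failure"),
    (("low test score", "stable history"), "failure"),
]

table = fit_actuarial_table(history)
print(actuarial_predict(table, ("high test score", "stable history")))  # -> success
```

The point of the sketch is Meehl’s distinction: the facts may be gathered clinically, but once they are entered in the table, the combination step is purely mechanical and identical for every case.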
MEEHL’S VIEW ON THE CLINICIAN’S “SPECIAL POWER”
While the most essential contribution of his thesis was “to draw a sharp distinction between data gathering (e.g., interview, psychometrics, behavioral observation) from data combination,” Meehl’s friend and frequent collaborator William Grove wrote that Clinical versus Statistical Prediction actually originated in the late psychologist’s efforts to examine “the nature and proper role of clinical inference.”8 In fact, Meehl admitted that he spent more pages “defending the unique inferential activity” of a clinician than “criticizing his [or her] predictive deficiencies.”2 According to him, “the behavior which is important to clinicians always involves, at least indirectly, interaction with other human organisms” and “the problem of specifying [behavioral] response classes [is] a fantastically complicated one.” “What the clinician does,” he wrote, “is to utilize the given facts … to invent a hypothesis concerning the state of certain intervening variables or hypothetical constructs in his patient.”5 This also suggested that, despite its predictive accuracy, data combination using actuarial methods may not always be relevant to the problem the clinician wants to solve.
Meehl was aware of the implications of his thesis for the clinical community. To the critics who said that his work pitted the value of clinical evaluation against mechanical methods, he argued that the concocted rivalry between the two is “a ridiculous position when the context is pragmatic decision making.”9 It is important to note that Meehl was a psychodynamically trained clinician, who continued to practice psychotherapy until several years after his retirement from the University of Minnesota. He was less interested in choosing between clinical and statistical prediction than in re-examining, with rigorous methods, the widely influential conjectures made by his contemporaries in the psychoanalytic field. He recalled, “it was easy for me to be relatively fair-minded about this charged topic, as I had strong identifications on both sides.”2
His comfort in navigating different disciplines distinguishes his career as a whole. In What, Then, Is Man? (1958), Meehl collaborated with a group of Lutheran theologians and psychologists to explore how Christians could responsibly practice faith and science without betraying orthodoxy on either side.10 In 1962, he was elected president of the American Psychological Association at age 42 and delivered “a radical doctrine” that the phenotypic heterogeneity of schizophrenia demands “an adequate theoretical account [of] explanatory constructs at the neurophysiological level,”2,11 long before the complex, polygenic architecture of schizophrenia was established through advanced research methods in molecular genetics and genomics.12 After a long career training the next generation of mental health clinicians in diagnosis and psychotherapy, he died of chronic myelomonocytic leukemia on 14 February 2003 at his home in Minneapolis.
MEEHL’S LEGACY FOR SCIENTISTS AND CLINICIANS STUDYING HUMAN BEHAVIOR
Almost seven decades after the publication of Clinical versus Statistical Prediction, Meehl’s contribution to the practice of the psychological sciences transcends academic boundaries. In the field of psychometrics, the MMPI, for which he developed the K factor and which is now in its third iteration after restandardization on more racially and educationally representative adult samples, remains a hallmark of psychological testing, with validity scales that provide information about a subject’s response biases such as overreporting and social desirability. Its administration has since been extended to forensic assessment13 and presurgical screening evaluations14 to help legal and medical professionals understand the psychological profile of a client or patient as objectively as possible when making high-stakes decisions. On the data science side, especially in machine learning and artificial intelligence, the predictive principles and structure of neural and behavioral data are surfacing at faster rates with few formal human assumptions.15 This computational approach, which Meehl would likely classify under actuarial methods, now gives researchers the experimental capacity to analyze natural language, vocal acoustics, facial expressions, neural signals, behavioral task performance, and smartphone data for diagnostic purposes.16-21 The launch of the Research Domain Criteria (RDoC) program in 2009 by the National Institute of Mental Health to generate a transdiagnostic framework for classifying mental illness represents the field’s turn toward quantitatively interpreting the large amount of newly available data.22
In parallel, scientists argue that data require plausible models if they are to be understood and explained. Increasingly nuanced discussions address how data collected in laboratory and real-world environments must model time and context to explain complex behaviors in mental health populations,23,24 and how generalizability and interpretability of machine learning algorithms are priorities in developing sound clinical prediction models.25,26 Humans have always been the active agents who devise competing models for explaining data and rely on quantitative information to select the best among them.27 More than ever, it is the responsibility of researchers to actively examine bias in datasets and algorithms if their research is to be clinically useful and safe. For clinicians, knowing when to apply data-based insights in an individual context, checking their own cognitive biases and racial assumptions during clinical work,28,29 and using intuition to frame important research questions from the bedside continue to be uniquely human work. This rigorous evaluation of the research process and its delivery is “a clinician’s special power,” the crucial step in solving a prediction problem to benefit patients.
Currently, RDoC is a framework for organizing psychiatric research, not a diagnostic system. The development of precision psychiatry as a paradigm of everyday clinical practice is promising but still has a long way to go. Already, however, in the growing field of computational psychiatry, a move toward collecting human behavioral data in naturalistic settings, such as gamified smartphone apps, reflects the scientific community’s interest in generating quantitatively robust yet clinically usable data that fit human behaviors in the wild.30 From the modeling to the deployment stages of artificial intelligence algorithms, implementation scientists are paying increasing attention to approaches for debiasing training data to account for racial inequities in mental health, and to the clinician-machine interface, to minimize potential conflicts between human experts and mathematical models.31 In the changing landscape of contemporary psychiatric research and practice, Meehl’s balanced advocacy for both matter-of-fact and intuitive approaches to quantifying human behavior reminds us of the limitations and opportunities of being human.
Declaration of interest
The author reports no conflicts of interest. The author alone is responsible for the content and writing of the article.
The author would like to thank Jacob Appel, M.D., J.D., Paul Rosenfield, M.D., and David Rosfeld for their careful reading of and comments on the manuscript.
1. Das AK. Computers in psychiatry: a review of past programs and an analysis of historical trends. Psychiatr Q 2002;73:351–5.
2. Meehl PE. A history of psychology in autobiography Vol. VIII. Stanford, CA: Stanford University Press, 1989.
3. Meehl PE, Hathaway SR. The K factor as a suppressor variable in the Minnesota Multiphasic Personality Inventory. J Appl Psychol 1946;30:525–64.
4. Sarbin TR. The logic of prediction in psychology. Psychol Rev 1944;51:210–28.
5. Meehl PE. Clinical Versus Statistical Prediction. Minneapolis: University of Minnesota, 1954.
6. Polansky NA. How shall a life-history be written? J Pers 1941;9:188–207.
7. Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C. Clinical versus mechanical prediction: a meta-analysis. Psychol Assess 2000;12:19–30.
8. Grove WM. Clinical versus statistical prediction: the contribution of Paul E. Meehl. J Clin Psychol 2005;61:1233–43.
9. Meehl PE. Causes and effects of my disturbing little book. J Pers Assess 1986;50:370–5.
10. Meehl PE. What, then, is man? a symposium of theology, psychology, and psychiatry. St. Louis, MO: Concordia Publishing House, 1958.
11. Meehl PE. Schizotaxia, schizotypy, schizophrenia. Am Psychol 1962;17:827–38.
12. Henriksen MG, Nordgaard J, Jansson LB. Genetics of schizophrenia: overview of methods, findings and limitations. Front Hum Neurosci 2017;11:322.
13. Ben-Porath YS, Heilbrun K, Rizzo M. Using the MMPI-3 in legal settings. J Pers Assess 2022;104:162–78.
14. Block AR, Ben-Porath YS, Marek RJ. Psychological risk factors for poor outcome of spine surgery and spinal cord stimulator implant: a review of the literature and their assessment with the MMPI-2-RF. Clin Neuropsychol 2013;27:81–107.
15. Bzdok D, Meyer-Lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimaging 2018;3:223–30.
16. Corcoran CM, Carrillo F, Fernández-Slezak D, et al. Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry 2018;17:67–75.
17. Marmar CR, Brown AD, Qian M, et al. Speech-based markers for posttraumatic stress disorder in US veterans. Depress Anxiety 2019;36:607–16.
18. Liu W, Li M, Yi L. Identifying children with autism spectrum disorder based on their face processing abnormality: a machine learning framework. Autism Res 2016;9:888–98.
19. Kulkarni KR, Schafer M, Berner LA, et al. An interpretable and predictive connectivity-based neural signature for chronic cannabis use. Biol Psychiatry Cogn Neurosci Neuroimaging 2022.
20. Maatoug R, Oudin A, Adrien V, et al. Digital phenotype of mood disorders: a conceptual and critical review. Front Psychiatry 2022;13.
21. Banker SM, Na S, Beltrán J, et al. Disrupted computations of social control in individuals with obsessive-compulsive and misophonia symptoms. iScience 2022;25:104617.
22. Insel TR. The NIMH Research Domain Criteria (RDoC) Project: precision medicine for psychiatry. Am J Psychiatry 2014;171:395–7.
23. Hitchcock PF, Fried EI, Frank MJ. Computational psychiatry needs time and context. Annu Rev Psychol 2022;73:243–70.
24. Torous J, Bucci S, Bell IH, et al. The growing field of digital psychiatry: current evidence and the future of apps, social media, chatbots, and virtual reality. World Psychiatry 2021;20:318–35.
25. Payrovnaziri SN, Chen Z, Rengifo-Moreno P, et al. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc 2020;27:1173–85.
26. Barragán-Montero A, Bibal A, Dastarac MH, et al. Towards a safe and efficient clinical implementation of machine learning in radiation oncology by exploring model interpretability, explainability and data-model dependency. Phys Med Biol 2022;67.
27. Kuhn TS. The structure of scientific revolutions. Chicago: University of Chicago Press, 1970.
28. Mendel R, Traut-Mattausch E, Jonas E, et al. Confirmation bias: why psychiatrists stick to wrong preliminary diagnoses. Psychol Med 2011;41:2651–9.
29. Hairston DR, Gibbs TA, Wong SS, Jordan A. Clinician bias in diagnosis and treatment. In: Medlock MM, Shtasel D, Trinh NHT, Williams DR, eds. Racism and psychiatry. Totowa, NJ: Humana Press; 2019:105–37.
30. Hauser TU, Skvortsova V, De Choudhury M, Koutsouleris N. The promise of a model-based psychiatry: building computational models of mental ill health. Lancet Digit Health 2022;4:e816–28.
31. Koutsouleris N, Hauser TU, Skvortsova V, De Choudhury M. From promise to practice: towards the realisation of AI-informed mental health care. Lancet Digit Health 2022;4:e829–40.