Original Articles

Machine Learning to Differentiate Risk of Suicide Attempt and Self-harm After General Medical Hospitalization of Women With Mental Illness

Edgcomb, Juliet B. MD, PhD*; Thiruvalluru, Rohith MS; Pathak, Jyotishman PhD; Brooks, John O. III MD, PhD

doi: 10.1097/MLR.0000000000001467


Despite prevention efforts, rates of suicide in the United States have risen for nearly 3 decades.1 Suicide prevention is a public health priority2 and improving suicide prevention during and after medical hospitalization is a focus of the updated 2019 Joint Commission National Patient Safety Goals.3 An important and under-investigated area is the prevention of suicide and suicide-related behavior following general medical hospitalization.4

There are well-established sex differences in suicide risk5,6 and rates of suicide are rising disproportionately among women in the United States.7,8 Over 700,000 women attempt suicide each year.9 Most women who die by suicide have had recent contact with their health care providers.2 Women with serious mental illness are at particularly high risk of suicide-related behavior,10 and risk of suicide increases further following acute medical illness.11 Recent endeavors to measure, predict, and prevent suicide attempts among women after acute medical interventions have focused almost entirely on obstetric care.12,13 However, nonobstetric medical hospitalization is also associated with death by suicide following hospital discharge11: individuals who die by suicide are 3 times more likely to have been discharged from a medical rather than a psychiatric hospitalization.14 Yet suicide and self-harm among women after medical hospitalization remain scarcely explored.

In this study, sociodemographic and clinical characteristics derived from electronic health records (EHRs) are used to predict risk of readmission for serious suicide attempt and self-harm among adult women with serious mental illness (depression, bipolar disorder, and chronic psychosis) after general medical hospitalization. We focus on nonobstetric general medical hospitalizations to address the gap in evidence surrounding predictors of suicidal behavior in medically ill women. We deploy a supervised machine learning method optimized for clinical interpretability,15 Classification and Regression Tree (CART) modeling, to produce risk profiles using a broad array of predictors, and validate the approach using separate datasets to identify common risk subgroups. Using this approach, we distinguish risk profiles of hospitalizations followed by a suicide-related readmission from hospitalizations followed by a non-suicide-related readmission. Our primary hypothesis was that routinely collected EHR data could be used to produce accurate risk profiles of readmission for suicide-related behavior after medical hospitalization. We also hypothesized that common risk profiles could be consistently identified in diverse populations and across different health system settings. To our knowledge, this is the first study to assess women’s risk for suicide attempt and self-harm after medical hospitalization using large-scale EHR data and machine learning methods.


Study Design

This was a population-based, retrospective cohort study using EHR data collected between 2006 and 2017 from a large, urban academic medical center (University of California, Los Angeles; UCLA), comprising 8408 index general hospitalizations of women with depression, bipolar disorder, schizophrenia, or schizoaffective disorder. After extraction, data were analyzed using CART modeling. To determine whether similar risk profiles could be identified in a broader population and across different health system settings, we applied the modeling approach to a larger multi-institutional network EHR dataset, the PCORI-funded New York City Clinical Data Research Network (NYC-CDRN), comprising 841,834 index hospitalizations across NYC.16 Results were then compared to identify common risk profiles. Because this study was a secondary analysis of existing deidentified health records, the need for ethical approval and participant consent was waived by the UCLA and NYC-CDRN Institutional Review Boards.

Dataset Description

Patient data were extracted from the UCLA Integrated Clinical and Research Data Repository (UCLA-xDR) and NYC-CDRN datasets. The UCLA-xDR repository contains 10 years of outpatient and inpatient EHR data collected from 2006 through 2016 from 2 academic medical hospitals within the UCLA Health System, totaling 765 inpatient beds. The NYC-CDRN dataset contains outpatient and inpatient EHR data collected from 2009 through 2017 from 7 health systems across the NYC metropolitan area including data on ∼12 million unique patients.16

The index hospitalization was defined as the first general hospitalization during the study period for a primary medical (nonpsychiatric) diagnosis. From each dataset we limited data extraction to (1) adult women (aged 18 years or older with natal sex female), (2) with ICD code(s) for depressive disorder (ICD-9 296.20-296.36; ICD-10 F32-33.x), bipolar disorder (ICD-9 296.00-296.89, 296.40-296.89; ICD-10 F31.0-31.9), or schizophrenia or schizoaffective disorder (ICD-9 295.xx; ICD-10 F20.x, F25.x) present at discharge from the index hospitalization, and (3) 2 or more general hospitalizations during the study period (2009–2016 for the UCLA sample and 2009–2017 for the NYC-CDRN sample) (see Supplemental Table 1, Supplemental Digital Content 1). Restriction to recurrent hospital utilizers was done to ensure that EHR data on at least 1 all-cause readmission within our health systems were available for all participants, and to limit confounding between predictors of readmission and predictors of suicidality. Natal sex was used to identify females because information on patient sex identity was inconsistently reported. Obstetric hospitalizations were excluded from the analysis to focus on risk profiles of suicide-related behavior outside of the antenatal and postpartum periods.
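The inclusion criteria above can be sketched as a per-patient filter. This is a minimal illustration under stated assumptions, not the study's extraction code: the `Hospitalization` record, its field names, and the helper functions are hypothetical; ICD codes are assumed to be stored in dotted form; obstetric and primary-diagnosis flags are assumed to be derived upstream; obstetric stays are assumed not to count toward the 2-hospitalization minimum; and the qualifying ICD ranges are collapsed into their union as listed in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical record layout for illustration only (not a real EHR schema).
@dataclass
class Hospitalization:
    patient_id: str
    admit: date
    discharge: date
    age: int
    natal_sex: str             # "F" or "M"
    icd_codes: List[str]       # discharge diagnoses in dotted form, eg "296.32", "F33.1"
    obstetric: bool            # assumed flagged upstream; obstetric stays are excluded
    primary_medical: bool      # True if the primary diagnosis is nonpsychiatric


def has_smi_code(code: str) -> bool:
    """Union of the depressive, bipolar, and schizophrenia/schizoaffective ranges in the text."""
    code = code.strip().upper()
    if code.startswith(("F20", "F25", "F31", "F32", "F33")):  # ICD-10
        return True
    if code.startswith("295."):                               # ICD-9 295.xx
        return True
    if code.startswith("296."):                               # ICD-9 296.00-296.89
        try:
            return 296.00 <= float(code) <= 296.89
        except ValueError:
            return False
    return False


def index_hospitalization(stays: List[Hospitalization]) -> Optional[Hospitalization]:
    """Return the patient's index stay, or None if the patient does not meet the criteria."""
    general = sorted((s for s in stays if not s.obstetric), key=lambda s: s.admit)
    if len(general) < 2:                                   # criterion 3: recurrent utilizer
        return None
    medical = [s for s in general if s.primary_medical]
    if not medical:
        return None
    index = medical[0]                                     # first stay for a medical diagnosis
    if index.natal_sex != "F" or index.age < 18:           # criterion 1
        return None
    if not any(has_smi_code(c) for c in index.icd_codes):  # criterion 2
        return None
    return index
```

Applying `index_hospitalization` to each patient's list of stays yields at most one index stay per patient under the definition as written.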


The primary outcome was medical rehospitalization for nonfatal suicide attempt or intentional self-harm in the year following discharge from a general medical hospitalization. Nonfatal suicide attempt and intentional self-harm were defined by ICD-9 and ICD-10 codes specified in the 2018 National Health Statistics Report of the Centers for Disease Control and Prevention17 (ICD-9 codes: E950.0-E959; ICD-10 codes: X71.0xx-X83.8xx, T36.2-T71.232, T14.91). An outcome was considered present if, in the 365 days following hospital discharge, (1) the individual was medically rehospitalized and (2) the rehospitalization encounter was associated with a diagnostic code for suicide attempt or self-harm.
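The outcome rule can be expressed as a labeling function. The sketch below is illustrative only: the function names and input shapes are hypothetical, and the code list is deliberately simplified to the ICD-9 E950-E959, ICD-10 X71-X83, and T14.91 prefixes. The full CDC case definition (ref 17), including the T36-T71 codes, should be consulted before any real use. Excluding a readmission dated on the discharge day itself is also an assumption, chosen because such encounters would be consolidated into one episode of care.

```python
from datetime import date
from typing import Iterable, List, Tuple

# Simplified prefix list (assumption); not the complete CDC surveillance definition.
ICD9_SELF_HARM = tuple(f"E95{d}" for d in range(10))                   # E950-E959
ICD10_SELF_HARM = tuple(f"X{n}" for n in range(71, 84)) + ("T14.91",)  # X71-X83, T14.91


def is_self_harm_code(code: str) -> bool:
    """True if the code starts with one of the simplified self-harm prefixes."""
    return code.strip().upper().startswith(ICD9_SELF_HARM + ICD10_SELF_HARM)


def outcome_present(index_discharge: date,
                    readmissions: Iterable[Tuple[date, List[str]]]) -> bool:
    """True if any medical rehospitalization admitted 1 to 365 days after the index
    discharge carries a suicide attempt or self-harm code."""
    for admit, codes in readmissions:
        days_after = (admit - index_discharge).days
        if 0 < days_after <= 365 and any(is_self_harm_code(c) for c in codes):
            return True
    return False
```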

Predictor Selection

Sociodemographic data, medications, health care utilization, and diagnostic codes were extracted. A complete list of predictors included in each category is provided in Supplemental Table 1, Supplemental Digital Content 1. Psychiatric and medical comorbidities were determined by ICD-9/10 codes. Medical comorbidities were classified by the Elixhauser comorbidity system,18 which was condensed into a single numeric score estimating global disease burden, the van Walraven score.19 Because individual Elixhauser comorbidity categories are independently associated with outcomes such as length of stay and mortality,18 we included both the presence of each Elixhauser comorbidity category and the summary van Walraven score as predictors. Psychiatric conditions were condensed into the following categories: schizophrenia or schizoaffective disorder (ICD-9 295.xx; ICD-10 F20.x, F25.x), depressive (296.20-296.36; F32-33.x), bipolar (296.00-296.89; F31.0-31.9), anxiety (ICD-9 300.0-300.3; F40-41.x), personality (301.0-301.9; F60.0-60.9), and substance use disorders (291.xx, 292.xx, 303.xx, 305.xx; F10-19.x), as well as pregnancy-related mental disorders, including those complicating pregnancy, childbirth, the puerperium, and the postpartum period (648.40-44; F53.x, O99.34x, O90.6). Homelessness (Z59.x), criminal justice involvement (Z65.x), poverty (Z59.x), and adult and childhood abuse and neglect (V61.xx, 995.8x, 998.x; T76.x, Z62.x, Z91.x, Z74.x) were also included as predictors.
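Both comorbidity predictors can be derived from the same set of Elixhauser category flags. In this minimal sketch, the category names are hypothetical identifiers and the input is assumed to be a set of categories already mapped from ICD codes. The integer weights are the van Walraven point values as we recall them from ref 19; they should be checked against the published table before use. The category count corresponds to the "Elixhauser category diagnoses" used at the top split of the classification trees. Whether the psychiatric Elixhauser categories (psychoses, depression) were counted in the study is not stated, so including them here is an assumption.

```python
from typing import Dict, Iterable, Tuple

# van Walraven point weights per Elixhauser category (recalled from ref 19; verify).
VAN_WALRAVEN_WEIGHTS: Dict[str, int] = {
    "congestive_heart_failure": 7, "cardiac_arrhythmia": 5, "valvular_disease": -1,
    "pulmonary_circulation": 4, "peripheral_vascular": 2, "hypertension": 0,
    "paralysis": 7, "other_neurological": 6, "chronic_pulmonary": 3,
    "diabetes_uncomplicated": 0, "diabetes_complicated": 0, "hypothyroidism": 0,
    "renal_failure": 5, "liver_disease": 11, "peptic_ulcer": 0, "aids_hiv": 0,
    "lymphoma": 9, "metastatic_cancer": 12, "solid_tumor": 4,
    "rheumatoid_arthritis": 0, "coagulopathy": 3, "obesity": -4, "weight_loss": 6,
    "fluid_electrolyte": 5, "blood_loss_anemia": -2, "deficiency_anemia": -2,
    "alcohol_abuse": 0, "drug_abuse": -7, "psychoses": 0, "depression": -3,
}


def comorbidity_features(categories: Iterable[str]) -> Tuple[Dict[str, int], int, int]:
    """Return (per-category 0/1 flags, van Walraven score, number of categories present)."""
    present = set(categories)
    unknown = present - set(VAN_WALRAVEN_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    flags = {name: int(name in present) for name in VAN_WALRAVEN_WEIGHTS}
    score = sum(VAN_WALRAVEN_WEIGHTS[name] for name in present)
    return flags, score, len(present)
```

The per-category flags and the summary score enter the model side by side, mirroring the choice described above to keep both forms of comorbidity information.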

Data Preparation

Static predictors (eg, natal sex, race) were used in their extracted form. Utilization variables (eg, ambulatory care visits) were coded by presence or quantity within the 365 days before the index hospitalization. To capture longitudinal information 1 year before and after the index hospitalization, index hospitalizations were restricted to admissions occurring between 365 days after the earliest date and 365 days before the latest date available for each site. Encounters occurring on the same date or on contiguous dates were consolidated into a single episode of care (eg, a transfer between services). ICD diagnoses were coded by presence at the index hospitalization (episode diagnoses) and by any historic diagnosis (eg, history of suicide attempt or self-harm) during the entire study period preceding the index hospitalization. Missing predictor values were imputed with the median value among index hospitalizations for which data were available.20 A large proportion of index hospitalizations were missing data on medications (47.0% UCLA-xDR, 22.0% NYC-CDRN), primary/secondary diagnosis flags (69.0% UCLA-xDR, 100% NYC-CDRN), and chief complaint (72.0% UCLA-xDR, 100% NYC-CDRN). Ethnicity was missing for 4.7% of the UCLA-xDR cohort and 30% of the NYC-CDRN cohort, and race data were entirely absent from the NYC-CDRN dataset. The remaining predictors were missing for <9% of index hospitalizations.
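The window restriction, episode consolidation, and median imputation described above can be sketched with the standard library. Assumptions: encounters for one patient are represented as hypothetical (admit, discharge) date pairs; "contiguous" is read as an admission no later than the day after the previous discharge; and the 365-day margins are applied to the admission date, as the text states. In scikit-learn, `SimpleImputer(strategy="median")` performs the same imputation column by column.

```python
from datetime import date, timedelta
from statistics import median
from typing import List, Optional, Tuple

Encounter = Tuple[date, date]  # hypothetical (admit, discharge) pair for one patient


def consolidate(encounters: List[Encounter]) -> List[Encounter]:
    """Merge encounters that overlap, share a date, or start the day after the previous
    discharge (eg, a transfer between services) into single episodes of care."""
    episodes: List[Encounter] = []
    for admit, discharge in sorted(encounters):
        if episodes and admit <= episodes[-1][1] + timedelta(days=1):
            start, end = episodes[-1]
            episodes[-1] = (start, max(end, discharge))
        else:
            episodes.append((admit, discharge))
    return episodes


def in_index_window(admit: date, first_date: date, last_date: date) -> bool:
    """Keep admissions with 365 days of site data before and after the admission date."""
    return first_date + timedelta(days=365) <= admit <= last_date - timedelta(days=365)


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```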

Classification Model

The classification model was implemented in several steps that informed both the identification of predictive features and the assessment of their predictive value. Owing to their highly interpretable representations and robustness to complex, nonparametric data, "tree" models are used in a variety of data mining and machine learning applications.21,22 CART modeling was chosen to capture distinct clinical profiles, display interactions between features in the data, and produce a natural visualization of model results.23 Although the following steps are presented sequentially, the methods were developed iteratively and somewhat in parallel, consistent with successful applications of supervised machine learning to clinical data.24

Classification modeling was performed in the following steps: (1) CART modeling was performed using the sklearn.tree module of the Scikit-learn Python toolbox.25 To account for the anticipated class imbalance, the CART model was run with equal-weighted class priors, in scikit-learn notation class_weight="balanced." The Gini index determined tree splits. (2) A cost-complexity–based tree pruning strategy, governed by a complexity parameter (cp), optimized the trade-off between the cost of misclassification and tree complexity. The cp is a hyperparameter that controls the size of the decision tree and is used to select the optimal tree size. Trees were constructed for a sequence of cp values, and the final cp was selected by the 1-SE rule, that is, the value yielding a cross-validated prediction error up to 1 SE larger than the minimum.26 (3) All analyses were conducted using 10-fold cross-validation.27 Each hospitalization was randomly assigned to 1 of 10 nonoverlapping subsets containing similar numbers of cases and noncases. Nine subsets were used to train the classifier, which was then tested on the remaining tenth. This procedure was repeated 10 times so that every tenth of the data was used for both training and testing. A fixed random seed ("random_state=seed") was set to enable replicability of results. The cross-validation algorithm was written in Python version 3.7.1. (4) Classification tree performance was measured by the area under the curve (AUC); sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F-statistic were also examined. (5) Classification models were derived separately for the UCLA-xDR and NYC-CDRN datasets. Following separate development of CART models in both datasets, the models were inspected for prevalence of outcome, selected features, and distribution of risk.
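Three of the ingredients in steps 1 to 3 can be made concrete without any library. The sketch below uses hypothetical helper names. It computes the weights implied by class_weight="balanced" (scikit-learn documents these as n_samples / (n_classes * np.bincount(y))), assigns stratified folds with a fixed seed, and applies the 1-SE rule to cross-validated error estimates for a grid of complexity values. In scikit-learn itself, the corresponding pieces are `DecisionTreeClassifier(criterion="gini", class_weight="balanced", ccp_alpha=...)`, `DecisionTreeClassifier.cost_complexity_pruning_path`, and `StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)`. Note that `ccp_alpha` is the closest analogue of rpart's cp rather than the same quantity, since rpart scales cp relative to the root-node error.

```python
import random
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Sequence


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weights matching the formula scikit-learn documents for class_weight="balanced"."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def stratified_folds(y: Sequence[int], n_folds: int = 10, seed: int = 0) -> List[int]:
    """Give each observation a fold index so every fold has a similar case/noncase mix."""
    rng = random.Random(seed)                 # fixed seed for replicability
    fold_of = [0] * len(y)
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            fold_of[i] = rank % n_folds       # deal members round-robin into folds
    return fold_of


def one_se_choice(cv_errors: Dict[float, List[float]]) -> float:
    """1-SE rule: the largest complexity value (smallest tree) whose mean cross-validated
    error is within 1 SE of the minimum mean error. Needs >= 2 fold errors per value."""
    summary = {}
    for alpha, errors in cv_errors.items():
        # SE of the mean fold error, one common estimate
        summary[alpha] = (mean(errors), stdev(errors) / len(errors) ** 0.5)
    best = min(summary, key=lambda a: summary[a][0])
    limit = summary[best][0] + summary[best][1]
    return max(a for a, (err, _) in summary.items() if err <= limit)
```

With the UCLA-xDR class counts from Table 2 (109 cases, 8296 noncases), the balanced weight for a case is roughly 76 times the weight for a noncase, which is how equal class priors offset the 1.3% outcome rate.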


Study Cohort Characteristics

University of California, Los Angeles Integrated Clinical and Research Data Repository

Cohort descriptives are presented in Table 1, and the flowchart for study inclusion is presented in Supplemental Digital Content Figure 1, Supplemental Digital Content 2. The UCLA-xDR dataset included information from 77,296 general hospitalizations of individuals with serious mental illness. Of these, 8408 were hospital episodes of care for females who met study inclusion criteria. The final sample included 1628 patients. Median number of hospitalizations per patient was 3 [interquartile range (IQR): 1–6]. Mean age was 60.5 years (SD=20.0). The most common psychiatric diagnoses were psychotic disorders (51.2%), depression (45.8%), and bipolar disorder (16.0%). Median van Walraven score was 22 (IQR: 12–31). Median all-cause length of stay was 4 days (IQR: 2–7). Median days to all-cause hospital readmission was 39 (IQR: 14–107). The most commonly prescribed psychotropic medications were lorazepam (26.2%), trazodone (15.8%), quetiapine (12.9%), olanzapine (8.3%), haloperidol (5.8%), escitalopram (5.6%), and sertraline (5.6%). The readmission rate was 1.3% for medically serious suicide attempt or self-harm and 3.6% for suicidal ideation.

TABLE 1 - Cohort Descriptives
Overall [n (%)]
Characteristics UCLA-xDR (N=1628) NYC-CDRN (N=140,848)
Ethnicity
 Non-Hispanic 1269 (77.9) 80,283 (57.0)
 Hispanic 215 (13.2) 16,901 (12.0)
 Other or unknown 76 (4.7) 42,253 (30.0)
Race (UCLA-xDR only; not available in NYC-CDRN)
 White 1071 (65.7)
 Black or African American 198 (12.1)
 Asian 80 (4.9)
 Native American or Alaska Native 8 (0.5)
 Native Hawaiian or other Pacific Islander 2 (0.1)
 Other or unknown 269 (16.5)
Age (y)
 18–39 279 (17.1) 37,324 (26.5)
 40–64 583 (35.8) 60,282 (42.8)
 ≥65 764 (46.9) 43,240 (30.7)
Psychiatric diagnoses
 Psychotic disorder* 834 (51.2) 34,788 (24.7)
 Depression† 745 (45.8) 61,691 (43.8)
 Bipolar disorder‡ 261 (16.0) 44,226 (31.4)
Comorbid medical conditions
 Hypertension 879 (54.0) 59,297 (42.1)
 Fluid or electrolyte disorder 665 (40.8) 21,831 (15.5)
 Cardiac arrhythmia 564 (34.6) 24,648 (17.5)
 Anemia 620 (38.1) 22,535 (16.0)
 Renal failure 327 (20.1) 20,563 (14.5)
 Chronic pulmonary disease 392 (24.1) 26,338 (18.7)
 Congestive heart failure 220 (13.5) 38,028 (27.0)
Elixhauser-Van Walraven summary score∥
 ≥20 738 (45.3) 38,874 (27.6)
 10–19 381 (23.4) 37,043 (26.2)
 0–9 508 (31.2) 59,297 (42.1)
Substance use disorder
 Nicotine use 141 (8.6) 11,267 (8.0)
 Drug abuse 153 (9.4) 15,493 (11.0)
 Alcohol abuse 65 (4.0) 24,648 (17.5)
Psychotropic medication§
 Antidepressant 616 (71.3) 26,761 (24.3)
 Anxiolytic 514 (59.5) 9577 (8.7)
 Antipsychotic 391 (45.3) 13,380 (12.2)
 Mood stabilizer 90 (10.4) 12,676 (11.5)
*Corresponding to ICD-9 295.xx; ICD-10 F20.x, F25.x.
†Corresponding to ICD-9 296.20-296.36; ICD-10 F32-33.x.
‡Corresponding to ICD-9 296.00-296.89, 296.40-296.89; ICD-10 F31.0-31.9.
§Information on medications was available for only 53% of patients in UCLA-xDR and 78% of patients in NYC-CDRN. Classification tree analyses used median imputation for missing values.
∥Summary numeric score derived from the Elixhauser comorbidity classification system, corresponding to overall disease burden.
NYC-CDRN indicates New York City Clinical Data Research Network; UCLA-xDR, University of California, Los Angeles Integrated Clinical and Research Data Repository.

New York City Clinical Data Research Network

The NYC-CDRN dataset contained information from 4,363,866 general hospitalizations of individuals with serious mental illness. Of these, 841,834 were index hospitalizations for women who met study inclusion criteria. The final sample included 140,848 patients. Median number of hospitalizations per patient was 5.5 (IQR: 1–9). Mean age of participants was 57.5 years (SD=11.49). The most common psychiatric diagnoses were depression (43.8%), bipolar disorder (31.4%), and psychosis (24.7%). Median van Walraven score was 19.5 (IQR: 10–29). Median all-cause length of stay was 5.5 days (IQR: 3–8). Median days to all-cause hospital readmission was 34 (IQR: 7–145). The most commonly prescribed medications were aripiprazole (28.6%), trazodone (14.5%), olanzapine (12.3%), clozapine (10.3%), cariprazine (9.3%), and escitalopram (6.7%). The readmission rate was 4.8% for medically serious suicide attempt or self-harm and 2.6% for suicidal ideation.

CART Modeling

When applied to the UCLA-xDR dataset, the classification tree risk model identified 73% (80/109) of rehospitalizations for self-directed violence, with AUC 0.73, sensitivity 73.4, specificity 84.1, and accuracy 0.84. When applied to the NYC-CDRN dataset, the model identified 67% (29,619/40,408) of rehospitalizations for self-directed violence, with AUC 0.71, sensitivity 83.3, specificity 82.2, and accuracy 0.84. The classification tree presenting risk pathways common to both datasets is displayed in Figure 1. Derivation of this aggregate tree involved categorization of branch points. For example, the highest branch point of the NYC-CDRN tree was Elixhauser category diagnoses ≥3 versus <3, whereas the highest branch point of the UCLA-xDR tree was Elixhauser category diagnoses ≥4 versus <4. In the aggregate tree, these categories are displayed as "moderate-to-high antecedent medical comorbidity" and "low antecedent medical comorbidity." Full classification trees with cohort counts at each leaf for UCLA-xDR and NYC-CDRN are included in Supplemental Digital Content Figure 2 (Supplemental Digital Content 3) and Supplemental Digital Content Figure 3 (Supplemental Digital Content 4), respectively. Performance by study site is summarized in Table 2.

FIGURE 1 - Aggregate decision tree. Classification tree stratifying risk of suicide attempt and self-harm following medical hospitalization among women with serious mental illness. This tree displays common combinations of risk factors identified in both the University of California, Los Angeles Integrated Clinical and Research Data Repository and the New York City Clinical Data Research Network datasets. Full classification trees are presented in Supplemental Digital Content 2 and 3. Percent risk refers to the percentage of hospitalizations followed by rehospitalization for suicide attempt or self-harm within 1 year. Each pathway from root to leaf node translates into a series of "if-then" rules that are applied to classify observations. Every leaf node is associated with a decision rule corresponding to the most frequent class label of the observations belonging to that node. Elixhauser category diagnoses refers to the number of Elixhauser disease category conditions.
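The root-to-leaf reading described in the caption can be illustrated on a toy tree. The structure below is hypothetical and heavily simplified, loosely following the UCLA-xDR splits quoted in the text (≥4 Elixhauser category diagnoses, age 55 years, history of suicide attempt, history of pregnancy-related mental illness). It is not the fitted model: feature names are illustrative, 0/1 flags use a threshold of 1, and only leaves whose counts are reported in the text carry them. For a fitted scikit-learn tree, `sklearn.tree.export_text` produces a comparable textual listing.

```python
from typing import Any, Dict, Iterator, Tuple

Node = Dict[str, Any]

# Hypothetical, simplified tree; records go left when feature < threshold.
TREE: Node = {
    "feature": "elixhauser_categories", "threshold": 4,
    "left": {
        "feature": "age", "threshold": 55,
        "left": {"leaf": "increased risk (UCLA-xDR 45/651)"},
        "right": {"leaf": "leaf not detailed in text"},
    },
    "right": {
        "feature": "history_suicide_attempt", "threshold": 1,
        "left": {
            "feature": "history_pregnancy_related_mi", "threshold": 1,
            "left": {"leaf": "leaf not detailed in text"},
            "right": {"leaf": "increased risk (UCLA-xDR 10/58)"},
        },
        "right": {"leaf": "increased risk (UCLA-xDR 17/167)"},
    },
}


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one 'IF ... THEN ...' rule per root-to-leaf path (left branch first)."""
    if "leaf" in node:
        yield "IF " + " AND ".join(conditions) + " THEN " + node["leaf"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], conditions + (f"{feature} < {threshold}",))
    yield from extract_rules(node["right"], conditions + (f"{feature} >= {threshold}",))


def classify(node: Node, record: Dict[str, float]) -> str:
    """Follow the split rules for one record until a leaf is reached."""
    while "leaf" not in node:
        go_left = record[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


if __name__ == "__main__":
    for rule in extract_rules(TREE):
        print(rule)
```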
TABLE 2 - Classification Tree Performance by Study Site
Contingency Matrix (No. Hospitalizations)
Study Site No. Nodes* Sensitivity (%) Specificity (%) Accuracy AUC True Positives False Negatives True Negatives False Positives
UCLA-xDR 12 73.4 84.1 0.84 0.73 80 29 6979 1317
NYC-CDRN 15 83.3 82.2 0.84 0.71 29,619 10,789 608,645 131,831
*Number of nodes in classification tree was determined by complexity parameter.
AUC indicates area under the curve; NYC-CDRN, New York City Clinical Data Research Network; UCLA-xDR, University of California, Los Angeles Integrated Clinical and Research Data Repository.
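The ratio metrics in Table 2 follow directly from the contingency matrix. The helper below is a hypothetical illustration. We assume the "F-statistic" mentioned in the Methods corresponds to the F1 score, the harmonic mean of positive predictive value and sensitivity. Applied to the UCLA-xDR row, it reproduces the tabulated sensitivity (73.4%), specificity (84.1%), and accuracy (0.84).

```python
from typing import Dict


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary classification metrics from contingency-matrix counts."""
    sensitivity = tp / (tp + fn)          # proportion of true cases flagged
    specificity = tn / (tn + fp)          # proportion of noncases not flagged
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)      # equals the harmonic mean of PPV and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv,
            "npv": npv, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    ucla = confusion_metrics(tp=80, fn=29, tn=6979, fp=1317)  # UCLA-xDR row of Table 2
    for name, value in ucla.items():
        print(f"{name}: {value:.3f}")
```

For UCLA-xDR, the same counts give a PPV of about 0.057 and an NPV of about 0.996, which reflects the low (1.3%) base rate of the outcome.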

In the UCLA-xDR dataset, the highest-risk group comprised women with moderate medical comorbidity (≥4 Elixhauser category diagnoses), no history of suicide attempt or self-harm, but a history of pregnancy-related mental illness (10/58 hospitalizations, 17.2%). The second-highest-risk subgroup included women with moderate medical comorbidity (≥4 Elixhauser category diagnoses) and a history of suicide attempt or self-harm (17/167 hospitalizations, 10.2%).

In the NYC-CDRN dataset, the highest risk group (defined as risk of rehospitalization for suicide attempt or self-harm) comprised women with moderate medical comorbidity (≥3 Elixhauser category diagnoses), no history of suicide attempt or self-harm, no history of pregnancy-related mental illness, but with 5 or more hospitalizations in the past year, and a history of substance abuse (3591/21,635, 16.6%). The second highest risk subgroup included women with moderate medical comorbidity (≥3 Elixhauser category diagnoses), no history of suicide attempt or self-harm, no history of pregnancy-related mental illness, but with 5 or more hospitalizations in the past year, without a history of substance use, but with a history of depression (8297/53,877, 15.4%).

The trees were notable for consistent identification of high branch nodes, that is, nodes with high importance in differentiating risk of the outcome. Several pathways to risk were common to both trees. Women with low antecedent medical comorbidity,18 that is, with Elixhauser category diagnoses <3 (NYC-CDRN) or <4 (UCLA-xDR), who were younger than 55 years were at increased risk (UCLA-xDR 45/651, 6.9% and NYC-CDRN 3371/44,953, 7.5%). Women with moderate-to-high antecedent medical comorbidity, that is, with Elixhauser category diagnoses ≥3 (NYC-CDRN) or ≥4 (UCLA-xDR), no history of suicide attempt or self-harm, and a history of pregnancy-related mental illness were also at increased risk (UCLA-xDR 10/58, 17.2% and NYC-CDRN 4092/59,307, 6.9%).

Of note, tree results diverged for women with moderate-to-high medical comorbidity, that is, with Elixhauser category diagnoses ≥3 (NYC-CDRN) or ≥4 (UCLA-xDR), and a history of suicide attempt. This group had increased risk in the UCLA-xDR cohort (17/167, 10.2% vs. overall population risk of 1.3%) but not in the NYC-CDRN cohort (5969/138,818, 4.3% vs. overall population risk of 4.8%).


In this study, we developed a predictive model of medically serious suicide attempt and self-harm following general hospitalization among women with serious mental illness. We used a machine learning approach to identify key predictors and combinations of predictors differentiating hospitalizations followed by a suicide-related readmission from hospitalizations followed by a nonsuicide-related readmission. By applying this approach in 2 separate populations spanning diverse demographics, case mixes, geographies, and health systems, we derived an aggregate model highlighting common risk groups.

The model identified index hospitalizations at high risk for suicide-related readmission (accuracy 0.84) when applied to a moderately sized population from a single institution (8408 hospitalizations) and when subsequently applied to a multi-institution data network 2 orders of magnitude larger (841,834 hospitalizations). The most important predictors of suicide-related readmission were antecedent medical illness, history of suicide-related behavior, age, and history of pregnancy-related mental illness. Notably, the classification trees demonstrated consistent patterns across datasets, replicating common predictor combinations characterizing high-risk hospitalizations. The model performed comparably to other predictive models of suicide attempts based on EHR data (AUC 0.71–0.84)28–30 and similarly to other published EHR-based models of clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71).31

Our approach suggests that risk of suicide-related behavior may be best characterized by combinations of predictors, rather than single linear relationships between individual predictors and outcome. Antecedent medical illness (ie, degree of medical comorbidity before index hospitalization) was the most important risk factor in both datasets. Presence of antecedent medical illness alone did not differentiate risk, but rather determined which combinations of predictors were relevant in differentiating risk. For example, women with moderate-to-high antecedent medical illness were at elevated risk if they experienced prior suicide-related behavior or pregnancy-related mental illness, whereas women with low or no antecedent medical illness were at elevated risk if they were younger than 55 years old. This finding affirms recent studies emphasizing the importance of considering multiple interacting and interdependent predictors when modeling suicidality risk.32,33

Our study adds to the literature by providing the first model of suicide-related behavior focusing on women with concomitant medical and mental illness. Because women experience different patterns of suicide-related behavior compared with men,34,35 suicide screening and prevention strategies during medical hospitalization may be advanced by the assessment of sex-specific risk factors. Recent work suggests men, relative to women, may be more vulnerable to suicidality following physical illness.32 However, in general there has been a paucity of information on women’s risk for suicide after acute medical illness. Current suicide screening protocols generally do not assess sex-specific predictors,36 such as history of pregnancy-related mental illness.

We found that history of pregnancy-related mental illness was associated with increased risk of suicide-related behavior after (nonobstetric) medical hospitalization, particularly among women with moderate-to-high antecedent medical comorbidity. Substantial work has focused on characterizing suicide-related behavior during pregnancy and the peripartum, with the strongest evidence for a rise in suicide risk during the postpartum period.12,37,38 Vulnerability to sex hormone shifts has been posited as a mechanism for the enduring predisposition to mental illness following postpartum depression, particularly during menopause.39 Our finding that a history of pregnancy-related mental illness is associated with suicidality after medical hospitalization in a large sample of predominantly postmenopausal women (mean age 57.5–60.5 y) may suggest additional risk mechanisms. For example, women who have had a history of pregnancy-related mental illness may be particularly susceptible to stressors associated with acute medical illness and medical intervention, including loss of identity, threats to autonomy, and health/illness transitions. Moreover, women who experienced psychological trauma associated with hospitalization for childbirth may retain vulnerability to trauma reminders during subsequent hospitalizations.40 The relationship between pregnancy-related mental illness and subsequent risk of suicidality after medical illness warrants further exploration, particularly with regard to the role of hormone replacement therapy, neuroinflammatory markers, and intergenerational role transitions.

The results of our study should be interpreted in light of the following considerations. We focused on medical hospitalizations and thus did not include individuals with low-lethality suicide-related behavior. Although we used data collected from multiple institutions, care outside of our health systems was not captured. To address this "open" system problem and model a known outcome, we restricted our data to women who were rehospitalized within our health systems. As with all studies using EHR data, these data are imperfect. Inclusion of suicide risk assessments (such as the Columbia Suicide Severity Rating Scale41) would almost certainly enhance the predictive accuracy of our model; however, these scales were poorly encoded in our datasets. Suicidal ideation and behavior are notoriously undercoded, and thus the rate of suicide-related behaviors is likely underestimated.17,42 Future studies should consider use of clinical text and natural language processing to enhance identification of patients presenting for suicide-related care. Because our analyses focused on natal sex, future work should explore suicide risk after medical hospitalization in cohorts with patient-identified sex.43


Suicide and self-harm are leading causes of health care costs, disability, and death7 and medical hospitalization is a unique point of crisis and potential intervention.4,11,44 Overall, the results support our hypotheses that EHR data routinely collected in the course of medical care during general hospitalization can differentiate subsequent risk of suicide-related behavior among women, and that common risk subgroups are consistently identifiable in diverse populations. Estimating suicide risk can help allocate resources and direct referrals to psychiatric care. Although national guidelines recommend screening all individuals treated for behavioral health conditions, patients treated for nonbehavioral medical illness remain absent from this mandate,45 and sex-specific risk factors are not yet incorporated into most suicide screening measures.36 In light of rising suicide rates despite decades of prevention efforts, analytics using large-scale EHR databases have gained traction as an evolving method for risk stratification. These precision methods hold promise for identifying high-risk populations, in turn enhancing clinical decision making for triage of resources and directed efforts toward suicide prevention for all sexes.


1. Centers for Disease Control and Prevention. Suicide rising across the US. CDC Vital Signs. 2018. Available at: Accessed November 17, 2020.
2. Hogan MF, Grumet JG. Suicide prevention: an emerging priority for health care. Health Aff (Millwood). 2016;35:1084–1090.
3. The Joint Commission. National Patient Safety Goal for Suicide Prevention. R3 Report: Requirement, Rationale, Reference; 2019:1–5.
4. McCoy TH, Castro VM, Roberson AM, et al. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry. 2016;73:1064–1071.
5. Skogman K, Alsén M, Öjehagen A. Sex differences in risk factors for suicide after attempted suicide—a follow-up study of 1052 suicide attempters. Soc Psychiatry Psychiatr Epidemiol. 2004;39:113–120.
6. Dombrovski AY, Szanto K, Duberstein P, et al. Sex differences in correlates of suicide attempt lethality in late life. Am J Geriatr Psychiatry. 2008;16:905–913.
7. Hedegaard H, Curtin SC, Warner M. Suicide Mortality in the United States continue to increase. NCHS Data Brief. 2018:1–8.
8. Spiller HA, Ackerman JP, Spiller NE, et al. Sex- and age-specific increases in suicide attempts by self-poisoning in the United States among youth and young adults from 2000 to 2018. J Pediatr. 2019;210:201–208.
9. Stone D, Holland K, Bartholow B, et al. National Center for Injury Prevention and Control, Centers for Disease Control and Prevention. Preventing Suicide: A Technical Package of Policy, Programs, and Practices. 2017:1–62. Available at: Accessed November 17, 2020.
10. Dembling BP, Chen DT, Vachon L. Life expectancy and causes of death in a population treated for serious mental illness. Psychiatr Serv. 1999;50:1036–1042.
11. Qin P, Webb R, Kapur N, et al. Hospitalization for physical illness and risk of subsequent suicide: a population study. J Intern Med. 2013;273:48–58.
12. Lindahl V, Pearson JL, Colpe L. Prevalence of suicidality during pregnancy and the postpartum. Arch Womens Ment Health. 2005;8:77–87.
13. Oates M. Suicide: the leading cause of maternal death. Br J Psychiatry. 2003;183:279–281.
14. Dougall N, Lambert P, Maxwell M, et al. Deaths by suicide and their relationship with general and psychiatric hospital discharge: 30-year record linkage study. Br J Psychiatry. 2014;204:267–273.
15. Edgcomb JB, Shaddox T, Hellemann G, et al. Electronic health record-based prediction of readmission for suicidal behavior after general hospitalization of adults with serious mental illness. J Psychiatr Res. 2020. [In press].
16. Kaushal R, Hripcsak G, Ascheim DD, et al. Changing the research landscape: the New York City clinical data research network. J Am Med Inform Assoc. 2014;21:587–590.
17. Hedegaard H, Schoenbaum M, Claassen C, et al. Issues in Developing a Surveillance Case Definition for Nonfatal Suicide Attempt and Intentional Self-Harm Using International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) Coded Data. Hyattsville, MD: US Department of Health and Human Services; 2018.
18. Elixhauser A, Steiner C, Harris DR, et al. Comorbidity measures for use with administrative data. Med Care. 1998;36:8–27.
19. Van Walraven C, Austin PC, Jennings A, et al. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med Care. 2009;47:626–633.
20. Acuña E, Rodriguez C. The treatment of missing values and its effect on classifier accuracy. In: Banks D, et al., eds. Classification, Clustering, and Data Mining Applications. Berlin, Heidelberg: Springer-Verlag; 2004:639–648.
21. Stel VS, Pluijm SMF, Deeg DJH, et al. A classification tree for predicting recurrent falling in community-dwelling older persons. J Am Geriatr Soc. 2003;51:1356–1364.
22. Fischer C, Luauté J, Némoz C, et al. Improved prediction of awakening or nonawakening from severe anoxic coma using tree-based classification analysis. Crit Care Med. 2006;34:1520–1524.
23. Ranganathan S, Nakai K, Schonbach C. Encyclopedia of Bioinformatics and Computational Biology. Amsterdam, the Netherlands: Elsevier; 2019.
24. Galatzer-Levy IR, Karstoft KI, Statnikov A, et al. Quantitative forecasting of PTSD from early trauma responses: a machine learning application. J Psychiatr Res. 2014;59:68–76.
25. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
26. Therneau T, Atkinson B, Ripley B. rpart: recursive partitioning and regression trees. R package version 4.1-15. 2019. Available at: Accessed November 17, 2020.
27. Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B. 1974;36:111.
28. Simon GE, Johnson E, Lawrence JM, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry. 2018;175:951–960.
29. Walsh CG, Ribeiro JD, Franklin JC. Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning. J Child Psychol Psychiatry. 2018;59:1261–1270.
30. Karmakar C, Luo W, Tran T, et al. Predicting risk of suicide attempt using history of physical illnesses from electronic medical records. JMIR Ment Health. 2016;3:e19.
31. Goldstein BA, Navar AM, Pencina MJ, et al. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24:198–208.
32. Gradus JL, Rosellini AJ, Horváth-Puhó E, et al. Prediction of sex-specific suicide risk using machine learning and single-payer health care registry data from Denmark. JAMA Psychiatry. 2020;77:25–34.
33. Kessler RC, Hwang I, Hoffmire CA, et al. Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res. 2017;26:e1575.
34. Guseva Canu I, Bovio N, Mediouni Z, et al. Suicide mortality follow-up of the Swiss National Cohort (1990–2014): sex-specific risk estimates by occupational socio-economic group in working-age population. Soc Psychiatry Psychiatr Epidemiol. 2019;54:1483–1495.
35. Chung RY, Yip BHK, Chan SSM, et al. Cohort effects of suicide mortality are sex specific in the rapidly developed Hong Kong Chinese population, 1976–2010. Depress Anxiety. 2016;33:558–566.
36. King CA, Horwitz A, Czyz E, et al. Suicide risk screening in healthcare settings: identifying males and females at risk. J Clin Psychol Med Settings. 2017;24:8–20.
37. Mota NP, Chartier M, Ekuma O, et al. Mental disorders and suicide attempts in the pregnancy and postpartum periods compared with non-pregnancy: a population-based study. Can J Psychiatry. 2019;64:482–491.
38. Palladino CL, Singh V, Campbell J, et al. Homicide and suicide during the perinatal period: findings from the national violent death reporting system. Obstet Gynecol. 2011;118:1056–1063.
39. Deecher D, Andree TH, Sloan D, et al. From menarche to menopause: exploring the underlying biology of depression in women experiencing hormonal changes. Psychoneuroendocrinology. 2008;33:3–17.
40. White T, Matthey S, Boyd K, et al. Postnatal depression and post-traumatic stress after childbirth: prevalence, course and co-occurrence. J Reprod Infant Psychol. 2006;24:107–120.
41. Posner K, Brown GK, Stanley B, et al. The Columbia-suicide severity rating scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry. 2011;168:1266–1277.
42. Rhodes AE, Links PS, Streiner DL, et al. Do hospital E-codes consistently capture suicidal behaviour? Chronic Dis Can. 2002;23:139.
43. Deutsch MB, Buchholz D. Electronic health records and transgender patients—practical recommendations for the collection of gender identity data. J Gen Intern Med. 2015;30:843–847.
44. Ballard ED, Cwik M, Storr CL, et al. Recent medical service utilization and health conditions associated with a history of suicide attempts. Gen Hosp Psychiatry. 2014;36:437–441.
45. Grumet JG, Goldstein J, Hogan MF, et al. Compliance standards pave the way for reducing suicide in health care systems. J Health Care Compliance. 2019;21:17–26.

suicide attempt; self-harm; women; mental health; hospitalization; serious mental illness; machine learning


Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.