Introduction
AKI is common in hospitalized patients, is increasing in incidence, and confers an elevated risk of morbidity and mortality in both the short and long term (1–8). Despite the medical community’s increased recognition of the importance of AKI, clinicians often fail to recognize its presence in a timely manner and thus are unable to execute potentially useful actions that may ameliorate the injury (9,10). Against this backdrop, electronic alerts for AKI have proliferated, with multiple hospital systems in the United States and the entire National Health Service in the United Kingdom adopting automated provider-alerting for AKI (11).
Several observational studies suggested that AKI alerts may favorably modify the course of AKI (12–14), but our randomized trial evaluating AKI alerts failed to demonstrate that alerts altered progression of AKI, change in creatinine, rates of dialysis, or death (15). Explanations for the trial’s failure to modify clinical outcomes include inadequate alerting (i.e., the alert was sent once, per provider, per patient and required no formal acknowledgment) as well as inappropriate targeting. Broad targeting of the alert to all patients with AKI may have led to inclusion of many patients for whom alerts would not be beneficial, diluting any potential benefit in certain subgroups.
Alerting is a double-edged sword, as the proliferation of alerts leads to “alert fatigue,” which can cause providers to ignore even important alerts (16–19). In addition, alerts have an often-unrecognized potential to harm patients through increased diagnostic testing, interventions, or length of stay. Strategic targeting of alerts to patients who might most benefit from them would improve outcomes and reduce alert fatigue in providers. To date, no systematic approach to defining AKI alert targets has been proposed.
Uplift modeling describes methods to identify differential benefit from an intervention in a population of individuals (20). Pioneered in the marketing world, uplift models are used to target marketing interventions such as personalized online advertising experiences and coupons to particular people to affect their propensity of buying a product (21). However, the technique is agnostic to the type of intervention or population at hand and thus can be safely appropriated for medical interventions over patient populations. The contrast to conventional modeling should be emphasized: prognostic models seek to predict which individuals will experience a particular outcome, but uplift models predict who will benefit from a specific intervention.
We hypothesized that uplift models would better stratify patients with regard to alert benefit than a traditional prognostic model. The successful application of uplift modeling in this framework would demonstrate how alert targeting can be feasibly and reliably accomplished and advance the science of precision medicine.
Materials and Methods
Study Population
We used data from our previously published randomized trial of AKI alerts (Clinicaltrials.gov identifier NCT01862419) (15,22). Briefly, adults admitted to the Hospital of the University of Pennsylvania who developed at least stage 1 AKI as defined by the Kidney Disease: Improving Global Outcomes (KDIGO) creatinine criteria from September 17, 2013 to April 14, 2014 were identified by an automated algorithm and randomized 1:1 to usual care or AKI alerts. Patients with an initial hospital creatinine >4.0 mg/dl, who were receiving dialysis, or were admitted to a hospice service were excluded. Alerts were sent once per patient to the primary provider (typically an intern or nurse practitioner) and the unit-based pharmacist via text message. The alert informed the provider of the presence of AKI and recommended he or she take “appropriate diagnostic or therapeutic actions” but did not provide tailored recommendations. The primary outcome of the initial study was a composite of relative change in creatinine, dialysis, and death within 7 days after randomization. Neither the composite outcome nor any component thereof differed significantly between the alert and usual care arms of the study. For this study, we excluded the 115 patients with KDIGO stage 2 or 3 AKI at randomization because these individuals are likely to have AKI recognized by providers because of the more dramatic change in creatinine. The original trial was approved by the Institutional Review Board of the University of Pennsylvania and operated under a waiver of informed consent. The current analysis utilized deidentified data and was deemed exempt from informed consent by the Yale Institutional Review Board.
The data from the study were split temporally into a 70% training set (within which all models were created) and a 30% test set. This temporal split was chosen to demonstrate how uplift modeling may be used in a clinical trial setting (i.e., a secondary trial could be attempted after uplift analysis is performed on the original trial). A temporal split is a form of external validation and is generally considered superior to a random training/test split (23).
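As an illustration of the temporal split described above, the following minimal Python sketch orders patients by randomization date and assigns the earliest 70% to training. The record layout and the field name `randomized_on` are hypothetical and are not taken from the trial dataset.

```python
from datetime import date


def temporal_split(records, train_fraction=0.7, key="randomized_on"):
    """Split records into (train, test) by time rather than at random.

    `key` names a hypothetical field holding each patient's randomization date.
    """
    ordered = sorted(records, key=lambda r: r[key])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data: ten patients randomized on distinct days, listed out of order.
patients = [{"id": i, "randomized_on": date(2013, 9, 17 + (i * 3) % 10)} for i in range(10)]
train, test = temporal_split(patients)
```

In this toy example every test-set patient is randomized after every training-set patient, which is the property that lets the test set stand in for a hypothetical follow-up trial.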
Study Covariates
All study covariates were extracted from the electronic medical record (Supplemental Table 1). We captured patient and hospitalization characteristics, laboratory values, medications, procedures, and comorbidities (as defined by International Classification of Disease, Ninth Revision codes) (24). Time-varying covariates were set to the value measured most recently before randomization during the hospitalization. For missing values, we imputed the training set mean for continuous variables, and mode for categorical variables. We chose this simple imputation strategy as it would facilitate easy translation of an effective model into clinical practice via an electronic health record system.
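The imputation rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: missing values are represented as `None`, and the column names (`potassium`, `service`) are hypothetical. The point it shows is that fill values are learned from the training set only and then applied unchanged to the test set.

```python
from statistics import mean, mode


def fit_imputer(train_rows, continuous, categorical):
    """Learn fill values from the training set only."""
    fill = {}
    for col in continuous:
        fill[col] = mean(r[col] for r in train_rows if r[col] is not None)
    for col in categorical:
        fill[col] = mode(r[col] for r in train_rows if r[col] is not None)
    return fill


def impute(rows, fill):
    """Replace missing (None) entries using the training-set fill values."""
    return [{c: fill[c] if v is None and c in fill else v for c, v in r.items()} for r in rows]


# Hypothetical rows; None marks a missing value.
train = [
    {"potassium": 4.0, "service": "medical"},
    {"potassium": None, "service": "medical"},
    {"potassium": 5.0, "service": "surgical"},
]
test = [{"potassium": None, "service": None}]
fill = fit_imputer(train, continuous=["potassium"], categorical=["service"])
test_imputed = impute(test, fill)
```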
Primary Outcome
The primary outcome was the maximum relative change in creatinine from time of randomization to 3 days postrandomization (ΔCr3). We chose this outcome because continuous metrics increase the power to detect significant differences in alert efficacy and interaction tests are often underpowered (25,26). We used a last-observation carried forward technique to account for missing data. Roughly one quarter of randomized patients were discharged before 3 days after randomization (27.7%). The alert creatinine was the last creatinine measured in 194 (8.5%) patients.
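One way to compute an outcome of this kind is sketched below in Python. The function name, the input layout, and the unit handling are hypothetical. In particular, whether the randomization creatinine itself counts toward the maximum (which floors ΔCr3 at 0%) is an assumption of this sketch rather than something stated in the text.

```python
def delta_cr3(alert_creatinine, followups, window_days=3.0):
    """Maximum relative change in creatinine (%) within the window.

    followups: list of (days_after_randomization, creatinine_mg_dl) pairs.
    Assumption: the randomization value is included in the maximum, so a
    patient with no later measurement (last observation carried forward)
    gets a change of 0%.
    """
    in_window = [cr for day, cr in followups if 0 < day <= window_days]
    peak = max([alert_creatinine] + in_window)
    return 100.0 * (peak - alert_creatinine) / alert_creatinine


rising = delta_cr3(1.0, [(1, 1.2), (2, 1.5), (5, 3.0)])  # day-5 value is outside the window
no_followup = delta_cr3(1.2, [])                          # last observation carried forward
```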
Uplift Modeling
Uplift modeling seeks to estimate the differential benefit of an intervention versus a control on the individual patient level, on the basis of patient covariates. To date, few studies have rigorously evaluated uplift models using medical data, but those that exist have shown promising results (20). As there is no currently accepted standard algorithm for uplift modeling, we sought to evaluate three approaches in this study (T-Learner, Z-Learner, and X-Learner), which are mathematically operationalized in Supplemental Box 1 and described in the Supplemental Material. In our framework, an individual’s uplift score is a continuous metric that reflects the predicted difference in ΔCr3 under the alert versus control condition. In a negative trial, one would thus expect the mean of the uplift scores to be close to 0.
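To make the idea of an uplift score concrete, the sketch below implements the common two-model formulation of a T-Learner in plain Python: one outcome model is fit in the alert arm, another in the usual care arm, and the score is the difference between their predictions. This is not the authors' implementation, which is specified in their Supplemental Material and may differ. The single covariate, the one-variable least-squares fit, and the sign convention (predicted usual care outcome minus predicted alert outcome, so that a positive score means predicted benefit) are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def t_learner_uplift(train, new_x):
    """train: list of (covariate, alert, outcome) triples.

    Returns, for each new covariate value, the predicted outcome under
    usual care minus the predicted outcome under alert.
    """
    alert = [(x, y) for x, a, y in train if a == 1]
    control = [(x, y) for x, a, y in train if a == 0]
    a0, a1 = fit_line(*zip(*alert))
    c0, c1 = fit_line(*zip(*control))
    return [(c0 + c1 * x) - (a0 + a1 * x) for x in new_x]


# Synthetic training data: the outcome rises faster with the covariate under usual care.
train = [(x, 0, 5 + 2 * x) for x in range(5)] + [(x, 1, 5 + x) for x in range(5)]
scores = t_learner_uplift(train, [0, 3])
```

In practice the two arm-specific models would use the full covariate set and a more flexible learner; the structure of the calculation stays the same.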
In addition to the uplift models, we developed a purely prognostic model using linear regression, trained to predict ΔCr3 (Supplemental Material). Although this is not an uplift model, it is conceivable that alert efficacy is higher in those at high risk of worsening creatinine. Prognostic models are frequently used in clinical practice to target therapies (e.g., the Congestive heart failure, Hypertension, Age, Diabetes, Stroke, Vascular disease, Age, Sex Category [CHA2DS2-VASc] score in atrial fibrillation) (27).
Assessing Model Performance
We defined the overall performance of each model by the strength of the interaction between the uplift score and treatment assignment with respect to ΔCr3, as characterized by the t-statistic for that interaction term in a linear regression model. A statistically significant interaction term (at a P value threshold of <0.05) implies that the uplift construct significantly modifies the effect of the alert in terms of the subsequent change in creatinine (i.e., targeting higher scores increases the beneficial effect of alerts). In addition, we dichotomized the uplift score into two groups representing those with positive compared with those with negative or zero uplift scores, as might be done prospectively when choosing a population of individuals to target with an intervention.
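The performance metric can be illustrated with a small, dependency-free Python sketch that fits ΔCr3 on treatment assignment, uplift score, and their product by ordinary least squares and reports the coefficient and t-statistic of the product term. The helper names and the synthetic data are hypothetical, and the hand-written solver merely stands in for the statistical software used in the study.

```python
import math


def ols(X, y):
    """Ordinary least squares; returns (coefficients, t_statistics)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [b / math.sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_stats


def interaction_test(uplift, alert, delta_cr3):
    """Fit delta_cr3 ~ 1 + alert + uplift + alert*uplift; return the interaction term."""
    X = [[1.0, a, u, a * u] for u, a in zip(uplift, alert)]
    beta, t_stats = ols(X, delta_cr3)
    return beta[3], t_stats[3]


# Synthetic data in which the alert lowers delta_cr3 by 5 points per unit of uplift.
uplift = [(i - 19.5) / 10 for i in range(40)]
alert = [i % 2 for i in range(40)]
noise = [0.1 * ((2 * i) % 5 - 2) for i in range(40)]
delta_cr3 = [10 + 2 * u - 5 * a * u + e for u, a, e in zip(uplift, alert, noise)]
coef, t_stat = interaction_test(uplift, alert, delta_cr3)
```

A negative interaction coefficient corresponds to a greater reduction in ΔCr3 with alerting as the score rises. Converting the t-statistic to a P value requires the t distribution with n − 4 degrees of freedom, which is omitted here.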
Statistical Analyses
We summarized baseline characteristics and compared them between the training and test sets and between alert and usual care arms by chi-squared tests for categorical variables and Wilcoxon rank sum tests for continuous variables.
To assess whether the effect of randomization to alert on provider actions changed as uplift increased, we used logistic regression models with particular postrandomization actions (such as urinalysis) as the dependent variable and treatment assignment, uplift score, and their interaction as the independent variables. We considered a two-sided P value <0.05 statistically significant. Analyses were performed in Stata version 15.0 (College Station, TX) and Python version 3.6.1 with the scikit-learn package version 0.19.1 (28).
Sensitivity Analyses
To guard against spurious signals, we created 100 “pseudointerventions” (entirely post hoc, randomly assigned binary variables) to verify that the uplift algorithms could not find uplift where none could possibly exist. We also examined the performance of our algorithms across other, random cuts of the entire dataset (using 100 different nontemporal training/test splits). Finally, we examined model performance after excluding the 8.5% of patients who had no follow-up creatinine after the creatinine that generated the alert.
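The pseudointervention check can be sketched as follows. The function names, the seed, and the choice of 683 patients are illustrative assumptions; the text does not specify how the random assignments were generated or over which patients. Each random assignment would replace the true treatment indicator in the full modeling pipeline, and the resulting interaction P values would then be summarized.

```python
import random


def pseudo_interventions(n_patients, n_repeats=100, seed=0):
    """Generate random 0/1 assignments that cannot have any true effect."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_patients)] for _ in range(n_repeats)]


def false_positive_rate(p_values, alpha=0.05):
    """Share of pseudointerventions whose interaction P value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


assignments = pseudo_interventions(n_patients=683)
```

If the pipeline is well behaved, the share of pseudointerventions reaching P<0.05 should be close to the nominal 5% expected by chance.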
Results
Of 2278 individuals with KDIGO stage 1 AKI in the original trial, 1595 (70%) formed the training set and 683 (30%) formed the test set. Baseline characteristics of the two groups appear in Table 1, and were broadly similar, although the test set had fewer individuals with CKD and slightly fewer surgical patients. As expected, the proportion of patients randomized to alerts versus usual care was similar in both groups. Median (interquartile range [IQR]) follow-up from the time of randomization was 6.2 (2.5–12.7) days in the training set and 5.2 (2.4–9.5) days in the test set.
Table 1. Baseline characteristics of participants in a randomized trial of AKI alerts
Characteristics | Training (n=1595) | Test (n=683)
Demographics, n (%)
Age, yr, mean (SD) | 61 (16) | 61 (17)
Men | 897 (57) | 376 (55)
Black | 414 (26) | 197 (29)
Admitted to a surgical service | 697 (44) | 266 (39)
Intensive care unit | 494 (31) | 193 (28)
Comorbidities, n (%)
Cerebrovascular disease | 163 (10) | 95 (14)
Congestive heart failure | 506 (32) | 231 (34)
Chronic obstructive pulmonary disease | 473 (30) | 209 (31)
Diabetes mellitus | 214 (13) | 102 (15)
Liver disease | 420 (26) | 188 (28)
CKD | 456 (29) | 158 (23)
Malignancy | 149 (9) | 54 (8)
Metastatic disease | 163 (10) | 95 (14)
Laboratory values, median (IQR)
Baseline creatinine, mg/dl | 0.94 (0.59–1.39) | 0.88 (0.62–1.36)
Randomization creatinine, mg/dl | 1.38 (0.98–1.89) | 1.32 (0.99–1.86)
BUN, mg/dl | 21 (13–33) | 20 (12–32)
Hematocrit, % | 31 (27–35) | 31 (27–36)
Hemoglobin, g/dl | 10.1 (8.9–11.6) | 10.3 (9.0–11.8)
International normalized ratio | 1.2 (1.1–1.4) | 1.2 (1.1–1.4)
Magnesium, meq/L | 2.0 (1.8–2.2) | 2.0 (1.8–2.2)
Phosphorus, mg/dl | 3.9 (3.1–4.6) | 3.7 (3.0–4.5)
Potassium, meq/L | 4.2 (3.8–4.6) | 4.2 (3.8–4.6)
Sodium, meq/L | 137 (134–140) | 137 (135–140)
Medication exposures, n (%)
NSAIDs | 114 (7) | 52 (8)
Vasopressors | 225 (14) | 123 (18)
Loop diuretic | 557 (35) | 244 (36)
ACE inhibitor/angiotensin receptor blocker | 283 (18) | 137 (20)
Antibiotics | 1039 (65) | 437 (64)
Intravenous contrast | 328 (21) | 145 (21)
Randomization status, n (%)
Assigned to alert arm | 798 (50) | 350 (51)
Baseline characteristics of the training and test sets. Data are mean (SD), n (%), or median (interquartile range [IQR]). Comorbidities were defined by administrative International Classification of Diseases, Ninth Revision codes. Medications represent any exposure during the admission but before randomization. NSAIDs, nonsteroidal anti-inflammatory drugs; ACE, angiotensin-converting enzyme.
The median (IQR) ΔCr3 was 3.1% (0.0%–20.3%) in the training set and 4.3% (0.0%–20.0%) in the test set. There was no observed benefit of alerting on ΔCr3 in either the training (ΔCr3=–0.7% [−4.1% to 2.7%]; P=0.69) or the test set (ΔCr3=–3.1% [−8.2% to 2.0%]; P=0.24), and the estimates of alert effect did not differ significantly between the training and test sets (P for interaction =0.45).
Table 2 characterizes the performance of the uplift scores generated by the four different algorithms. The Z-Learner performed best, although similarly to the T- and X-Learners. Although the purely prognostic model was more accurate in predicting outcome, the risk estimate did not significantly modify the effect of alerting (P for interaction =0.30). On the other hand, for each 1 SD increase in uplift score as calculated using the Z-Learner, we found a 6.5% lower rate of change in creatinine in a patient randomized to alert versus usual care (P for interaction =0.01). The effect of randomizing patients with progressively higher uplift scores is visualized in Figure 1.
Table 2. Uplift model performance in a test set of patients in a randomized trial of AKI alerts
Uplift Model | Association with ΔCr3 (% per SD) | P Value | ΔCr3 Difference (Alert versus Control), (% per SD) | Interaction P Value
T-Learner, alert | 1.5 (−1.5 to 4.5) | 0.32 | −5.8 (−10.8 to −0.8) | 0.02
T-Learner, usual care | 7.3 (3.2–11.4) | 0.001 | |
Z-Learner, alert | 1.2 (−1.7 to 4.2) | 0.41 | −6.5 (−11.5 to −1.4) | 0.01
Z-Learner, usual care | 7.7 (3.5–11.9) | <0.001 | |
X-Learner, alert | 1.6 (−1.4 to 4.6) | 0.29 | −5.8 (−10.8 to −0.7) | 0.03
X-Learner, usual care | 7.4 (3.3–11.5) | <0.001 | |
Prognostic, alert | 3.0 (0.1–5.9) | 0.04 | −2.5 (−7.2 to 2.2) | 0.30
Prognostic, usual care | 5.5 (1.7–9.3) | 0.004 | |
Performance of various uplift models and the prognostic model. Association between uplift scores and change in creatinine per one SD increase in model score shows that increases in all models are associated with worse overall outcomes in terms of ΔCr3 in the usual care arm. Difference in change in creatinine between alert and control groups demonstrates the increased efficacy of alerting as uplift scores increase (negative numbers indicate alerting is associated with improved ΔCr3). ΔCr3, maximum relative change in creatinine from time of randomization to 3 days postrandomization.
Figure 1. As targeting becomes more stringent, the effect on change in creatinine between alert and control becomes more pronounced in the uplift models, but remains stable in the purely prognostic model. Comparisons of change in creatinine in usual care and alert groups as the targeted population narrows. Effect of targeting all patients with AKI is the center of the x-axis (100%). To the left of center, the effect of targeting those with lower uplift scores is visualized. To the right of center, the effect of targeting those with higher uplift scores is visualized. Where the red line is lower than the blue line, a benefit of alerting is present, and vice versa. Charts are truncated at 20% given sparse data beyond those points. (A) T-Learner (P for interaction =0.02); (B) Z-Learner (P for interaction =0.01); (C) X-Learner (P for interaction =0.03); (D) Prognostic targeting (P for interaction =0.30).
To bring greater clarity to the uplift score, we dichotomized high-versus-low uplift by grouping those patients with positive (n=446) versus zero or negative (n=237) uplift scores using the Z-Learner. Those in the high score group would be predicted to benefit from AKI alerts and could be chosen for specific targeting in a prospective study. Randomization status and follow-up time did not differ between these two groups. In the high uplift group, the median ΔCr3 was 5.6% (0.0%–22.6%) compared with 2.4% (0.0%–17.6%) in the low uplift group, a difference that was not statistically significant (P=0.11). In the high uplift group, alerting was associated with a significant beneficial effect on ΔCr3, with a median change of 2.6% (0.0%–17.1%) among those who received alerts and 7.9% (0.0%–27.2%) among those who received usual care (median difference, −5.3 [−9.9 to −0.06]; P=0.03). In contrast, in the low uplift group, alerts performed worse than usual care, with a median ΔCr3 of 5.3% (0.0%–20.0%) in the alert group compared with 0.0% (0.0%–11.4%) in the usual care group (median difference, +5.3 [1.6–9.0]; P=0.005). This relationship is shown graphically in Figure 2. In terms of absolute change in creatinine, among those in the high uplift group who received an alert, the creatinine increased by a median (IQR) of 0.03 (0–0.22) mg/dl compared with 0.09 (0–0.28) in the usual care group (P=0.06).
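The dichotomized comparison reported above amounts to grouping patients by the sign of their uplift score and by trial arm, then summarizing ΔCr3 within each of the four cells. A minimal Python sketch follows; the tuple layout and the toy values are hypothetical and do not reproduce trial data.

```python
from statistics import median


def stratified_medians(rows):
    """rows: iterable of (uplift_score, alert, delta_cr3) triples.

    Scores > 0 form the high uplift group; zero or negative scores form the low group.
    """
    groups = {}
    for score, alert, outcome in rows:
        stratum = "high" if score > 0 else "low"
        arm = "alert" if alert else "usual care"
        groups.setdefault((stratum, arm), []).append(outcome)
    return {key: median(values) for key, values in groups.items()}


# Toy data only: alerts help in the high group and hurt in the low group.
rows = [
    (0.4, 1, 2.0), (0.9, 1, 4.0), (0.2, 0, 8.0), (0.7, 0, 10.0),
    (-0.3, 1, 6.0), (0.0, 1, 4.0), (-0.5, 0, 0.0), (-0.1, 0, 2.0),
]
medians = stratified_medians(rows)
```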
Figure 2. Alerting results in less change in creatinine than usual care in the group identified as likely to benefit, whereas the reverse is true in the group identified as unlikely to benefit. Change in creatinine from randomization to 3 days later, stratified by alert status in the group predicted unlikely to benefit from alerts versus likely to benefit from alerts on the basis of the Z-Learner uplift algorithm. Bars represent median and whiskers represent IQR.
Baseline factors associated with higher uplift scores (and thus an increased likelihood of alert benefit) are displayed for the Z-Learner in Table 3 (similar tables for the X- and T-Learners and the prognostic model are included in Supplemental Figure 1 and Supplemental Tables 2A–C). Those with positive uplift scores were more likely to be female, older, and have a lower initial creatinine and alert creatinine. Similar proportions of patients in the high and low uplift groups were located in the intensive care unit, and there was no difference in randomization Sequential Organ Failure Assessment (SOFA) score. In contrast, the prognostic model targets (unsuccessfully) those in the intensive care unit, younger patients, and those with higher SOFA scores. More post-AKI nephrology consults occurred in the low uplift group (13.1% versus 8.1%; P=0.04), but the rates did not differ by randomization status.
Table 3. Characteristics predictive of alert harm or benefit among patients in a randomized trial of AKI alerts
Characteristic | Unlikely to Benefit from Alert (n=237) | Likely to Benefit from Alert (n=446) | P Value
Demographics, n (%)
Age, yr, mean (SD) | 56 (18) | 64 (15) | <0.001
Men | 165 (71) | 211 (47) | <0.001
Black | 63 (27) | 134 (30) | 0.34
In ICU at randomization | 72 (30) | 121 (27) | 0.37
Surgical admission | 107 (45) | 159 (36) | 0.02
Comorbidities, n (%)
Cerebrovascular disease | 29 (12) | 66 (15) | 0.36
CKD | 78 (33) | 110 (25) | 0.02
Congestive heart failure | 86 (36.3) | 145 (33) | 0.32
Diabetes | 78 (33) | 131 (29) | 0.34
Liver disease | 42 (18) | 60 (14) | 0.14
Malignancy | 47 (20) | 111 (25) | 0.14
Metastatic disease | 13 (6) | 41 (9) | 0.09
Laboratory values, median (IQR)
Baseline creatinine, mg/dl | 1.15 (0.78–1.69) | 0.78 (0.53–1.16) | <0.001
Alert creatinine, mg/dl | 1.58 (1.18–2.14) | 1.20 (0.86–1.70) | <0.001
Hemoglobin, g/dl | 10.4 (9.2–12.0) | 10.2 (8.9–11.6) | 0.12
Phosphorus, mg/dl | 4.2 (3.4–5.1) | 3.5 (2.9–4.2) | <0.001
Potassium, meq/L | 4.3 (3.9–4.7) | 4.2 (3.8–4.5) | 0.005
Sodium, meq/L | 138 (135–141) | 137 (134–140) | 0.008
Severity of illness, median (IQR)
SOFA score | 2 (1–5) | 2 (1–4) | 0.08
Medication exposures, n (%)
NSAIDs | 13 (5.5) | 33 (7.4) | 0.34
Vasopressors | 41 (17) | 82 (18) | 0.73
Loop diuretic | 81 (34) | 163 (37) | 0.54
ACE inhibitor/angiotensin receptor blocker | 40 (17) | 97 (22) | 0.13
Antibiotics | 142 (60) | 295 (66) | 0.11
Intravenous contrast | 37 (16) | 108 (24) | 0.009
Randomization status, n (%)
Alert | 126 (53) | 224 (50) | 0.46
Characteristics at randomization of patients in the test set predicted to be unlikely to benefit from alert versus likely to benefit from alert on the basis of Z-Learner score ≤0 or >0, respectively. Note that comorbidity codes and medication exposures are not used in model building and are reproduced here to provide a clearer clinical phenotype for readers. Data are mean (SD), n (%), or median (interquartile range [IQR]). ICU, intensive care unit; SOFA, Sequential Organ Failure Assessment; NSAIDs, nonsteroidal anti-inflammatory drugs; ACE, angiotensin-converting enzyme.
We examined the rate of several process measures among those with low and high uplift scores in both the alert and usual care arms of the trial (Table 4). We noted a significant interaction between alerting, uplift category, and fluid administration after AKI such that alerts appeared to reduce the rate of fluid administration in the low uplift group, but increase the rate of fluid administration in the high uplift group (P for interaction =0.005). However, there was no direct relationship between fluid administration and ΔCr3 (P=0.69).
Table 4. Postrandomization interventions in patients in a randomized trial of AKI alerts
Practice | Low Uplift (n=237) | | | High Uplift (n=446) | | | Interaction P Value
 | Alert (n=126) | Usual Care (n=111) | P Value | Alert (n=224) | Usual Care (n=222) | P Value |
Fluid administration, 24 h | 39 (31) | 51 (46) | 0.02 | 91 (41) | 75 (34) | 0.14 | 0.005
Fluid bolus, 24 h | 22 (18) | 30 (27) | 0.08 | 48 (21) | 39 (18) | 0.30 | 0.04
Urinalysis, 24 h | 59 (47) | 44 (40) | 0.27 | 84 (38) | 88 (40) | 0.64 | 0.24
Ultrasound, 24 h | 11 (9) | 18 (16) | 0.08 | 29 (13) | 24 (11) | 0.49 | 0.07
Contrast, 24 h | 27 (21) | 16 (14) | 0.16 | 41 (18) | 38 (17) | 0.74 | 0.35
Nephrology consult, inpatient | 19 (15) | 12 (11) | 0.33 | 18 (8) | 18 (8) | 0.98 | 0.46
NSAID use | 5 (4) | 6 (5) | 0.60 | 13 (6) | 17 (8) | 0.43 | 0.97
ACE/ARB use | 29 (23) | 23 (21) | 0.67 | 52 (23) | 58 (26) | 0.48 | 0.45
Aminoglycoside use | 7 (6) | 6 (5) | 0.96 | 10 (5) | 13 (6) | 0.51 | 0.66
Postrandomization interventions, stratified by uplift category, where patients with high uplift benefit more from alerting for AKI. Numbers are n (percent). P values reflect differences in proportion of patients who experienced that intervention within the given uplift category. Interaction P values indicate the difference in the effect of the intervention on these practices across the uplift categories. NSAIDs, nonsteroidal anti-inflammatory drugs; ACE, angiotensin-converting enzyme; ARB, angiotensin receptor blocker.
Sensitivity Analyses
Sensitivity analyses confirmed the primary results and are available in the Supplemental Material (Supplemental Table 3).
Discussion
In this analysis of a randomized trial of AKI alerts, we found that uplift modeling could be used to identify a population of individuals who would disproportionately benefit from alerts in terms of change in creatinine. The key insight is that a purely prognostic targeting method (sending AKI alerts to those at high risk of worsening creatinine) would not have improved the effect of alerts in our trial. However, uplift techniques that model individual treatment effect were successful. Techniques to individually target therapies are sorely needed, as they have the potential to contain costs and limit toxicities of interventions while maintaining population benefit.
Broadly speaking, we set out to discover what types of patients benefit from electronic alerts for AKI. The three uplift models all shared common features in this regard: female sex, older age, and lower baseline and alert creatinine. Older age and female sex are associated with lower muscle mass, and thus lower creatinine generation (29). This may lead to a slower rise in creatinine in the setting of AKI. Thus, these features may describe a population of patients in whom AKI may be missed clinically (30). Supporting this hypothesis is the fact that those in the high uplift group were less likely to receive a nephrology consult, despite having worse kidney outcomes.
Patients with higher uplift scores were more likely to receive a fluid bolus in response to the AKI alert than patients with lower uplift scores, but fluid administration itself was not associated with better outcomes in either group, suggesting that dilution of creatinine was not the mechanism of benefit here. We therefore think it unlikely that the uplift algorithms are identifying individuals with fluid-responsive AKI; rather, they appear to identify a population of individuals in whom conventional management of AKI (which includes fluid administration but also many other supportive treatments) is especially beneficial because AKI might not otherwise have been recognized. Future studies with careful analysis of postalert provider practices and greater stratification of AKI phenotype may be necessary to determine which practices, in which populations, associate with benefit or harm in AKI.
Electronic alerts and clinical decision support systems have proliferated rapidly in recent years, often without rigorous efficacy testing (31,32). Although alerting for AKI may appear to be common sense, the burden of alert fatigue has increasing potential to hinder patient care and diminish the effect of alerts (33). The need for better alert targeting is clear. Should robust methods be developed to target alerts more specifically to those who might benefit, the total number of alerts could be reduced, improving alert efficacy and reinvigorating provider trust in alert systems. As uplift modeling becomes refined in the health care setting, institutions may find that the implementation of uplift-targeted interventions reduces costs, increases provider satisfaction, and improves patient care.
That certain interventions have different effects in different populations is by no means surprising. The use of selective estrogen receptor modulators in hormone receptor-positive breast cancer and the avoidance of certain antiplatelet agents among carriers of key genetic variants are classic examples of personalized medicine (34,35). Hypothesis-driven subgroup analysis may identify significant treatment heterogeneity. Although such an approach was not performed in this research, our prespecified subgroup analyses reported in the original trial article did not identify a group that would benefit from alerting. Uplift models can integrate a large number of data points and covariates into a prediction of benefit in an unbiased manner. Although the approach has been described theoretically (20), ours is the first study to demonstrate its feasibility in a temporal data split, mirroring how such a model could be used in the context of an ongoing clinical trial. One could imagine a trial enrolling a number of patients, applying uplift algorithms, and then subsequently refining enrollment to target those more likely to benefit in an adaptive design.
This study is strengthened by the temporal split of training and test data, which mirrors how a prospective study of similar design would proceed. This nonrandom split explains the differences in baseline characteristics between the training and test datasets. Modest shifts in patient demographics and disease severity are not unexpected across the 8-month time frame of the original trial. We view these discrepancies as a significant strength because they demonstrate the robustness of uplift modeling to variation in baseline population characteristics during the enrollment period of the trial.
The limitations of this study include its retrospective nature. In addition, post hoc subgroups identified as benefitting from an intervention in an initial study may not be shown to benefit in dedicated studies (36). However, our use of a held-out test set and 100 random training/test resplits to validate the performance of the intervention in a subgroup is less biased than post hoc subgroup analyses conducted on entire datasets. This model may also be specific to the alert and health care system in which it was developed, and we cannot be sure that it would successfully target other types of AKI alerts in different locales. Furthermore, we did not model provider characteristics (such as length of practice or specialization), which may have important bearing on alert success. Future studies could explicitly include these variables to address the targeting of alerts toward providers who might most benefit from receiving them. We used single imputation to account for missing variables, as that will be necessary in a prospective validation of this approach. Our primary outcome, focused on change in creatinine, is not as clinically meaningful as the rate of dialysis or death, but a continuous outcome was necessary to provide adequate statistical power for the interaction tests. Further, although we found a statistically significant effect of alerting on the change in creatinine, the absolute effect was quite small and may not translate to hard clinical outcomes. Uplift modeling on larger trial datasets may be able to predict the benefit of an intervention on hard clinical outcomes. Finally, our original trial may have been negative because of the lack of direct instructions to those who received alerts regarding AKI best practices, such as drug-dosage adjustment, optimization of hemodynamics, and avoidance of nephrotoxins. Different alert characteristics would be expected to drive different uplift scores.
Our approach highlights the feasibility of “learning” alerts. If an alert, for AKI or any other condition, is developed and assessed in a randomized fashion, uplift modeling can be used to subsequently refine alert targeting in the environment where the intervention was developed. Although we chose to develop our uplift model in 70% of the data and validate it in 30%, continuously updatable models, which integrate information as the study progresses, exist and might be useful for dynamic alert targeting systems. To demonstrate the clinical utility of uplift modeling in a prospective fashion, two clinical trials would be performed: one with broad targeting of alerts to all patients with AKI, and a subsequent trial targeting those with high uplift scores. If this technique is to enter the mainstream of clinical medicine, one would want to show that the effect of alerting in the second trial was significantly stronger than in the first.
Through a novel application of uplift modeling, we were able to identify patients most likely to benefit from an alert for AKI. Such a technique is an example of personalized medicine and may be useful to target interventions to those most likely to benefit in future studies.
Disclosures
F.P.W. reports coinventor status on patent 14/032,131, which relates to the real-time detection of disease status. P.M.P. reports personal fees from Novartis, personal fees from GE Healthcare, personal fees from Baxter and personal fees from HealthSpan Dx outside the scope of the submitted work. C.R.P. reports personal fees from AbbVie and personal fees from Genfit outside the scope of the submitted work.
Acknowledgments
This study was supported by grants K23 DK097201 and R01 DK113191 to F.P.W., and grant K24 DK090203 to C.R.P.
The funding source had no role in the design, collection, analysis, or interpretation of the data, the writing of the report, or the decision to submit the manuscript for publication.
References
1. Uchino S, Bellomo R, Goldsmith D, Bates S, Ronco C: An assessment of the RIFLE criteria for acute renal failure in hospitalized patients. Crit Care Med 34: 1913–1917, 2006 16715038
2. Hoste EA, Kellum JA: Acute kidney injury: Epidemiology and diagnostic criteria. Curr Opin Crit Care 12: 531–537, 2006 17077682
3. Mehta RL, Kellum JA, Shah SV, Molitoris BA, Ronco C, Warnock DG, Levin A; Acute Kidney Injury Network: Acute kidney injury network: Report of an initiative to improve outcomes in acute kidney injury. Crit Care 11: R31, 2007 17331245
4. Coca SG, King JT Jr., Rosenthal RA, Perkal MF, Parikh CR: The duration of postoperative acute kidney injury is an additional parameter predicting long-term survival in diabetic veterans. Kidney Int 78: 926–933, 2010 20686452
5. Coca SG, Singanamala S, Parikh CR: Chronic kidney disease after acute kidney injury: A systematic review and meta-analysis. Kidney Int 81: 442–448, 2012 22113526
6. Chawla LS, Amdur RL, Shaw AD, Faselis C, Palant CE, Kimmel PL: Association between AKI and long-term renal and cardiovascular outcomes in United States veterans. Clin J Am Soc Nephrol 9: 448–456, 2014 24311708
7. Coca SG, Yusuf B, Shlipak MG, Garg AX, Parikh CR: Long-term risk of mortality and other adverse outcomes after acute kidney injury: A systematic review and meta-analysis. Am J Kidney Dis 53: 961–973, 2009 19346042
8. Hsu RK, McCulloch CE, Dudley RA, Lo LJ, Hsu CY: Temporal changes in incidence of dialysis-requiring AKI. J Am Soc Nephrol 24: 37–42, 2013 23222124
9. Kellum JA, Lameire N, Aspelin P, Barsoum RS, Burdmann EA, Goldstein SL, Herzog CA, Joannidis M, Kribben A, Levey AS, MacLeod AM, Mehta RL, Murray PT, Naicker S, Opal SM, Schaefer F, Schetz M, Uchino S: Kidney disease: Improving Global Outcomes (KDIGO) acute kidney injury work group. KDIGO clinical practice guideline for acute kidney injury. Kidney Int Suppl 2[Suppl.]: 1–138, 2012
10. Wilson FP, Bansal AD, Jasti SK, Lin JJ, Shashaty MG, Berns JS, Feldman HI, Fuchs BD: The impact of documentation of severe acute kidney injury on mortality. Clin Nephrol 80: 417–425, 2013 24075024
11. Lachance P, Villeneuve PM, Rewa OG, Wilson FP, Selby NM, Featherstone RM, Bagshaw SM: Association between e-alert implementation for detection of acute kidney injury and outcomes: A systematic review. Nephrol Dial Transplant 32: 265–272, 2017 28088774
12. Colpaert K, Hoste EA, Steurbaut K, Benoit D, Van Hoecke S, De Turck F, Decruyenaere J: Impact of real-time electronic alerting of acute kidney injury on therapeutic intervention and progression of RIFLE class. Crit Care Med 40: 1164–1170, 2012 22067631
13. Selby NM, Crowley L, Fluck RJ, McIntyre CW, Monaghan J, Lawson N, Kolhe NV: Use of electronic results reporting to diagnose and monitor AKI in hospitalized patients. Clin J Am Soc Nephrol 7: 533–540, 2012 22362062
14. Rind DM, Safran C, Phillips RS, Wang Q, Calkins DR, Delbanco TL, Bleich HL, Slack WV: Effect of computer-based alerts on the treatment and outcomes of hospitalized patients. Arch Intern Med 154: 1511–1517, 1994 8018007
15. Wilson FP, Shashaty M, Testani J, Aqeel I, Borovskiy Y, Ellenberg SS, Feldman HI, Fernandez H, Gitelman Y, Lin J, Negoianu D, Parikh CR, Reese PP, Urbani R, Fuchs B: Automated, electronic alerts for acute kidney injury: A single-blind, parallel-group, randomised controlled trial. Lancet 385: 1966–1974, 2015 25726515
16. Embi PJ, Leonard AC: Evaluating alert fatigue over time to EHR-based clinical trial alerts: Findings from a randomized controlled study. J Am Med Inform Assoc 19[e1]: e145–e148, 2012 22534081
17. Ash JS, Sittig DF, Campbell EM, Guappone KP, Dykstra RH: Some unintended consequences of clinical decision support systems. AMIA Annu Symp Proc 11: 26–30, 2007 18693791
18. Singh H, Spitzmueller C, Petersen NJ, Sawhney MK, Sittig DF: Information overload and missed test results in electronic health record-based settings. JAMA Intern Med 173: 702–704, 2013 23460235
19. Oh J, Bia JR, Ubaid-Ullah M, Testani JM, Wilson FP: Provider acceptance of an automated electronic alert for acute kidney injury. Clin Kidney J 9: 567–571, 2016
20. Jaskowski M, Jaroszewicz S: Uplift modeling for clinical trial data. Presented at the ICML 2012 Workshop on Clinical Data Analysis, Scotland, UK, June 30–July 1, 2012
21. Linoff GS, Berry MJ: Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Hoboken, NJ, John Wiley & Sons, 2011
22. Wilson FP, Reese PP, Shashaty MG, Ellenberg SS, Gitelman Y, Bansal AD, Urbani R, Feldman HI, Fuchs B: A trial of in-hospital, electronic alerts for acute kidney injury: Design and rationale. Clin Trials 11: 521–529, 2014 25023200
23. Justice AC, Covinsky KE, Berlin JA: Assessing the generalizability of prognostic information. Ann Intern Med 130: 515–524, 1999 10075620
24. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, Saunders LD, Beck CA, Feasby TE, Ghali WA: Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 43: 1130–1139, 2005 16224307
25. Marshall SW: Power for tests of interaction: Effect of raising the Type I error rate. Epidemiol Perspect Innov 4: 4, 2007 17578572
26. Greenland S: Tests for interaction in epidemiologic studies: A review and a study of power. Stat Med 2: 243–251, 1983 6359318
27. Lip GY, Nieuwlaat R, Pisters R, Lane DA, Crijns HJ: Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: The euro heart survey on atrial fibrillation. Chest 137: 263–272, 2010 19762550
28. Pedragosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Burcher M, Perrot M, Duchesnay E: Scikit-learn: Machine learning in Python. J Mach Learn Res 12: 2825–2830, 2011
29. Rule AD, Bailey KR, Schwartz GL, Khosla S, Lieske JC, Melton LJ 3rd: For estimating creatinine clearance measuring muscle mass gives better results than those based on demographics. Kidney Int 75: 1071–1078, 2009 19177154
30. Ix JH, Wassel CL, Stevens LA, Beck GJ, Froissart M, Navis G, Rodby R, Torres VE, Zhang YL, Greene T, Levey AS: Equations to estimate creatinine excretion rate: The CKD epidemiology collaboration. Clin J Am Soc Nephrol 6: 184–191, 2011 20966119
31. Hoste EA, Kashani K, Gibney N, Wilson FP, Ronco C, Goldstein SL, Kellum JA, Bagshaw SM; 15 ADQI Consensus Group: Impact of electronic-alerting of acute kidney injury: Workgroup statements from the 15(th) ADQI Consensus Conference. Can J Kidney Health Dis 3: 10, 2016 26925246
32. James MT, Hobson CE, Darmon M, Mohan S, Hudson D, Goldstein SL, Ronco C, Kellum JA, Bagshaw SM; Acute Dialysis Quality Initiative (ADQI) Consensus Group: Applications for detection of acute kidney injury using electronic medical records and clinical information systems: Workgroup statements from the 15(th) ADQI Consensus Conference. Can J Kidney Health Dis 3: 9, 2016 26925245
33. Alert fatigue leads to OR fatalities. Healthcare Benchmarks Qual Improv 18: 9–11, 2011 21265387
34. Fisher B, Costantino J, Redmond C, Poisson R, Bowman D, Couture J, Dimitrov NV, Wolmark N, Wickerham DL, Fisher ER, Margolese R, Robidoux A, Shibata H, Terz J, Paterson AHG, Feldman MI, Farrar W, Evans J, Lickley HL, Ketner M: A randomized clinical trial evaluating tamoxifen in the treatment of patients with node-negative breast cancer who have estrogen-receptor-positive tumors. N Engl J Med 320: 479–484, 1989 2644532
35. Simon T, Verstuyft C, Mary-Krause M, Quteineh L, Drouet E, Méneveau N, Steg PG, Ferrières J, Danchin N, Becquemont L; French Registry of Acute ST-Elevation and Non-ST-Elevation Myocardial Infarction (FAST-MI) Investigators: Genetic determinants of response to clopidogrel and cardiovascular events. N Engl J Med 360: 363–375, 2009 19106083
36. Rothwell PM: Treating individuals 2. Subgroup analysis in randomised controlled trials: Importance, indications, and interpretation. Lancet 365: 176–186, 2005 15639301