- We developed a machine learning predictive model to detect patients on dialysis with a SARS-CoV-2 infection 3 days before symptom onset.
- Changes in individual physiologic markers were subtle; the model appeared to detect important combinations for each patient’s prediction.
- We proposed a conceptual workflow for application of model-directed mitigation and testing within the standard practices of a provider.
The coronavirus disease 2019 (COVID-19) pandemic is challenging the world’s health care systems, including bringing complexities to the maintenance of dialysis in people with ESKD (1–5). In the United States, most patients with ESKD are treated by outpatient hemodialysis (HD), where social distancing can be difficult and heightened infection control measures are required (e.g., temperature screenings, universal masking, isolation treatments/shifts/clinics) (1–5). Patients with ESKD are typically older and have multiple comorbidities, placing the population at higher risk for requiring intensive care and dying if affected by COVID-19 (6,8–12).
Early reports from the United States show an 11% COVID-19 mortality in ESKD (13), which is higher than the 3% COVID-19 mortality shown in the national population (14,15). This is not unexpected, with reports from Asia and Europe suggesting a 16% to 23% COVID-19 mortality in ESKD (16–19). Despite the high mortality rate, an impaired immune response may render patients on dialysis more frequently asymptomatic when infected by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (16,17). In both the general and ESKD populations, the most prevalent symptoms of COVID-19 at presentation are fever (11%–66% in dialysis; 82% in the general population) and cough (37%–57% in dialysis; 62% in the general population) (16,20–22). The less frequent occurrence of signs and symptoms indicative of COVID-19 in patients on dialysis could be making the outbreak even more challenging to manage.
Dialysis providers routinely capture patient/clinical data during care. The robust data collected during HD treatments (generally thrice weekly) provide unique opportunities to leverage artificial intelligence (AI) in predicting COVID-19 outcomes. AI modeling helped identify the onset of the outbreak in China (23,24), and is being used to help with early detection of areas and individuals in the general population at risk for COVID-19 (25–27).
As part of a health care operations effort in response to the COVID-19 outbreak, an integrated kidney disease health care company aimed to develop a machine learning (ML) prediction model that identifies the risk of patients on HD having an undetected SARS-CoV-2 infection. We analyzed the model performance to determine the possible utility for testing in the HD population.
Materials and Methods
An integrated kidney disease health care company (Fresenius Medical Care, Waltham, MA) used retrospective real-world data from its national network of dialysis clinics to develop an ML model that predicts the risk of an adult patient on HD having an undetected SARS-CoV-2 infection that is identified ≥3 days later.
This analysis was performed in adherence with the Declaration of Helsinki under an initial and revised protocol reviewed by the New England Independent Review Board (NEIRB). This retrospective analysis was determined to be exempt and did not require patient consent (Protocol version 1.0 NEIRB#1–17–1302368–1; Protocol revision version 1.1 NEIRB#17–1348994–1; Needham Heights, MA).
COVID-19 Mitigation and Testing Practices
The national network of dialysis clinics (Fresenius Kidney Care, Waltham, MA) started implementing modified infection control measures in late February 2020, in response to the COVID-19 outbreak in the general population. Universal mitigation efforts at the provider included screening patients and staff before entry into the dialysis facility for high body temperature, signs or symptoms of flu-like illness, exposure to others with COVID-19, or a known infection diagnosed elsewhere (28). Patients and staff were required to thoroughly wash their hands on entering and leaving the facility. Patients were provided surgical masks and were required to wear them when in any area of the facility. Staff were required to wear enhanced personal protective equipment, including masks, face shields, gowns, and gloves, when in the proximity of patients in any area. The first patients on dialysis (n=2) at the provider were identified as COVID-19 positive on March 3, 2020.
All patients and staff with an elevated body temperature or symptoms of a flu-like illness were considered under investigation, and had RT-PCR laboratory testing for SARS-CoV-2 performed at a laboratory contracted by the dialysis provider. Patients under laboratory investigation for a SARS-CoV-2 infection were treated in dedicated isolation areas (rooms, shifts, or clinics) for patients who were suspected of being infected, until confirmed negative by two RT-PCR tests that were more than 24 hours apart. Patients who had been exposed to others with COVID-19 were moved to unique isolation areas for patients who had been exposed under investigation for 14 days, and received RT-PCR testing if they presented with signs or symptoms of a flu-like illness. Patients with RT-PCR–confirmed COVID-19 were treated in dedicated isolation areas for patients who were infected until two negative RT-PCR tests more than 24 hours apart were documented.
Population and Outcome
We considered data from adult (age ≥18 years) patients on HD treated throughout the national network for development of a model to predict individuals with an undetected SARS-CoV-2 infection. The observation period started on February 27, 2020. The positive arm included data from patients who had ≥1 confirmed positive RT-PCR COVID-19 test as of the end of the observation period (September 8, 2020, n=11,166). The negative arm included data from patients who: (1) were found COVID-19 negative (n=7959), or (2) were randomly sampled from all active patients at the dialysis provider without a reported suspicion of COVID-19 as of the end of the observation period (n=21,365). The random sampling was performed using the “sample” function from the “pandas” Python package.
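The random draw of unsuspected controls described above can be sketched with `DataFrame.sample`. This is a minimal illustration only: the roster, the column names (`patient_id`, `covid_suspected`), the sample size, and the seed are hypothetical and are not taken from the study.

```python
import pandas as pd

# Hypothetical roster of active patients; column names are illustrative only.
active = pd.DataFrame({
    "patient_id": range(1, 101),
    "covid_suspected": [i % 10 == 0 for i in range(1, 101)],
})

# Keep only patients without a reported suspicion of COVID-19.
eligible = active[~active["covid_suspected"]]

# Draw controls without replacement; random_state makes the draw repeatable.
controls = eligible.sample(n=20, random_state=42)
```

Fixing `random_state` is a design choice that makes the control arm reproducible across runs; whether the authors fixed a seed is not stated.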
We defined the index date of a patient on HD having a SARS-CoV-2 infection as the date of the COVID-19–positive test. In control patients with a negative COVID-19 test result, the test date was used as the index date. In controls without a test, the index date was randomly sampled from the positive patients’ index dates occurring before August 25, 2020, 2 weeks before the end of the observation period. This cutoff was chosen to minimize the possibility that patients in the control group were infected, but had not displayed signs or symptoms leading to testing before the end of the observation period. We included data from patients with (1) ≥1 hemoglobin sample collected both 1–14 days and 31–60 days before the individual’s prediction date (3 days before the index date, further defined below), and (2) ≥1 HD treatment both 1–7 days and 31–60 days preceding the prediction date. This was done to ensure we included only patients who were active, as hemoglobin draws are conducted weekly for in-center HD (typically thrice-weekly treatments). We excluded data from patients suspected to have COVID-19 who were pending laboratory testing, or were classified as a person under investigation where no laboratory testing was performed or documented.
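As a rough sketch of the inclusion logic above, the prediction date and the two look-back windows can be checked with standard-library dates. The function names and the example dates are hypothetical; treating the window bounds as inclusive is an assumption.

```python
from datetime import date, timedelta

def prediction_date(index_date):
    """Prediction date is 3 days before the index date."""
    return index_date - timedelta(days=3)

def has_event_in_window(event_dates, pred_date, min_days, max_days):
    """True if any event falls min_days..max_days (inclusive) before pred_date."""
    return any(min_days <= (pred_date - d).days <= max_days for d in event_dates)

def is_eligible(index_date, hgb_dates, hd_dates):
    """Apply the hemoglobin and HD-treatment activity criteria."""
    pred = prediction_date(index_date)
    return (
        has_event_in_window(hgb_dates, pred, 1, 14)
        and has_event_in_window(hgb_dates, pred, 31, 60)
        and has_event_in_window(hd_dates, pred, 1, 7)
        and has_event_in_window(hd_dates, pred, 31, 60)
    )

# Illustrative patient: one recent and one baseline record of each type.
index = date(2020, 6, 15)
hgb = [date(2020, 6, 5), date(2020, 5, 5)]
hd = [date(2020, 6, 10), date(2020, 5, 1)]
eligible = is_eligible(index, hgb, hd)
```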
AI Model Development
Software and ML Model Logic
We used Python version 3.7.7 (Python Software Foundation, Delaware) to build the ML model utilizing the XGBoost package (29). The XGBoost Python package used input variables from the training dataset to construct multiple decision trees, giving each a random sample, and established a series of thresholds that split variables to maximize the information gain. Decision trees were constructed iteratively, and new decision trees were added to predict prior errors. The decision trees made by the XGBoost ML model are inherently able to handle missing values without imputation, by including their presence when determining the splits (e.g., splitting observations with temperatures ≥98.0°F (≥36.7°C) from temperatures <98.0°F (<36.7°C), or missing temperatures). After no further improvements in performance were achieved using the validation dataset (also used for hyperparameter tuning), the ensemble of decision trees produced the final ML model that was assessed with the testing dataset.
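To illustrate how a tree split can accommodate missing values without imputation, the sketch below tries routing missing observations to each side of a threshold and keeps the better direction. It is a simplified stand-in, not the XGBoost implementation: XGBoost scores splits with gradient-based gain, whereas this example uses weighted Gini impurity, and the temperatures and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(values, labels, threshold, missing_goes_left):
    """Weighted impurity of a split, with missing values (None) sent to one side."""
    left, right = [], []
    for v, y in zip(values, labels):
        if v is None:
            (left if missing_goes_left else right).append(y)
        elif v < threshold:
            left.append(y)
        else:
            right.append(y)
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def choose_missing_direction(values, labels, threshold):
    """Pick the side for missing values that gives the lower impurity."""
    left_score = split_impurity(values, labels, threshold, True)
    right_score = split_impurity(values, labels, threshold, False)
    return "left" if left_score <= right_score else "right"

# Invented pre-HD temperatures (°F); None marks a missing measurement.
temps = [97.2, 97.5, 98.4, 98.9, None, None]
labels = [0, 0, 1, 1, 1, 1]
direction = choose_missing_direction(temps, labels, 98.0)
```

In this toy data the missing temperatures share the label of the warmer group, so they are routed with the ≥98.0°F branch.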
Undetected SARS-CoV-2 Prediction Model
We used 81 a priori selected treatment/laboratory variables up to the individually defined prediction date (3 days before the index date defined above) to predict the risk of a SARS-CoV-2 infection being identified in the following ≥3 days (Figure 1). This is intended to yield individual predictions at least 3 days in advance of symptoms that warranted testing. We used a 60%:20%:20% randomized split of COVID-19–positive samples for the training, validation, and testing datasets, and added the same number of patients who were COVID-19 negative to only the training and validation datasets. The testing dataset used to evaluate final model performance had a higher number of COVID-19–negative samples added to more closely match the prevalence observed in the overall national HD population (30,31).
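The split design described above can be sketched as follows: positives are divided 60%:20%:20%, training and validation receive an equal number of negatives, and the testing set receives enough negatives to reach a target prevalence. The 10% target matches the testing prevalence reported in the results; the function name, identifiers, and seed are hypothetical.

```python
import random

def build_splits(positive_ids, negative_ids, test_prevalence=0.10, seed=0):
    """Return train/val/test dicts of positive and negative patient ids."""
    rng = random.Random(seed)
    pos, neg = list(positive_ids), list(negative_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # 60%:20%:20% split of the positives.
    n_train = int(0.6 * len(pos))
    n_val = int(0.2 * len(pos))
    pos_train = pos[:n_train]
    pos_val = pos[n_train:n_train + n_val]
    pos_test = pos[n_train + n_val:]

    # Balanced classes for training and validation.
    neg_train = neg[:len(pos_train)]
    start = len(pos_train)
    neg_val = neg[start:start + len(pos_val)]
    start += len(pos_val)

    # Enough negatives in testing so positives make up test_prevalence.
    n_neg_test = round(len(pos_test) * (1 - test_prevalence) / test_prevalence)
    neg_test = neg[start:start + n_neg_test]
    if len(neg_test) < n_neg_test:
        raise ValueError("not enough negatives for the requested prevalence")

    return {
        "train": {"pos": pos_train, "neg": neg_train},
        "val": {"pos": pos_val, "neg": neg_val},
        "test": {"pos": pos_test, "neg": neg_test},
    }

# Illustrative ids only: 100 positives and 300 negatives.
splits = build_splits(range(100), range(1000, 1300))
```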
Descriptive statistics for patients on HD were tabulated for demographics and variables at the time of the prediction for an undetected SARS-CoV-2 infection. Data are stratified by patients on HD who did, or did not, have laboratory confirmation of COVID-19 after the date of prediction.
Analysis of ML Model Feature Importance
Shapley values (32,33) were calculated using the SHAP Python package to determine the influence of each variable on the predictions (34,35). SHAP values are calculated for each variable and each observation, representing a measure of effect (positive or negative value) of the observed value on each individual prediction. SHAP methods withhold and include individual inputs in all possible combinations, and compare differences between withheld and included data, to compute the mean value of all possible differences for attributing the feature importance. SHAP values are output as log odds (i.e., the logarithm of the odds), meaning they are additive explanations of feature importance. SHAP values for each variable are summed for each set of observations (in this case, for each patient), and converted from log odds to probability, which is then output by the model as the prediction. Thus, the more positive SHAP values increase the predicted probability, whereas more negative SHAP values decrease it. Overall feature importance for individual variables in the model was calculated from the SHAP values using the mean absolute values for each variable across all observations.
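The additive log-odds arithmetic described above can be sketched without the SHAP package itself. In SHAP's additive formulation the per-feature values are added to a base (expected) value before conversion to a probability. The feature names and numbers below are invented for illustration.

```python
import math

def sigmoid(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def predicted_probability(base_value, shap_values):
    """Base value plus the sum of per-feature SHAP values, mapped to probability."""
    return sigmoid(base_value + sum(shap_values))

def mean_abs_importance(shap_rows):
    """Overall importance: mean absolute SHAP value per feature across patients."""
    n = len(shap_rows)
    return {f: sum(abs(row[f]) for row in shap_rows) / n for f in shap_rows[0]}

# Invented SHAP values (log odds) for two patients and two features.
rows = [
    {"idwg_change": 0.4, "pre_hd_temp": 0.2},
    {"idwg_change": -0.2, "pre_hd_temp": 0.0},
]
importance = mean_abs_importance(rows)
p_first = predicted_probability(0.0, rows[0].values())
p_second = predicted_probability(0.0, rows[1].values())
```

Positive contributions push the first patient's probability above 0.5, and the negative contribution pulls the second patient's below it.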
Analysis of ML Model Performance
Performance of the ML model was measured by the area under the receiver operating characteristic curve (AUROC) in the training, validation, and testing datasets, and the recall, precision, and lift in the testing dataset. Additionally, we evaluated the area under the precision-recall curve (AUPRC) in the testing dataset.
AUROC measures the rate of true and false positives classified by the prediction model across probability thresholds. The definition of true/false positives and negatives is shown in Table 1.
Table 1. Definition of true/false positive and negative predictions classified by the model in the assessment of performance in the testing dataset
|True positive|Patients classified as COVID-19 positive by the model who were in the COVID-19–positive group
|False positive|Patients classified as COVID-19 positive by the model who were in the COVID-19–negative group
|True negative|Patients classified as COVID-19 negative by the model who were in the COVID-19–negative group
|False negative|Patients classified as COVID-19 negative by the model who were in the COVID-19–positive group
COVID-19, coronavirus disease 2019.
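Recall (sensitivity) measures the rate of true positives classified by the model at a specified threshold and is calculated as follows:
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
Precision measures the positive predictive value for the model at a specified threshold and is calculated as follows:
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
Lift measures the effectiveness of the model compared with random sampling and is calculated as follows:
$$\text{Lift} = \frac{\text{Precision}}{\text{Prevalence}} = \frac{\text{TP}/(\text{TP} + \text{FP})}{(\text{TP} + \text{FN})/(\text{TP} + \text{FP} + \text{TN} + \text{FN})}$$
Here TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively.
AUPRC measures precision at corresponding recall values across probability thresholds (36).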
AUROC, AUPRC, recall, and precision metrics yield scores on a scale of 0 (lowest) to 1 (highest). A model performing at chance would yield an AUROC of 0.5, an AUPRC equal to the proportion of positives in the dataset, and a lift value of 1. The cutoff threshold for classifying predictions was selected to optimize recall, precision, and lift according to the use case.
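A minimal sketch of the threshold-based metrics follows, using the standard definitions: recall is TP/(TP+FN), precision is TP/(TP+FP), and lift is precision divided by the prevalence of positives. Classifying a prediction as positive when the probability is at or above the threshold is an assumption, and the probabilities and labels are invented.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN, calling a prediction positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def lift(tp, fp, tn, fn):
    """Precision relative to the prevalence of positives in the dataset."""
    prevalence = (tp + fn) / (tp + fp + tn + fn)
    return precision(tp, fp) / prevalence if prevalence else 0.0

# Invented predictions: 3 positives among 10 patients.
probs = [0.9, 0.85, 0.6, 0.3, 0.2, 0.1, 0.05, 0.7, 0.4, 0.95]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(probs, labels, threshold=0.8)
```

Raising the threshold generally trades recall for precision, which is why the cutoff depends on the intended intervention.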
Results
We identified data from a select cohort of 40,490 patients on HD meeting eligibility criteria (11,166 patients who were COVID-19 positive and 29,324 who were unaffected and served as the control group). The prevalence of COVID-19 in the cohort (28% COVID-19 positive) was by design higher than in the HD population. The prevalence of patients who were COVID-19 positive (about 50% COVID-19 positive) in the training and validation datasets was balanced by design for model building purposes. For the testing dataset used to evaluate final model performance, there was a 10% prevalence of patients who were COVID-19 positive on the basis of the designed data split that was made to estimate the prevalence observed in the national HD population (30,31).
In the cohort, there was a higher proportion of patients on HD with a SARS-CoV-2 infection who were of Black race, Hispanic ethnicity, and had diabetes (Table 2). Mean values for the 81 treatment and laboratory variables before a SARS-CoV-2 infection being identified in the subsequent ≥3 days (or concurrent index date in controls) are shown in Tables 3 and 4.
Table 2. Demographics and comorbidities of patients on hemodialysis with and without an undetected severe acute respiratory syndrome coronavirus 2 infection identified in the subsequent ≥3 d
||Coronavirus Disease 2019+
|Number of patients on HD
|Age (yr), mean±SD
|Male, n (%)
|White race, n (%)
|Black race, n (%)
|Other race, n (%)
|Unknown race, n (%)
|Hispanic ethnicity, n (%)
|BMI (kg/m²), mean±SD
|Dialysis vintage (yr), mean±SD
|Diabetes, n (%)
|CHF, n (%)
|Ischemic heart disease, n (%)
|Central venous catheter access, n (%)
Age, sex, and catheter access variables were included in the ML prediction model to classify the risk of an individual HD patient having a SARS-CoV-2 infection being identified in the following ≥3 d. HD, hemodialysis; BMI, body mass index; CHF, congestive heart failure; ML, machine learning; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Table 3. Clinical and treatment characteristics of patients on hemodialysis with and without an undetected severe acute respiratory syndrome coronavirus 2 infection identified in the subsequent ≥3 d
||Unaffected Patients, Mean±SD; N
||Coronavirus Disease 2019+ Patients, Mean±SD; N
|Number of patients on HD
|Pre-HD sitting SBP (mm Hg)
|Change in pre-HD sitting SBP (mm Hg)
|Pre-HD sitting DBP (mm Hg)
|Change in pre-HD sitting DBP (mm Hg)
|Pre-HD weight (kg)
|Change in pre-HD weight (kg)
|Pre-HD body temperature (°F)
|Change in pre-HD body temperature (°F)
|Post-HD sitting SBP (mm Hg)
|Change in post-HD sitting SBP (mm Hg)
|Post-HD sitting DBP (mm Hg)
|Change in post-HD sitting DBP (mm Hg)
|Post-HD body temperature (°F)
|Change in post-HD body temperature (°F)
|Pre-HD respirations per min
|Change in pre-HD respirations per min
|Pre-HD pulse (BPM)
|Change in pre-HD pulse (BPM)
|Post-HD respirations per min
|Change in post-HD respirations per min
|Post-HD pulse (BPM)
|Change in post-HD pulse (BPM)
|Change in IDWG (kg)
|Post-HD weight loss (kg)
|Change in post-HD weight loss (kg)
|Post-HD body temperature change
|Change in post-HD body temperature change
|Post-HD respirations per min change
|Change in post-HD respirations per min change
|Post-HD pulse change (BPM)
|Change in post-HD pulse change (BPM)
|% HD treatments with nasal oxygen administered
|Change in % HD treatments with nasal oxygen administered
All variables were included in the ML prediction model to classify the risk of an individual HD patient having a SARS-CoV-2 infection being identified in the following ≥3 d. (100°F−32) ×5/9=37.8°C. HD, hemodialysis; SBP, systolic blood pressure; DBP, diastolic blood pressure; BPM, beats per minute; IDWG, interdialytic weight gain; post-HD weight loss, post-HD minus pre-HD weight (kg); ML, machine learning; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
a. Mean values of HD treatment variables 1–7 d before the prediction date (i.e., 3 d before suspicion of SARS-CoV-2 infection in standard clinical practice).
b. Mean values of the difference in HD treatment variables 31–60 d to 1–7 d before the prediction date.
Table 4. Laboratory characteristics of patients on hemodialysis with and without an undetected severe acute respiratory syndrome coronavirus 2 infection identified in the subsequent ≥3 d
||Unaffected Patients, Mean±SD; N
||Coronavirus Disease 2019+ Patients, Mean±SD; N
|Number of patients on HD
|Change in albumin (g/dl)
|Change in creatinine (mg/dl)
|Change in bicarbonate (mmol/L)
|Change in BUN (mg/dl)
|Change in URR
|Change in sodium (mmol/L)
|Change in potassium (mmol/L)
|Change in phosphate (mg/dl)
|Change in chloride (meq/L)
|Change in calcium (mg/dl)
|Corrected calcium (mg/dl)
|Change in corrected calcium (mg/dl)
|Change in iPTH (pg/ml)
|Change in ferritin (ng/ml)
|Change in TSAT (%)
|Change in Hgb (g/dl)
|Platelet count (×10⁹/L)
|Change in platelet count (×10⁹/L)
|WBC count (×10⁹/L)
|Change in WBC count (×10⁹/L)
|% of neutrophils
|Change in % of neutrophils
|% of lymphocytes
|Change in % of lymphocytes
|% of monocytes
|Change in % of monocytes
|% of eosinophils
|Change in % of eosinophils
|% of basophils
|Change in % of basophils
All variables were included in the ML prediction model to classify the risk of an individual HD patient having a SARS-CoV-2 infection being identified in the following ≥3 d. HD, hemodialysis; Hgb, hemoglobin; WBC, white blood cell; TSAT, transferrin saturation; URR, urea reduction ratio; iPTH, intact parathyroid hormone; ML, machine learning; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
a. Mean values of laboratory variables 1–14 d before the prediction date (i.e., 3 d before suspicion of SARS-CoV-2 infection in standard clinical practice).
b. Mean values of the difference in laboratory variables 31–60 d to 1–14 d before the prediction date.
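The two kinds of input variables in Tables 3 and 4 (a recent-window mean, and its change from a baseline window 31–60 d earlier) can be sketched as below. Computing the change as recent mean minus baseline mean is an assumption, chosen so that a decrease from the previous month is negative; the function names and measurements are hypothetical.

```python
from datetime import date
from statistics import mean

def window_mean(measurements, pred_date, min_days, max_days):
    """Mean of values taken min_days..max_days before pred_date, or None if none."""
    vals = [v for d, v in measurements
            if min_days <= (pred_date - d).days <= max_days]
    return mean(vals) if vals else None

def recent_and_change(measurements, pred_date, recent=(1, 7), baseline=(31, 60)):
    """Return (recent mean, recent mean minus baseline mean)."""
    recent_mean = window_mean(measurements, pred_date, *recent)
    baseline_mean = window_mean(measurements, pred_date, *baseline)
    if recent_mean is None or baseline_mean is None:
        return recent_mean, None  # left missing rather than imputed
    return recent_mean, recent_mean - baseline_mean

# Invented interdialytic weight gain values (kg) for one patient.
pred = date(2020, 6, 12)
idwg = [
    (date(2020, 6, 10), 2.0), (date(2020, 6, 8), 2.2),   # recent window
    (date(2020, 5, 8), 2.3), (date(2020, 5, 5), 2.5),    # baseline window
]
recent_idwg, change_idwg = recent_and_change(idwg, pred)
```

For laboratory variables the recent window would be `(1, 14)` instead of `(1, 7)`. Returning `None` when a window is empty leaves the value missing for a tree model that can handle missing inputs.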
Patients on HD who contracted COVID-19 had only subtle, clinically unremarkable distinctions in treatment and laboratory characteristics before being suspected to have a SARS-CoV-2 infection, compared with patients who were unaffected. Mean pre-/post-HD body temperatures (Table 3) and inflammatory markers (white blood cell count and differentials) (Table 4) before a SARS-CoV-2 infection being identified did not show a clinically relevant difference between groups. Patients on HD who had a SARS-CoV-2 infection identified in the following 3 days did appear to have somewhat higher ferritin levels compared with patients who were unaffected.
Prediction Model Feature Importance
Calculation of variable feature importance with SHAP values found the top three predictors of patients on HD having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, mean pre-HD body temperature in the prior week, and the change in post-HD pulse from the previous month (Figure 2A).
The SHAP value plot in Figure 2B further shows the degree of positive or negative effect of each individual measurement for each individual prediction. Each dot corresponds to an individual patient, where the dot’s position on the x-axis represents that feature’s effect on the model prediction; in addition, the color indicates how high or low that feature’s value was. Features with missing values are indicated in gray.
For the top predictor of the change in interdialytic weight gain in the week before compared with the month before a SARS-CoV-2 infection, smaller (negative) values (cooler colors) were associated with a positive SHAP value, whereas larger values (warmer colors) were associated with a negative SHAP value. These results showed for each individual prediction, the model generally considered decreases in interdialytic weight gain from the previous month to be associated with a greater probability of an undetected SARS-CoV-2 infection, and an increase in interdialytic weight gain to be associated with a lower likelihood of an undetected SARS-CoV-2 infection. In other words, patients who do not gain as much weight as usual in between dialysis treatments are deemed more likely to have an undetected SARS-CoV-2 infection by the model.
Along with highlighting directional effects as previously stated, Figure 2B also highlights different distributions of effects that might not be apparent when viewing the mean absolute values as in Figure 2A. For example, the eighth most important variable, change in monocytes from the previous month, produces the largest (most positive) SHAP values out of all of the variables shown. This long, rightward tail along the x-axis indicates that, despite having a lower mean absolute value in comparison to other variables, for some individuals this is very important. Specifically, the model assessed that patients with increased monocyte levels from the previous month are deemed more likely to have a SARS-CoV-2 infection, whereas the SHAP values for those with similar or lower levels of monocytes do not significantly decrease the prediction.
Prediction Model Performance
The ML model had adequate performance in prediction of the 3-day risk for having an undetected SARS-CoV-2 infection. The ML model had an AUROC of 0.77, 0.67, and 0.68 in the training, validation, and testing datasets, respectively (Figure 3). The ML model had an AUPRC of 0.24 in the testing dataset (Figure 4).
Setting the threshold for classifying observations as positive or negative at 0.80 to minimize false positives, the precision for the ML model in the testing dataset was 0.52, showing 52% of patients predicted to have a SARS-CoV-2 infection actually had symptoms in the subsequent ≥3 days and were confirmed to have COVID-19. Given the high threshold, recall was 0.07, showing the model correctly predicted true positives for a SARS-CoV-2 infection in 7% of patients on HD who were positive. The lift was 5.3, suggesting model use is 5.3 times more effective in predicting a patient on HD who contracts COVID-19, compared with not having a model (Figure 5).
Discussion
We successfully developed an ML prediction model using retrospective data, which appears to have suitable performance in identifying patients on HD at risk of having an undetected SARS-CoV-2 infection that is identified in the following ≥3 days. The top predictors of a patient having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, mean pre-HD body temperature in the prior week, and the change in post-HD pulse from the previous month.
Although some top predictors are not surprising, the observed distinctions were subtle. Without insights from the model considering an array of variables, it would not be clear where one should draw a meaningful line between higher and lower risk for an individual patient. For instance, assessing for a decrease in weekly interdialytic weight gain of about 0.3 kg alone may not be considered actionable, and the same is true for assessing for an increase of about 0.2°F (0.1°C) in weekly pre-HD body temperature, or an increase in pulse of about 1 beat per minute. Notably, the average pre-HD body temperature was 97.6°F (36.4°C) (primarily oral measurements) in our analysis and has been previously reported as 98.2°F (36.7°C) (37). Given 98.6°F (37°C) is the expected average in healthy populations, the lower body temperature of patients on HD is of importance with the rather low incidence of fever presenting in patients on dialysis with COVID-19 (11%–66% with fever [16,20,22]). Overall, the small changes observed for each individual variable suggest any one parameter alone has minimal value for detecting a patient’s risk of having COVID-19, especially because not every affected patient will consistently have every symptom of COVID-19. However, the combinations of minor changes appear to be meaningful in the individualized ML model we developed, with each small change being one piece of the puzzle for each patient’s unique prediction.
Individual predictions can be further used to identify the risk level for dialysis clinics through the proportion of patients classified with an undetected SARS-CoV-2 infection. We anticipate using a combination of individual predictions along with reporting of the percent of patients at risk in each clinic may yield the greatest early insights on: (1) which otherwise asymptomatic patients on HD might be most appropriate for enhanced screening, COVID-19 testing, and triage to an isolation area, and (2) where providers can focus additional resource allocations to combat COVID-19. Furthermore, flagging patients as potentially infectious may cut through some of the “COVID fatigue” occurring during this prolonged pandemic. By adding this novelty and warning, the hope is that additional care may be given to identifying potential symptoms during screening. Prospective evaluation of ML model–directed mitigation is being piloted at the national network of dialysis clinics.
The authors propose a conceptual workflow for the application of the ML model predictions to assist with directing care to individual patients and with directing resource allocations to clinics (Figure 6). The model was trained using a target date of 3 days before patients presented with COVID-19 symptoms to alert clinicians at least one dialysis treatment earlier. Given this timeline, we believe it is prudent to run the prediction model on a per-treatment basis. Reports on individual patient predictions would optimally be delivered to clinic staff on interdialytic days, to provide the care team time to prepare for a more comprehensive screening by an advanced clinician at the next encounter and potential isolation of subsequent HD treatments. Reports on the percent of patients at risk in each clinic can be delivered on a weekly basis to allow leadership and regional managers to meet with clinical managers and prepare for allocation of resources including additional staff, protective equipment, and isolation areas. We propose categorizing clinic-level reports to detail facilities with more than 5% of patients at risk for undetected SARS-CoV-2 infection.
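The clinic-level report described above can be sketched as a simple aggregation of individual predictions. The 5% clinic cutoff comes from the proposal above and the 0.80 patient threshold from the reported model evaluation; the function name, record layout, and example data are hypothetical.

```python
from collections import defaultdict

def clinic_risk_report(predictions, patient_threshold=0.80, clinic_cutoff=0.05):
    """Summarize flagged patients per clinic.

    predictions: iterable of (clinic_id, patient_id, probability) tuples.
    A clinic is marked elevated when more than clinic_cutoff of its
    patients have a predicted probability at or above patient_threshold.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for clinic_id, _patient_id, prob in predictions:
        totals[clinic_id] += 1
        if prob >= patient_threshold:
            flagged[clinic_id] += 1

    report = {}
    for clinic_id, n in totals.items():
        share = flagged[clinic_id] / n
        report[clinic_id] = {
            "patients": n,
            "flagged": flagged[clinic_id],
            "share": share,
            "elevated": share > clinic_cutoff,
        }
    return report

# Invented predictions: clinic A has 2 of 20 flagged, clinic B has 1 of 25.
preds = [("A", i, 0.9 if i < 2 else 0.1) for i in range(20)]
preds += [("B", i, 0.85 if i == 0 else 0.2) for i in range(25)]
report = clinic_risk_report(preds)
```

Here clinic A (10% flagged) would appear on the weekly report, whereas clinic B (4% flagged) would not.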
Mitigation efforts at the national dialysis network include universal RT-PCR testing of patients with symptoms of a flu-like illness, along with distinct isolation areas (rooms, shifts, clinics) for patients who are suspected to be infected and under investigation, and patients who are COVID-19 positive. We propose patients predicted to be at risk receive a comprehensive screening for signs and symptoms of a flu-like illness by an advanced practitioner (e.g., physician, physician assistant, nurse practitioner, experienced dialysis nurse) because there is a possibility of false positives. However, the comprehensive assessments should consider any minor sign or symptom of a flu-like illness that may otherwise be considered normal on the basis of the patient’s uremia and medical history (38,39) to be a reason for suspicion of COVID-19. In addition to the prediction itself, the top reasons increasing the risk score can be provided by calculating the SHAP values (Figure 2B). This may help to provide additional insight into what a more comprehensive screening assessment should focus on for each individual patient. For example, if a patient is classified by the model at risk, with the top reason related to a decrease in interdialytic weight gain, the next screening before entry to the clinic could include assessment of any change in appetite or fluid intake. Patients who are high risk and suspected with any mild sign of a flu-like illness could be triaged to unique isolation areas for patients under investigation and receive RT-PCR testing. HD would be continued in a distinct isolation area until COVID-19 is diagnosed or ruled out (determined by two negative RT-PCR tests >24 hours apart), whereby patients who are laboratory positive would be triaged to unique isolation areas for COVID-19, and patients who are negative would return to be treated with the general HD population (Figure 6), which is consistent with the providers’ practices without the model.
Patients diagnosed with COVID-19 at the provider are treated in distinct isolation areas until they have two negative RT-PCR tests >24 hours apart, after which patients who have recovered are transferred back to receive HD with the unaffected HD population.
The developed model has the potential to provide a data-driven way for providers to identify individuals with undetected SARS-CoV-2 infections. The conceptual workflow provides a hypothetical strategy that can be adapted within the practice patterns of other providers, which may not include universal testing and require periods of isolation. Different strategies could utilize different thresholds for flagging patients, depending on the intervention and implications of false positives and false negatives. Considering the possibility of prolonged viral shedding observed in the general and dialysis populations (40–42), the optimal period for isolation of patients on dialysis affected by COVID-19 appears to be longer than 14 days (42). In countries or areas with testing limitations, especially those with a high positive-to-negative testing ratio (e.g., >25% positive test rate), it may be reasonable to consider having separate isolation areas for patients predicted at risk, in addition to isolation areas for patients with symptoms of a flu-like illness. In this scenario, the 14-day timeframe for isolation of patients predicted to be at risk is anticipated to be appropriate if no signs or symptoms of a flu-like illness arise.
As more data are captured in the COVID-19 outbreak, further prediction models that can classify the risk of morbid/mortal outcomes in patients on dialysis affected by COVID-19 need to be developed. The potential applications of AI for COVID-19 have been previously detailed (43); the first priority was suggested as “early detection and diagnosis of the infection.” The robustness of data and an a priori selection of variables to be included in our ML model bring value through assessment of feature importance; this allows for interpretation of meaningfulness of predictors, although it does not determine causality. The selection of input variables was focused on biologic changes reflected in clinical presentations and biomarkers, allowing the model to be generalizable to all individual patients on HD in the overall population, and not specific to the characteristics of outbreaks or the local population where patients reside. Although this approach yields more generalizability for the model to be used in the HD populations worldwide, external factors such as local incidence rates or social determinants of health are anticipated to affect the likelihood of a patient contracting COVID-19 and can be considered as appropriate. Ultimately, this strategy has the potential to allow for COVID-19 to be detected sooner than patients on HD show symptoms, and for a localized HD population, earlier than it would be reported by national authorities.
A systematic review identified several models developed using data from China for early detection of COVID-19 in suspected individuals in the general population (27). One is an externally validated ML model that predicts COVID-19 in suspected asymptomatic patients (AUROC validation 0.872). Another effort used a prediction model (AUROC validation 0.966) to develop logic for an eight-variable COVID-19 risk chart. A further model with an AUROC of 0.938 was created to detect COVID-19 pneumonia in patients admitted to a fever clinic (44). Other models used genomic/computed tomography data to diagnose COVID-19 (27). An effort using data from China not included in prior reviews developed various ML models to predict (AUROC testing 0.87–0.95) and identify features indicative of COVID-19 status across age categories among people in the general population presenting to a clinic/hospital (45). This model found the most important features for prediction of COVID-19 at presentation were lung infection, cough, and pneumonia. Consistent variables used across models for predictions included age, body temperature, and flu-like illness symptoms (27,45). Another distinct effort reported in the literature included the development of ML and traditional models using only full blood count data to predict the likelihood of COVID-19 among people in the general population presenting to the emergency department (AUROC training 0.80–0.86) of, or patients admitted at (AUROC training 0.94–0.95), a large hospital in Brazil. Although these models were all reported to have suitable performance, all were subject to bias due to nongeneralizable sampling of controls without COVID-19 and possible overfitting. We cannot rule out that our ML model may have similar bias, although it included a large sample and the testing dataset had relatively generalizable sampling for the dialysis population with respect to positives and negatives (30,31).
Also, because we randomly selected a subset of patients for the negative arm who never had symptoms of COVID-19 and did not receive PCR testing, it is possible we unintentionally included a small number of patients with an asymptomatic infection. However, this would have required patients to have had an asymptomatic SARS-CoV-2 infection that aligned with the randomly sampled time window. Given the balanced class design of the training and validation data splits, it is unlikely this introduced an appreciable bias in the model during training and validation. Yet, there is a possibility this could have introduced a minimal bias in the evaluation of performance in the testing data, because there were fewer patients who were positive to identify and thereby offset the impact of a patient incorrectly labeled negative when positive. Additionally, the reported model performance may be on the conservative side when considering the constraints of the “ground truth” labels, because they relate to how patients who are positive are identified by conventional screening. The extent of this depends on how well the model identifies individuals who were not included in the training sample but might show similar patterns, and also depends on the intervention design. In any case, our model is unique in its ability to identify the risk of SARS-CoV-2 infection in patients without any suspicion of being affected with the disease.
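The argument about mislabeled negatives can be made concrete with a small worked example. The confusion-matrix counts below are hypothetical and chosen only to mimic a testing set with few positives; they are not the study's results. The helper `relabel_hidden_positives` is likewise an illustrative assumption: it moves patients who were labeled negative but truly infected into the positive class and shows how the measured precision and recall shift.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def relabel_hidden_positives(tp, fp, fn, tn, flagged_hidden, missed_hidden):
    """Correct labels for patients recorded as negative who were truly positive.

    flagged_hidden: hidden positives the model flagged (counted as FP, really TP).
    missed_hidden:  hidden positives the model did not flag (counted as TN, really FN).
    """
    if flagged_hidden > fp or missed_hidden > tn:
        raise ValueError("cannot relabel more patients than were counted")
    return (tp + flagged_hidden, fp - flagged_hidden,
            fn + missed_hidden, tn - missed_hidden)


# Hypothetical imbalanced testing set: 40 labeled positives among 10,000 patients.
tp, fp, fn, tn = 20, 80, 20, 9880
observed = precision_recall(tp, fp, fn)

# Assume 5 flagged "false positives" were actually asymptomatic infections.
tp2, fp2, fn2, tn2 = relabel_hidden_positives(tp, fp, fn, tn,
                                              flagged_hidden=5, missed_hidden=0)
corrected = precision_recall(tp2, fp2, fn2)

print(f"observed  precision={observed[0]:.3f} recall={observed[1]:.3f}")
print(f"corrected precision={corrected[0]:.3f} recall={corrected[1]:.3f}")
```

With so few labeled positives, relabeling only five patients changes the measured precision noticeably. If the model flags hidden positives, the reported figures understate its performance; if it misses them, recall is overstated. This is why the direction and size of the bias depend on how well the model identifies such patients.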
The developed model holds promise to help providers through the COVID-19 pandemic and subsequent wave(s) of outbreak (44,45). We recommend model use as augmentation and not replacement of symptom screening, as AI modeling is never 100% accurate and model risk classifications need to be interpreted within the extent of the model’s performance. The developed AI model showed clinically meaningful performance in identifying individual patients on HD at risk of having an undetected SARS-CoV-2 infection ≥3 days before there would be any suspicion of the disease. Prospective testing is needed and underway at the national network of dialysis clinics. We proposed a conceptual workflow for application of ML model–directed mitigation and testing. These efforts should provide key insights for consideration by health care providers.
C. Monaghan, F. Maddux, H. Han, J. Larkin, L. Usvyat, S. Chaudhuri, and Y. Jiao are employees of Fresenius Medical Care in the Global Medical Office. E. Weinhandl, I. Dahne-Steuber, J. Hymes, K. Belmonte, K. Bermudez, and R. Kossmann are employees of Fresenius Medical Care North America. F. Maddux has directorships in the Fresenius Medical Care Management Board, Goldfinch Bio, and Vifor Fresenius Medical Care Renal Pharma. F. Maddux, I. Dahne-Steuber, J. Hymes, K. Belmonte, L. Usvyat, P. Kotanko, and R. Kossmann have share options/ownership in Fresenius Medical Care. L. Neri is an employee of Fresenius Medical Care Deutschland GmbH in the Europe, the Middle East, and Africa Medical Office. P. Kotanko is an employee of Renal Research Institute, a wholly owned subsidiary of Fresenius Medical Care; reports receiving honorarium from Up-To-Date; and is on the Editorial Board of Blood Purification and Kidney and Blood Pressure Research. All remaining authors have nothing to disclose.
Project and manuscript composition were supported internally by Fresenius Medical Care.
The authors would like to acknowledge Mr. Vladimir M. Rigodon for assistance with the composition of the regulatory protocol for this analysis. A previous version of this manuscript appeared on the preprint server medRxiv, https://www.medrxiv.org/content/10.1101/2020.06.15.20131680v1.
K. Belmonte, I.A. Dahne-Steuber, R.J. Kossmann, P. Kotanko, F.W. Maddux, C.K. Monaghan, and L.A. Usvyat conceptualized the study; S. Chaudhuri, H. Han, Y. Jiao, J.W. Larkin, C.K. Monaghan, and L.A. Usvyat were responsible for the data curation; C.K. Monaghan and L.A. Usvyat were responsible for the formal analysis; K. Belmonte, I.A. Dahne-Steuber, J.L. Hymes, R.J. Kossmann, P. Kotanko, F.W. Maddux, C.K. Monaghan, and L.A. Usvyat were responsible for the methodology; C.K. Monaghan was responsible for the validation; J.W. Larkin, C.K. Monaghan, and L.A. Usvyat were responsible for the visualization; J.W. Larkin, C.K. Monaghan, and L.A. Usvyat wrote the original draft; S. Chaudhuri, R.J. Kossmann, J.W. Larkin, and L.A. Usvyat were responsible for the resources; S. Chaudhuri, J.L. Hymes, P. Kotanko, J.P. Kooman, R.J. Kossmann, J.W. Larkin, F.W. Maddux, and L.A. Usvyat provided supervision; and all authors reviewed and edited the manuscript. The interpretation, drafting, and revision of this manuscript were conducted by all authors. The decision to submit this manuscript for publication was jointly made by all authors, and the manuscript was confirmed to be accurate and approved by all authors.
1. Kliger AS, Silberzweig J: Mitigating risk of COVID-19 in dialysis facilities. Clin J Am Soc Nephrol 15: 707–709, 2020 https://doi.org/10.2215/CJN.03340320
2. Ikizler TA: COVID-19 and dialysis units: What do we know now and what should we do? Am J Kidney Dis 76: 1–3, 2020 https://doi.org/10.1053/j.ajkd.2020.03.008
3. Basile C, Combe C, Pizzarelli F, Covic A, Davenport A, Kanbay M, Kirmizis D, Schneditz D, van der Sande F, Mitra S: Recommendations for the prevention, mitigation and containment of the emerging SARS-CoV-2 (COVID-19) pandemic in haemodialysis centres. Nephrol Dial Transplant 35: 737–741, 2020 https://doi.org/10.1093/ndt/gfaa069
4. Mokrzycki MH, Coco M: Management of hemodialysis patients with suspected or confirmed COVID-19 infection: Perspective of two nephrologists in the United States. Kidney360 1: 273–278, 2020 https://doi.org/10.34067/KID.0001452020
5. Gallieni M, Sabiu G, Scorza D: Delivering safe and effective hemodialysis in patients with suspected or confirmed COVID-19 infection: a single-center perspective from Italy. Kidney360 1: 403–409, 2020 https://doi.org/10.34067/KID.0001782020
6. Roncon L, Zuin M, Rigatelli G, Zuliani G: Diabetic patients with COVID-19 infection are at higher risk of ICU admission and poor short-term outcome. J Clin Virol 127: 104354, 2020 https://doi.org/10.1016/j.jcv.2020.104354
7. Guo T, Fan Y, Chen M, Wu X, Zhang L, He T, Wang H, Wan J, Wang X, Lu Z: Cardiovascular implications of fatal outcomes of patients with Coronavirus Disease 2019 (COVID-19). JAMA Cardiol 5: 811–818, 2020 https://doi.org/10.1001/jamacardio.2020.1017
8. Li X, Xu S, Yu M, Wang K, Tao Y, Zhou Y, Shi J, Zhou M, Wu B, Yang Z, Zhang C, Yue J, Zhang Z, Renz H, Liu X, Xie J, Xie M, Zhao J: Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J Allergy Clin Immunol 146: 110–118, 2020 https://doi.org/10.1016/j.jaci.2020.04.006
9. Cheng Y, Luo R, Wang K, Zhang M, Wang Z, Dong L, Li J, Yao Y, Ge S, Xu G: Kidney disease is associated with in-hospital death of patients with COVID-19. Kidney Int 97: 829–838, 2020 https://doi.org/10.1016/j.kint.2020.03.005
10. Du RH, Liang LR, Yang CQ, Wang W, Cao TZ, Li M, Guo GY, Du J, Zheng CL, Zhu Q, Hu M, Li XY, Peng P, Shi HZ: Predictors of mortality for patients with COVID-19 pneumonia caused by SARS-CoV-2: A prospective cohort study. Eur Respir J 55: 2000524, 2020 https://doi.org/10.1183/13993003.00524-2020
11. Adams ML, Katz DL, Grandpre J: Population-based estimates of chronic conditions affecting risk for complications from coronavirus disease, United States. Emerg Infect Dis 26: 1831–1833, 2020 https://doi.org/10.3201/eid2608.200679
12. United States Renal Data System: 2019 USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States, Bethesda, MD, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 2019. Available at: https://www.usrds.org/annual-data-report/previous-adrs/. Accessed June 10, 2020
14. CDC COVID-19 Response Team: Geographic differences in COVID-19 cases, deaths, and incidence - United States, February 12-April 7, 2020. MMWR Morb Mortal Wkly Rep 69: 465–471, 2020 https://doi.org/10.15585/mmwr.mm6915e4
15. Johns Hopkins Coronavirus Resource Center Johns Hopkins University School of Medicine. 2020. Available at: https://coronavirus.jhu.edu/data/mortality. Accessed October 15, 2020
16. ERA-EDTA: ERACODA - The ERA-EDTA COVID-19 database for patients on kidney replacement therapy, 2020. Available at: https://www.era-edta.org/en/wp-content/uploads/2020/04/ERACODA-Study-Report-2020-04-29.pdf. Accessed October 15, 2020
17. Wang H: Maintenance hemodialysis and coronavirus disease 2019 (COVID-19): Saving lives with caution, care, and courage. Kidney Med 2: 365–366, 2020
18. Jager KJ, Kramer A, Chesnaye NC, Couchoud C, Sánchez-Álvarez JE, Garneata L, Collart F, Hemmelder MH, Ambühl P, Kerschbaum J, Legeai C, Del Pino Y Pino MD, Mircescu G, Mazzoleni L, Hoekstra T, Winzeler R, Mayer G, Stel VS, Wanner C, Zoccali C, Massy ZA: Results from the ERA-EDTA Registry indicate a high mortality due to COVID-19 in dialysis patients and kidney transplant recipients across Europe. Kidney Int 98: 1540–1548, 2020 https://doi.org/10.1016/j.kint.2020.09.006
19. Kikuchi K, Nangaku M, Ryuzaki M, Yamakawa T, Hanafusa N, Sakai K, Kanno Y, Ando R, Shinoda T, Nakamoto H, Akizawa T; COVID-19 Task Force Committee of the Japanese Association of Dialysis Physicians; Japanese Society for Dialysis Therapy; Japanese Society of Nephrology: COVID-19 of dialysis patients in Japan: Current status and guidance on preventive measures. Ther Apher Dial 24: 361–365, 2020
20. Siordia JA Jr: Epidemiology and clinical features of COVID-19: A review of current literature. J Clin Virol 127: 104357, 2020 https://doi.org/10.1016/j.jcv.2020.104357
21. Xiong F, Tang H, Liu L, Tu C, Tian JB, Lei CT, Liu J, Dong JW, Chen WL, Wang XH, Luo D, Shi M, Miao XP, Zhang C: Clinical characteristics of and medical interventions for COVID-19 in hemodialysis patients in Wuhan, China. J Am Soc Nephrol 31: 1387–1397, 2020 https://doi.org/10.1681/ASN.2020030354
22. Niiler E: An AI epidemiologist sent the first warnings of the Wuhan virus, 2020. Wired. Available at: https://www.wired.com/story/ai-epidemiologist-wuhan-public-health-warnings/. Accessed April 22, 2020
23. Bogoch II, Watts A, Thomas-Bachli A, Huber C, Kraemer MUG, Khan K: Pneumonia of unknown aetiology in Wuhan, China: Potential for international spread via commercial air travel. J Travel Med 27: taaa008, 2020 https://doi.org/10.1093/jtm/taaa008
24. McCall B: COVID-19 and artificial intelligence: Protecting health-care workers and curbing the spread. Lancet Digit Health 2: e166–e167, 2020 https://doi.org/10.1016/S2589-7500(20)30054-6
25. Alimadadi A, Aryal S, Manandhar I, Munroe PB, Joe B, Cheng X: Artificial intelligence and machine learning to fight COVID-19. Physiol Genomics 52: 200–202, 2020 https://doi.org/10.1152/physiolgenomics.00029.2020
26. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, Bonten MMJ, Damen JAA, Debray TPA, De Vos M, Dhiman P, Haller MC, Harhay MO, Henckaerts L, Kreuzberger N, Lohman A, Luijken K, Ma J, Andaur CL, Reitsma JB, Sergeant JC, Shi C, Skoetz N, Smits LJM, Snell KIE, Sperrin M, Spijker R, Steyerberg EW, Takada T, van Kuijk SMJ, van Royen FS, Wallisch C, Hooft L, Moons KGM, van Smeden M: Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal [published correction appears in BMJ 369: m2204, 2020 10.1136/bmj.m2204]. BMJ 369: m1328, 2020 https://doi.org/10.1136/bmj.m1328
27. Fresenius Medical Care North America: COVID-19 resource and education center. Available at: https://fmcna.com/company/covid-19-resource-center/. Accessed August 31, 2020
28. Chen T, Guestrin C: XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, Association for Computing Machinery, 2016 pp 785–794 https://doi.org/10.1145/2939672.2939785
29. Centers for Medicare & Medicaid Services: Preliminary Medicare COVID-19 data snapshot, 2020. Available at: https://www.cms.gov/research-statistics-data-systems/preliminary-medicare-covid-19-data-snapshot. Accessed October 14, 2020
30. Anand S, Montez-Rath M, Han J, Bozeman J, Kerschmann R, Beyer P, Parsonnet J, Chertow GM: Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: A cross-sectional study. Lancet 396: 1335–1344, 2020 https://doi.org/10.1016/S0140-6736(20)32009-2
31. Shapley LS: A value for n-person games. In: Contributions to the Theory of Games II. Annals of Mathematics Studies, edited by Kuhn HW, Tucker AW, Vol. 28, Princeton, Princeton University Press, 1953, pp 307–317
32. Štrumbelj E, Kononenko I: Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst 41: 647–665, 2013 https://doi.org/10.1007/s10115-013-0679-x
33. Lundberg SM, Lee SI: A unified approach to interpreting model predictions. Proceedings from the Advances in Neural Information Processing Systems, Vol. 30, 2017. Available at: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html. Accessed June 10, 2020
34. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I: From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2: 56–67, 2020 https://doi.org/10.1038/s42256-019-0138-9
35. Saito T, Rehmsmeier M: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10: e0118432, 2015 https://doi.org/10.1371/journal.pone.0118432
36. Usvyat LA, Kotanko P, van der Sande FM, Kooman JP, Carter M, Leunissen KM, Levin NW: Circadian variations in body temperature during dialysis. Nephrol Dial Transplant 27: 1139–1144, 2012 https://doi.org/10.1093/ndt/gfr395
37. Gedney N: Long-term hemodialysis during the COVID-19 pandemic. Clin J Am Soc Nephrol 15: 1073–1074, 2020 https://doi.org/10.2215/CJN.09100620
38. Gagliardi I, Patella G, Michael A, Serra R, Provenzano M, Andreucci M: COVID-19 and the kidney: From epidemiology to clinical practice. J Clin Med 9: 2506, 2020 https://doi.org/10.3390/jcm9082506
39. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, Xiang J, Wang Y, Song B, Gu X, Guan L, Wei Y, Li H, Wu X, Xu J, Tu S, Zhang Y, Chen H, Cao B: Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet 395: 1054–1062, 2020 https://doi.org/10.1016/S0140-6736(20)30566-3
40. Fontana F, Giaroni F, Frisina M, Alfano G, Mori G, Lucchi L, Magistroni R, Cappelli G: SARS-CoV-2 infection in dialysis patients in northern Italy: A single-centre experience. Clin Kidney J 13: 334–339, 2020 https://doi.org/10.1093/ckj/sfaa148
41. Shaikh A, Zeldis E, Campbell KN, Chan L: Prolonged SARS-CoV-2 viral RNA shedding and IgG antibody response to SARS-CoV-2 in patients on hemodialysis. Clin J Am Soc Nephrol, 2020 https://doi.org/10.2215/CJN.11120720
42. Vaishya R, Javaid M, Khan IH, Haleem A: Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr 14: 337–339, 2020 https://doi.org/10.1016/j.dsx.2020.04.012
43. Feng C, Wang L, Chen X, Zhai Y, Zhu F, Chen H, Wang Y, Su X, Huang S, Tian L, Zhu W, Sun W, Zhang L, Han Q, Zhang J, Pan F, Chen L, Zhu Z, Xiao H, Liu Y, Liu G, Chen W, Li T: A novel artificial intelligence-assisted triage tool to aid in the diagnosis of suspected COVID-19 pneumonia cases in fever clinics. Annals of Translational Medicine, 2021. Available at: https://atm.amegroups.com/article/view/6078. Accessed January 29, 2021
44. Leung K, Wu JT, Liu D, Leung GM: First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: A modelling impact assessment. Lancet 395: 1382–1393, 2020 https://doi.org/10.1016/S0140-6736(20)30746-7
45. Xu S, Li Y: Beware of the second wave of COVID-19. Lancet 395: 1321–1322, 2020 https://doi.org/10.1016/S0140-6736(20)30845-X