Can Machine Learning Methods Produce Accurate and Easy-to-use Prediction Models of 30-day Complications and Mortality After Knee or Hip Arthroplasty?

Harris, Alex H. S., PhD; Kuo, Alfred C., MD, PhD; Weng, Yingjie, MS; Trickey, Amber W., PhD; Bowe, Thomas, PhD; Giori, Nicholas J., MD, PhD

Clinical Orthopaedics and Related Research®: February 2019 - Volume 477 - Issue 2 - p 452–460
doi: 10.1097/CORR.0000000000000601

Background Existing universal and procedure-specific surgical risk prediction models of death and major complications after elective total joint arthroplasty (TJA) have limitations including poor transparency, poor to modest accuracy, and insufficient validation to establish performance across diverse settings. Thus, the need remains for accurate and validated prediction models for use in preoperative management, informed consent, shared decision-making, and risk adjustment for reimbursement.

Questions/purposes The purpose of this study was to use machine learning methods and large national databases to develop and validate (both internally and externally) parsimonious risk-prediction models for mortality and complications after TJA.

Methods Preoperative demographic and clinical variables from all 107,792 nonemergent primary THAs and TKAs in the 2013 to 2014 American College of Surgeons-National Surgical Quality Improvement Program (ACS-NSQIP) were evaluated as predictors of 30-day death and major complications. The NSQIP database was chosen for its high-quality data on important outcomes and rich characterization of preoperative demographic and clinical predictors for demographically and geographically diverse patients. Least absolute shrinkage and selection operator (LASSO) regression, a type of machine learning that optimizes accuracy and parsimony, was used for model development. Tenfold cross-validation was used to produce C-statistics, a measure of how well models discriminate patients who experience an outcome from those who do not. External validation, which evaluates the generalizability of the models to new data sources and patient groups, was accomplished using data from the Veterans Affairs Surgical Quality Improvement Program (VASQIP). Models previously developed from VASQIP data were also externally validated using NSQIP data to examine the generalizability of their performance with a different group of patients outside the VASQIP context.
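To make the Methods concrete, the following Python sketch shows one way that LASSO-penalized logistic regression and tenfold cross-validation can be combined to produce a C-statistic. It is an illustration under stated assumptions, not the authors' code: the cohort is synthetic, and the function names (`fit_lasso_logistic`, `cross_validated_c_statistic`, `make_synthetic_cohort`), the proximal gradient solver, the fixed penalty of 0.05, and the pooling of out-of-fold predictions into a single C-statistic are all choices made here for brevity.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small ones become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(rows, labels, penalty, step=0.5, iterations=200):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_intercept, grad_weights = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            z = intercept + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_intercept += error / n
            for j in range(p):
                grad_weights[j] += error * x[j] / n
        intercept -= step * grad_intercept  # the intercept is not penalized
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_weights)]
    return intercept, weights


def c_statistic(scores, labels):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for e in events:
        for ne in non_events:
            concordant += 1.0 if e > ne else 0.5 if e == ne else 0.0
    return concordant / (len(events) * len(non_events))


def cross_validated_c_statistic(rows, labels, penalty, folds=10, seed=0):
    """Score each patient with a model that never saw them, then pool."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    scores = [0.0] * len(rows)
    for k in range(folds):
        held_out = set(order[k::folds])
        train = [i for i in order if i not in held_out]
        intercept, weights = fit_lasso_logistic(
            [rows[i] for i in train], [labels[i] for i in train], penalty)
        for i in held_out:
            # The linear predictor is enough: the C-statistic uses ranks only.
            scores[i] = intercept + sum(
                w * xi for w, xi in zip(weights, rows[i]))
    return c_statistic(scores, labels)


def make_synthetic_cohort(n=200, seed=1):
    """Hypothetical data: 5 standardized predictors, only 2 affect outcome."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(5)]
        risk = 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.5 * x[1] - 1.0)))
        rows.append(x)
        labels.append(1 if rng.random() < risk else 0)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_synthetic_cohort()
    _, weights = fit_lasso_logistic(rows, labels, penalty=0.05)
    print("coefficients:", [round(w, 2) for w in weights])
    print("tenfold C-statistic:",
          round(cross_validated_c_statistic(rows, labels, penalty=0.05), 2))
```

The soft-thresholding step is what sets weak coefficients to exactly zero and so favors a parsimonious model. In practice the penalty would itself be tuned, typically by cross-validation, and an established statistical package would replace the hand-written solver.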

Results The models, developed using LASSO regression with diverse clinical (for example, American Society of Anesthesiologists classification, comorbidities) and demographic (for example, age, gender) inputs, had good accuracy in terms of discriminating the likelihood a patient would experience, within 30 days of arthroplasty, a renal complication (C-statistic, 0.78; 95% confidence interval [CI], 0.76-0.80), death (0.73; 95% CI, 0.70-0.76), or a cardiac complication (0.73; 95% CI, 0.71-0.75) from one who would not. By contrast, the models demonstrated poor accuracy for venous thromboembolism (C-statistic, 0.61; 95% CI, 0.60-0.62) and any complication (C-statistic, 0.64; 95% CI, 0.63-0.65). External validation of the NSQIP-derived models using VASQIP data found them to be robust in terms of predictions about mortality and cardiac complications, but not for predicting renal complications. Models previously developed with VASQIP data had poor accuracy when externally validated with NSQIP data, suggesting they should not be used outside the context of the Veterans Health Administration.
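The abstract does not state how the confidence intervals around the C-statistics were obtained. As one common possibility, and only as an assumption, a percentile bootstrap can be sketched as follows. The simulated scores and the names `bootstrap_ci` and `simulate_scores` are hypothetical and do not come from the study.

```python
import random


def c_statistic(scores, labels):
    """Probability that a random event outranks a random non-event."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((e > ne) + 0.5 * (e == ne) for e in events for ne in non_events)
    return wins / (len(events) * len(non_events))


def bootstrap_ci(scores, labels, resamples=500, level=0.95, seed=0):
    """Percentile bootstrap interval for the C-statistic."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one event and one non-event")
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < resamples:
        picks = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_labels = [labels[i] for i in picks]
        if 0 < sum(sample_labels) < n:  # both outcomes needed to form pairs
            estimates.append(
                c_statistic([scores[i] for i in picks], sample_labels))
    estimates.sort()
    cut = int((1.0 - level) / 2.0 * resamples)
    return estimates[cut], estimates[resamples - 1 - cut]


def simulate_scores(n=150, seed=2):
    """Hypothetical risk scores that run higher, on average, for events."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return scores, labels


if __name__ == "__main__":
    scores, labels = simulate_scores()
    lower, upper = bootstrap_ci(scores, labels)
    print("C-statistic:", round(c_statistic(scores, labels), 2))
    print("95% CI:", round(lower, 2), "to", round(upper, 2))
```

A C-statistic of 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect ranking, so an interval such as 0.60-0.62 indicates weak discrimination that is nonetheless estimated precisely.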

Conclusions Moderately accurate predictive models of 30-day mortality and cardiac complications after elective primary TJA were developed as well as internally and externally validated. To our knowledge, these are the most accurate and rigorously validated TJA-specific prediction models currently available. Methods to improve these models, including the addition of nonstandard inputs such as natural language processing of preoperative clinical progress notes or radiographs, should be pursued, as should the development and validation of models to predict longer term improvements in pain and function.

Level of Evidence Level III, diagnostic study.

A. H. S. Harris, T. Bowe, N. J. Giori Center for Innovation to Implementation, VA Palo Alto Healthcare System, Palo Alto, CA, USA

A. C. Kuo San Francisco Veterans Affairs Medical Center, University of California, San Francisco, CA, USA

A. H. S. Harris, Y. Weng, A. W. Trickey Stanford–Surgical Policy Improvement Research and Education Center, Stanford, CA, USA

N. J. Giori Department of Orthopedic Surgery, Stanford University School of Medicine, Stanford, CA, USA

A. H. S. Harris, VA Palo Alto Healthcare System, Center for Innovation to Implementation, 795 Willow Road (152-MPD), Menlo Park, CA 94025, USA email:

This work was funded by grants from the VA HSR&D Service (I01-HAX002314-01A1; RCS14-232; CDA 13-279; AHSH).

All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Each author certifies that his or her institution determined that this investigation is not human subjects research and that all investigations were conducted in conformity with ethical principles of research.

This work was performed at the VA Palo Alto Healthcare System, Center for Innovation to Implementation, Palo Alto, CA, USA, and the Stanford–Surgical Policy Improvement Research and Education Center, Stanford, CA, USA.

The views expressed do not reflect those of the US Department of Veterans Affairs or other institutions.

Received June 21, 2018

Accepted November 19, 2018

© 2019 Lippincott Williams & Wilkins