There is currently no widely accepted approach to screening for pancreatic cancer (PC). We aimed to develop and validate a risk prediction model for pancreatic ductal adenocarcinoma (PDAC), the most common form of PC, across 2 health systems using electronic health records.
This retrospective cohort study included patients aged 50–84 years with at least 1 clinic-based visit over a 10-year study period at Kaiser Permanente Southern California (model training, internal validation) and Veterans Affairs (VA; external testing). Random survival forest models were built to identify the most relevant predictors from >500 variables and to predict the risk of PDAC within 18 months of cohort entry.
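As an illustration of the 18-month prediction window, the following minimal sketch shows one way a time-to-event outcome could be derived for each patient. It is not the study's code: the function name `label_outcome`, the 548-day horizon, and the censoring rule are assumptions made for this example, and patients diagnosed before cohort entry are assumed to have been excluded already.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: 18 months approximated as 548 days; the study's exact
# definition of the window is not given in the abstract.
HORIZON_DAYS = 548


def label_outcome(
    entry: date, last_followup: date, pdac_dx: Optional[date]
) -> Tuple[int, bool]:
    """Return (follow-up days, event flag) for one patient (hypothetical helper).

    The event flag is True only when PDAC is diagnosed on or after cohort
    entry and within the horizon. Otherwise the patient is censored at the
    end of follow-up or at the horizon, whichever comes first.
    """
    if pdac_dx is not None:
        days_to_dx = (pdac_dx - entry).days
        if 0 <= days_to_dx <= HORIZON_DAYS:
            return days_to_dx, True
    followup_days = min((last_followup - entry).days, HORIZON_DAYS)
    return followup_days, False
```

Pairs of this kind (time, event flag) are the usual input to survival models such as random survival forests; a diagnosis made after the horizon is treated here as censoring at the horizon.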
The Kaiser Permanente Southern California cohort consisted of 1.8 million patients (mean age 61.6 years), including 1,792 PDAC cases. The 18-month incidence rate of PDAC was 0.77 (95% confidence interval 0.73–0.80) per 1,000 person-years. The final main model contained age, abdominal pain, weight change, HbA1c, and change in alanine transaminase (c-index: mean 0.77, SD 0.02; calibration test: P value 0.4, SD 0.3). The final early detection model comprised the same features as the main model except abdominal pain (c-index: 0.77, SD 0.4; calibration test: P value 0.3, SD 0.3). The VA testing cohort consisted of 2.7 million patients (mean age 66.1 years) with an 18-month incidence rate of 1.27 (1.23–1.30) per 1,000 person-years. After recalibration on the VA testing data sets, the main and early detection models achieved mean c-indices of 0.71 (SD 0.002) and 0.68 (SD 0.003), respectively.
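For readers unfamiliar with the c-index, the sketch below computes Harrell's concordance index on right-censored data: the share of comparable patient pairs in which the patient with the earlier event received the higher predicted risk. It is a simplified illustration, not the study's evaluation code. The function name `harrell_c_index` and the toy inputs are hypothetical, and the study may have used a different estimator.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored data (illustrative).

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under follow-up at a strictly later time. The pair
    is concordant when i has the higher predicted risk; ties in predicted
    risk count as half. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): follow-up days, event flags, risks.
toy_times = [5, 10, 15, 20]
toy_events = [True, True, False, True]
toy_risks = [0.9, 0.7, 0.4, 0.2]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of 0.68 to 0.77 indicate moderate discrimination.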
Using widely available parameters in electronic health records, we developed and externally validated parsimonious machine learning-based models for detection of PC. These models may be suitable for real-time clinical application.