Recurrent cancer is common, costly, and lethal, yet we know little about it in community-based populations. Electronic health records and tumor registries contain vast amounts of data regarding community-based patients, but usually lack recurrence status. Existing algorithms that use structured data to detect recurrence have limitations.
We developed algorithms to detect the presence and timing of recurrence after definitive therapy for stages I–III lung and colorectal cancer. We used 2 data sources, each containing a widely available type of structured data (claims or electronic health record encounters) linked to gold-standard recurrence status: Medicare claims linked to the Cancer Care Outcomes Research and Surveillance (CanCORS) study, and the Cancer Research Network Virtual Data Warehouse linked to registry data. Twelve potential indicators of recurrence were used to develop separate models for each cancer in each data source. Detection models maximized the area under the receiver operating characteristic (ROC) curve (AUC); timing models minimized average absolute error. Algorithms were compared by cancer type and data source, and contrasted with an existing binary detection rule.
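To make the two performance criteria above concrete, the following minimal Python sketch computes an AUC for a detection score and an average absolute error for predicted recurrence timing. It is an illustration only: the function names, the pairwise (Mann-Whitney) formulation of AUC, and the toy values are assumptions introduced here, not the study's code or data, and the model-fitting step itself is not shown.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen patient with recurrence
    (label 1) scores higher than a randomly chosen patient without
    recurrence (label 0); tied scores count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_absolute_error(true_times: Sequence[float],
                        predicted_times: Sequence[float]) -> float:
    """Average absolute difference between true and predicted times."""
    if len(true_times) != len(predicted_times) or not true_times:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_times, predicted_times))
    return total / len(true_times)


# Hypothetical toy data: 1 = gold-standard recurrence, 0 = no recurrence.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
print(round(auc(labels, scores), 3))  # 0.889

# Hypothetical months from definitive therapy to recurrence.
true_months = [12, 20, 8]
predicted_months = [10, 23, 8]
print(round(mean_absolute_error(true_months, predicted_months), 3))  # 1.667
```

In this sketch a detection model would be preferred when its scores yield a higher `auc`, and a timing model when its predictions yield a lower `mean_absolute_error`.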
Detection model AUCs (>0.92) exceeded those of existing prediction rules. Timing models yielded absolute prediction errors that were small relative to follow-up time (<15%). Similar covariates were included in all detection and timing algorithms, though differences by cancer type and dataset challenged efforts to create 1 common algorithm for all scenarios.
Valid and reliable detection of recurrence using big data is feasible. These tools will enable extensive, novel research on quality, effectiveness, and outcomes for lung and colorectal cancer patients and those who develop recurrence.
Supplemental Digital Content is available in the text.
*Dana-Farber Cancer Institute
†Harvard Medical School, Boston, MA
‡Institute for Health Research, Kaiser Permanente Colorado, Denver, CO
§The Center for Health Research, Kaiser Permanente Northwest, Portland, OR
Supported by a grant from the National Cancer Institute (R01 CA172143 to M.J.H./D.R.) and an NCI Cooperative Agreement (U19 CA79689 to the Cancer Research Network). The American Society of Clinical Oncology (Career Development Award) and Susan G. Komen for the Cure (Career Catalyst Award) provided salary support to M.J.H. The work of the CanCORS consortium was supported by grants from the National Cancer Institute (NCI) to the Statistical Coordinating Center (U01 CA093344) and the NCI-supported Primary Data Collection and Research Centers (Dana-Farber Cancer Institute/Cancer Research Network U01 CA093332; Harvard Medical School/Northern California Cancer Center U01 CA093324; University of Iowa U01 CA01013; University of North Carolina U01 CA093326), and by a Department of Veterans Affairs grant to the Durham VA Medical Center (VA HSRD CRS-02-164).
The authors declare no conflict of interest.
Reprints: Michael J. Hassett, MD, MPH, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02215. E-mail: email@example.com.