Background: Administrative health care claims data are used for epidemiologic, health services, and outcomes cancer research and thus play a significant role in policy. Cancer stage, which is often a major driver of cost and clinical outcomes, is not typically included in claims data.
Objectives: Evaluate algorithms used in a dataset of cancer patients to identify patients with metastatic breast (BC), lung (LC), or colorectal (CRC) cancer using claims data.
Methods: Clinical data on BC, LC, or CRC patients (between January 1, 2007 and March 31, 2010) were linked to a health care claims database. Inclusion required health plan enrollment ≥3 months before initial cancer diagnosis date. Algorithms were used in the claims database to identify patients’ disease status, which was compared with physician-reported metastases. Generic and tumor-specific algorithms were evaluated using ICD-9 codes, varying diagnosis time frames, and including/excluding other tumors. Positive and negative predictive values, sensitivity, and specificity were assessed.
Results: The linked databases included 14,480 patients; of whom, 32%, 17%, and 14.2% had metastatic BC, LC, and CRC, respectively, at diagnosis and met inclusion criteria. Nontumor-specific algorithms had lower specificity than tumor-specific algorithms. Tumor-specific algorithms’ sensitivity and specificity were 53% and 99% for BC, 55% and 85% for LC, and 59% and 98% for CRC, respectively.
Conclusions: Algorithms to distinguish metastatic BC, LC, and CRC from locally advanced disease should use tumor-specific primary cancer codes with 2 claims for the specific primary cancer >30–42 days apart to reduce misclassification. These performed best overall in specificity, positive predictive values, and overall accuracy to identify metastatic cancer in a health care claims database.