
An Evaluation of Algorithms for Identifying Metastatic Breast, Lung, or Colorectal Cancer in Administrative Claims Data

Whyte, Joanna L., MS, MSPH*; Engel-Nitz, Nicole M., PhD†; Teitelbaum, April, MD†,‡; Gomez Rey, Gabriel, MS†; Kallich, Joel D., PhD§

doi: 10.1097/MLR.0b013e318289c3fb
Applied Methods

Background: Administrative health care claims data are used in epidemiologic, health services, and outcomes research on cancer and thus play a significant role in policy. Cancer stage, often a major driver of cost and clinical outcomes, is not typically included in claims data.

Objectives: To evaluate claims-based algorithms, applied to a dataset of cancer patients, for identifying patients with metastatic breast (BC), lung (LC), or colorectal (CRC) cancer.

Methods: Clinical data on BC, LC, or CRC patients (between January 1, 2007 and March 31, 2010) were linked to a health care claims database. Inclusion required health plan enrollment ≥3 months before the initial cancer diagnosis date. Algorithms were applied to the claims database to classify each patient's metastatic status, which was compared with physician-reported metastases. Generic and tumor-specific algorithms were evaluated using ICD-9 codes, varying diagnosis time frames, and with and without exclusion of other tumors. Positive and negative predictive values, sensitivity, and specificity were assessed.
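The four accuracy measures named in the Methods can be computed from a 2×2 cross-tabulation of each algorithm's flag against the physician-reported metastatic status, which serves as the reference standard. The sketch below is a minimal illustration, not the authors' code; the function name `accuracy_metrics`, the paired-boolean input format, and the example counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Accuracy:
    sensitivity: float  # TP / (TP + FN): share of truly metastatic patients flagged
    specificity: float  # TN / (TN + FP): share of non-metastatic patients not flagged
    ppv: float          # TP / (TP + FP): share of flagged patients truly metastatic
    npv: float          # TN / (TN + FN): share of unflagged patients truly non-metastatic
    accuracy: float     # (TP + TN) / N: share of all patients classified correctly


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def accuracy_metrics(pairs: Iterable[Tuple[bool, bool]]) -> Accuracy:
    """pairs: (algorithm_flag, physician_reported_metastatic), one per patient (hypothetical format)."""
    tp = fp = fn = tn = 0
    for flagged, metastatic in pairs:
        if flagged and metastatic:
            tp += 1
        elif flagged:
            fp += 1
        elif metastatic:
            fn += 1
        else:
            tn += 1
    return Accuracy(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, tp + fp + fn + tn),
    )


# Hypothetical counts for illustration only: TP=8, FN=2, FP=1, TN=89.
example = ([(True, True)] * 8 + [(False, True)] * 2
           + [(True, False)] * 1 + [(False, False)] * 89)
m = accuracy_metrics(example)
print(f"Se={m.sensitivity:.2f} Sp={m.specificity:.3f} PPV={m.ppv:.3f} NPV={m.npv:.3f}")
# prints: Se=0.80 Sp=0.989 PPV=0.889 NPV=0.978
```

The same function can be run once per candidate algorithm (generic or tumor-specific, each time frame) to compare them on a common reference standard.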

Results: The linked databases included 14,480 patients, of whom 32%, 17%, and 14.2% had metastatic BC, LC, and CRC, respectively, at diagnosis and met the inclusion criteria. Non-tumor-specific algorithms had lower specificity than tumor-specific algorithms. The sensitivity and specificity of the tumor-specific algorithms were 53% and 99% for BC, 55% and 85% for LC, and 59% and 98% for CRC, respectively.
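The Results note that non-tumor-specific algorithms had lower specificity. Why this matters for predictive values follows from Bayes' theorem. Writing $Se$ for sensitivity, $Sp$ for specificity, and $\pi$ for the prevalence of metastatic disease in the evaluated population (symbols introduced here for illustration only):

$$
\mathrm{PPV} = \frac{Se\,\pi}{Se\,\pi + (1 - Sp)(1 - \pi)}, \qquad
\mathrm{NPV} = \frac{Sp\,(1 - \pi)}{Sp\,(1 - \pi) + (1 - Se)\,\pi}
$$

Because the false-positive term $(1 - Sp)(1 - \pi)$ is weighted by the non-metastatic share $1 - \pi$, a small loss of specificity can lower PPV substantially when metastatic disease is the less common state.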

Conclusions: Algorithms to distinguish metastatic BC, LC, and CRC from locally advanced disease should use tumor-specific primary cancer codes and require 2 claims for the specific primary cancer >30–42 days apart to reduce misclassification. These algorithms performed best overall in specificity, positive predictive value, and overall accuracy for identifying metastatic cancer in a health care claims database.
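The recommended rule can be sketched as a per-patient check over dated claims. This is a minimal illustration under stated assumptions, not the published algorithm: the tumor-specific ICD-9 prefixes (174/175 breast, 162 lung, 153/154 colorectal), the metastasis code prefixes (197, 198), the requirement of a metastasis code alongside the confirmed primary, the default 30-day gap (the low end of the 30–42-day range), and the names `Claim` and `flag_metastatic` are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Hypothetical ICD-9-CM category prefixes; the study's exact code lists are not given here.
PRIMARY_PREFIXES = {
    "breast": ("174", "175"),
    "lung": ("162",),
    "colorectal": ("153", "154"),
}
METASTASIS_PREFIXES = ("197", "198")  # secondary malignant neoplasms (assumed set)


@dataclass(frozen=True)
class Claim:
    service_date: date
    icd9: str  # diagnosis code without the decimal point, e.g. "1749"


def flag_metastatic(claims: Iterable[Claim], tumor: str, min_gap_days: int = 30) -> bool:
    """Flag a patient as metastatic for `tumor` (hypothetical rule).

    Requires (a) at least two claims carrying a tumor-specific primary code whose
    service dates are more than `min_gap_days` apart, and (b) at least one claim
    carrying a secondary-neoplasm code.
    """
    claims = list(claims)
    primary_dates = sorted(
        c.service_date for c in claims if c.icd9.startswith(PRIMARY_PREFIXES[tumor])
    )
    # The earliest and latest primary-code claims have the widest spread,
    # so checking that pair tells us whether any two claims exceed the gap.
    confirmed_primary = (
        len(primary_dates) >= 2
        and (primary_dates[-1] - primary_dates[0]).days > min_gap_days
    )
    has_metastasis = any(c.icd9.startswith(METASTASIS_PREFIXES) for c in claims)
    return confirmed_primary and has_metastasis


# Hypothetical patient: two breast-cancer claims 41 days apart plus a 198.x claim.
claims = [
    Claim(date(2008, 1, 10), "1749"),
    Claim(date(2008, 2, 20), "1749"),
    Claim(date(2008, 2, 20), "1983"),
]
print(flag_metastatic(claims, "breast"))                   # True (41 > 30)
print(flag_metastatic(claims, "breast", min_gap_days=42))  # False (41 is not > 42)
```

Exposing the gap as a parameter makes it easy to test both ends of the 30–42-day range against the same reference standard.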

*CSL Behring, King of Prussia, PA

†OptumInsight, Health Economics and Outcomes Research, Eden Prairie, MN

‡Heme Onc Associates and AHT BioPharma Advisory Services, Carlsbad, CA

§Center for Observational Research, Amgen Inc., Thousand Oaks, CA

Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Website,

Editorial and financial support was provided by Amgen's Center for Observational Research to OptumInsight (formerly Innovus/Ingenix, a division of UnitedHealth Group).

J.L.W. was an employee of Amgen (Thousand Oaks, CA) and is currently employed by CSL Behring; G.G.R. and N.M.E.-N. are employees of OptumInsight; A.T. was an employee of OptumInsight and is currently with Heme Onc Associates and AHT BioPharma Advisory Services; and J.D.K. is an employee of Amgen (Thousand Oaks, CA).

Reprints: Joanna L. Whyte, MS, MSPH, CSL Behring, 1020 First Avenue, King of Prussia, PA 19406. E-mail:

Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved.