An Electronic Health Record–based Algorithm to Ascertain the Date of Second Breast Cancer Events

Chubak, Jessica PhD*,†; Onega, Tracy PhD; Zhu, Weiwei MS*; Buist, Diana S. M. PhD*,†,§; Hubbard, Rebecca A. PhD

doi: 10.1097/MLR.0000000000000352
Online Articles: Applied Methods

Objectives: Studies of cancer recurrences and second primary tumors require information on outcome dates. Little is known about how well electronic health record–based algorithms can identify dates or how errors in dates can bias analyses.

Research Design: We assessed rule-based and model-fitting approaches to assign event dates using a previously published electronic health record–based algorithm for second breast cancer events (SBCE). We conducted a simulation study to assess bias due to date assignment errors in time-to-event analyses.

Subjects: From a cohort of 3152 early-stage breast cancer patients, 358 women accurately identified as having had an SBCE served as the basis for this analysis.

Measures: The primary measure of accuracy was the percent of predicted SBCE dates falling within ±60 days of the true date. In the simulation study, bias in hazard ratios (HRs) was estimated by averaging the difference between HRs based on algorithm-assigned dates and the true HR across 1000 simulations, each with a simulated N=4000.
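The two measures above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sign convention (predicted minus true, estimated minus true), and the toy dates and HRs are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean, median


def date_accuracy(true_dates, predicted_dates, window_days=60):
    """Summarize agreement between predicted and true event dates.

    Differences are predicted minus true, in days (an assumed convention).
    """
    diffs = [(p - t).days for t, p in zip(true_dates, predicted_dates)]
    within = sum(abs(d) <= window_days for d in diffs)
    return {
        "median_diff_days": median(diffs),
        "pct_within_window": 100.0 * within / len(diffs),
    }


def mean_hr_bias(estimated_hrs, true_hr):
    """Average of (estimated HR - true HR) over simulation replicates."""
    return mean(hr - true_hr for hr in estimated_hrs)


# Hypothetical toy data, not taken from the study.
true_dates = [date(2005, 3, 1), date(2006, 7, 15), date(2008, 1, 10), date(2009, 11, 30)]
predicted_dates = [date(2005, 3, 1), date(2006, 8, 1), date(2008, 6, 1), date(2009, 11, 20)]

summary = date_accuracy(true_dates, predicted_dates)
bias = mean_hr_bias([2.1, 1.9, 2.3], true_hr=2.0)
```

In a full simulation, `estimated_hrs` would hold one HR per replicate (1000 in the study), each obtained by fitting a time-to-event model to data with algorithm-assigned dates.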

Results: The most accurate date algorithm had a median difference between the true and predicted dates of 0 days with 82% of predicted dates falling within 60 days of the true date. Bias resulted when algorithm sensitivity and specificity varied by exposure status, but was minimal when date assignment errors were of the magnitude observed for our date assignment method.

Conclusions: SBCE dates can be assigned with reasonable accuracy using a previously published algorithm. While acceptable in many scenarios, algorithm-assigned dates are not appropriate when operating characteristics are likely to vary by the study exposure.

*Group Health Research Institute

†Department of Epidemiology, University of Washington, Seattle, WA

Department of Community and Family Medicine, The Dartmouth Institute for Health Policy and Clinical Practice, and Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH

§Department of Health Services, University of Washington, Seattle, WA

Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA

Supported by the National Cancer Institute at the National Institutes of Health (R21CA143242 to J.C., R01CA149365 to T.O., R01CA09377 to Rebecca Silliman, U01CA063731 to D.S.M.B., R01CA120562 to Denise Boudreau, and U19CA79689 to Ed Wagner) and the American Cancer Society (CRTG-03-024-01-CCE to D.S.M.B.). The collection of cancer incidence data used in this study was supported, in part, by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which is funded by the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (contract numbers N01-CN-67009 and N01-PC-35142) with additional support from the Fred Hutchinson Cancer Research Center and the State of Washington.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the American Cancer Society.

The authors declare no conflict of interest.

Reprints: Jessica Chubak, PhD, Group Health Research Institute, 1730 Minor Avenue, Ste. 1600, Seattle, WA 98101. E-mail:

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.