Methods

Reducing Bias Due to Outcome Misclassification for Epidemiologic Studies Using EHR-derived Probabilistic Phenotypes

Hubbard, Rebecca A.; Tong, Jiayi; Duan, Rui; Chen, Yong

Author Information

From the Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania.

Submitted July 15, 2019; accepted March 19, 2020.

The research reported in this work was funded through the Patient-Centered Outcomes Research Institute (PCORI) Awards ME-1511-32666 and CDRN-306-01556. All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the PCORI Board of Governors or the PCORI Methodology Committee.

The authors report no conflicts of interest.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).

Data included in this report can be requested through the PEDSnet Data Coordinating Center. Statistical program code is provided as Supplementary Digital Content and is available for download from the investigators’ github repository, https://github.com/rhubb/bias_correction.

Correspondence: Rebecca A. Hubbard, 604 Blockley Hall, 423 Guardian Dr, Philadelphia, PA 19104. E-mail: [email protected].

Epidemiology: July 2020 - Volume 31 - Issue 4 - p 542-550
doi: 10.1097/EDE.0000000000001193

Abstract

Epidemiologic studies using electronic health record (EHR)-derived phenotypes as outcomes are subject to bias due to phenotyping error. In the case of dichotomous phenotypes, existing methods for misclassified outcomes can be used to reduce bias. In this article, we present a bias correction approach for EHR-derived probabilistic phenotypes: continuous predicted probabilities of the outcome of interest. This approach makes use of correction factors that can be computed by hand and do not require specialized software.
We used simulation studies to investigate the performance of the proposed approach under a variety of scenarios for accuracy of the probabilistic phenotype, strength of the outcome/exposure association, and prevalence of the outcome of interest. Across all scenarios investigated, the proposed approach substantially reduced bias in association parameter estimates relative to a naive approach. We demonstrate the application of this approach to a study of pediatric type 2 diabetes using data from the PEDSnet network of children’s hospitals. This straightforward correction factor can substantially reduce bias and improve the validity of EHR-based epidemiology.

Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.
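The abstract does not state the correction factor itself, so the following is only an illustrative sketch of the kind of bias it describes, not the article's published method or its supplementary code. It assumes a binary exposure, a probabilistic phenotype drawn from a Beta distribution whose mean depends only on the true outcome (nondifferential error), and known phenotype means `mu1` and `mu0` among true cases and non-cases. All function names, parameter values, and the moment-based correction used here are assumptions chosen for illustration. The correction relies on the identity E[P | X] = mu0 + (mu1 - mu0) Pr(Y = 1 | X), which holds under those assumptions.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def simulate_group_means(n=200_000, b0=-2.0, b1=1.0, seed=2020):
    """Simulate exposure X, true outcome Y, and probabilistic phenotype P.

    Returns the mean of P among unexposed (X=0) and exposed (X=1) subjects.
    Assumed phenotype model: P ~ Beta(8, 2) for true cases (mean 0.8) and
    P ~ Beta(2, 8) for true non-cases (mean 0.2), independent of X given Y.
    """
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < expit(b0 + b1 * x) else 0
        p = rng.betavariate(8, 2) if y else rng.betavariate(2, 8)
        sums[x] += p
        counts[x] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def log_odds_ratios(mean0, mean1, mu0=0.2, mu1=0.8):
    """Return (naive, corrected) log odds ratios for the exposure.

    Naive: treat the mean phenotype in each group as the outcome prevalence.
    Corrected: invert E[P | X] = mu0 + (mu1 - mu0) * Pr(Y=1 | X) first.
    mu0 and mu1 are assumed known here; in practice they would have to be
    estimated, for example from a validation sample.
    """
    naive = logit(mean1) - logit(mean0)
    pi0 = (mean0 - mu0) / (mu1 - mu0)
    pi1 = (mean1 - mu0) / (mu1 - mu0)
    corrected = logit(pi1) - logit(pi0)
    return naive, corrected


TRUE_LOG_OR = 1.0
mean0, mean1 = simulate_group_means(b1=TRUE_LOG_OR)
naive, corrected = log_odds_ratios(mean0, mean1)
print(f"true log OR:      {TRUE_LOG_OR:.3f}")
print(f"naive log OR:     {naive:.3f}")
print(f"corrected log OR: {corrected:.3f}")
```

Under these assumed settings the naive estimate is pulled toward the null because the phenotype only partly separates cases from non-cases, while the moment-based estimate lands close to the true value. With a less accurate phenotype (`mu1` and `mu0` closer together) the naive attenuation grows and the corrected estimate becomes noisier.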