See “Comorbidities in Childhood Celiac Disease: A Phenome Wide Association Study Using the Electronic Health Record” by Prinzbach et al on page 488.
Precision medicine is an emerging integrative approach to disease prevention, early detection, and treatment that takes into account individual variability in genetics, medical history, environmental exposures, and lifestyle. The electronic health record (EHR) captures extensive information on patients over time from all aspects of medical care. The structured data captured include clinical diagnoses coded with the International Classification of Diseases (ICD-9/10), measurements collected from patients (vital signs and laboratory tests on urine, blood, and swabs), imaging sessions, encounters with medical staff, and medication orders and prescriptions. Other captured data, such as doctors’ notes, are semi-structured or unstructured. The EHR provides a new source of data for biomedical research, allowing researchers to ask new questions about care standards, disease diagnostics, and disease therapeutics. An example of a suggested change in care standards is a study by Rappoport et al (1), in which the authors show, using a data-driven approach on healthy individuals, that the intervals used to determine abnormal laboratory test results differ between males and females for some laboratory tests. An example of leveraging the EHR to improve predictive disease diagnostics is a recent study by Miotto et al (2), in which the authors apply a deep learning framework to pre-process raw EHR data and improve performance when predicting future disease. Finally, an example of improved therapeutics was recently published by Boland et al (3), in which the authors use a machine learning approach to categorize and identify harmful drugs within medication group C (classified as “risk not ruled out” for pregnant patients), shedding light on which drugs from this group should not be used during pregnancy.
Access to EHR data also allows researchers to identify and define precise subsets of patients for further study and to apply precision medicine strategies to disease diagnostics and therapeutics.
Celiac disease (CD) is an immune-mediated condition triggered in genetically susceptible individuals by the ingestion of gluten-containing grains and associated with human leukocyte antigen (HLA) DQ2 and DQ8 haplotypes (4). In the pediatric population, CD ranges from classic CD, diagnosed at the early age of 6 to 24 months, to non-classic CD that is usually diagnosed [...] better diagnosis and treatment. There have been several past attempts to study CD using the EHR (5). CD is especially challenging in the EHR context because the ICD-9 codes related to the disease have been shown to identify patients with CD poorly (6). A recent paper described the co-occurrence of CD and autoimmune diseases using the EHR (7). In Prinzbach et al, published in the current issue of the Journal of Pediatric Gastroenterology & Nutrition (8), the authors leverage the EHR to analyze phenotype associations and to test known comorbidities of celiac disease in children. They also use this approach to discover new associations of ICD-10 codes with CD. They identify 45 ICD-10 codes that are significantly associated with CD; of these, 13 are known comorbidities, including autoimmune conditions such as Crohn disease and type 1 diabetes, and 9 are expected symptoms of CD, such as abdominal and pelvic pain and nausea and vomiting. Novel ICD-10 associations include personal traits, other conditions such as eosinophilic esophagitis, abnormal immunological findings, and abnormal serum enzyme levels. Although deeper investigation is required to verify and validate these findings, this paper presents the first step toward leveraging electronic health records to study CD.
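A phenome-wide scan of the kind described above can be illustrated with a short sketch. This is not the analysis pipeline used by Prinzbach et al; it is a minimal example under stated assumptions: the patient data are synthetic, the code labels `CODE_A` and `CODE_B` and the function names are hypothetical, the association test is a two-sided Fisher exact test on a 2×2 table, and multiple testing is handled with a simple Bonferroni correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls; columns are code present/absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the cases carry the code.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def phewas_scan(patient_codes, cases, alpha=0.05):
    """Test every diagnosis code for association with case status.

    patient_codes: dict mapping patient id -> set of diagnosis codes.
    cases: set of patient ids that have the index disease.
    Returns (code, n_cases_with_code, n_controls_with_code, p, significant)
    tuples sorted by p-value.
    """
    all_codes = sorted(set().union(*patient_codes.values()))
    threshold = alpha / len(all_codes)  # Bonferroni correction
    n_cases = sum(1 for pid in patient_codes if pid in cases)
    n_controls = len(patient_codes) - n_cases
    results = []
    for code in all_codes:
        a = sum(1 for pid, codes in patient_codes.items()
                if pid in cases and code in codes)
        c = sum(1 for pid, codes in patient_codes.items()
                if pid not in cases and code in codes)
        p = fisher_exact_two_sided(a, n_cases - a, c, n_controls - c)
        results.append((code, a, c, p, p < threshold))
    return sorted(results, key=lambda r: r[3])


# Synthetic cohort (an assumption for illustration): 20 cases, 20 controls.
patient_codes, cases = {}, set()
for i in range(40):
    pid = f"patient_{i}"
    is_case = i < 20
    if is_case:
        cases.add(pid)
    codes = set()
    # CODE_A is enriched among cases: 15 of 20 cases versus 2 of 20 controls.
    if (is_case and i < 15) or (not is_case and i < 22):
        codes.add("CODE_A")
    # CODE_B is equally common in both groups: 5 of 20 in each.
    if i % 4 == 0:
        codes.add("CODE_B")
    patient_codes[pid] = codes

results = phewas_scan(patient_codes, cases)
for code, a, c, p, significant in results:
    print(code, a, c, f"{p:.3g}", significant)
```

In a real study, the scan would cover thousands of codes, and choices such as the control definition, covariate adjustment, and the multiple-testing procedure would all need careful justification.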
Given the complexity and sparsity of EHR data, there is a great need for new computational methods and approaches. Taking the longitudinal aspect of the data into account and developing patient trajectories is one obvious next step. Applying innovative machine learning and clustering approaches to classify and stratify patients will allow us to study disease heterogeneity in the context of treatment response and outcomes. Furthermore, leveraging EHR data repositories and integrating clinical phenotyping data with molecular measurements provides an unprecedented opportunity to advance precision medicine in disease diagnostics and therapeutics, and will be key in informing a range of clinical and translational precision medicine research. The Phenome Wide Association Study (PheWAS) (9) is a method that combines genomic and EHR data. In the past, PheWAS has led to the validation of known genotype-phenotype associations and to better definitions of disease subtypes. A PheWAS based on genomic and EHR data has shown an association between HLA haplotype DQB1∗02:01-DQA1∗05:01 and CD (10). The integration of genomic data with EHR data has been applied extensively in projects such as eMERGE (11) and the Geisinger MyCode project (12), which was able to combine whole exome sequencing with EHR data to confirm existing associations with metabolic traits and identify new ones (9,10). There is no doubt that in the near future, efforts to gather EHR data and combine them with molecular measurements and advanced computational approaches will improve the quality and accuracy of patient care and lead to new discoveries in precision medicine.
1. Rappoport N, Paik H, Oskotsky B, et al. Creating ethnicity-specific reference intervals for lab tests from EHR data. bioRxiv 2017; 213892. doi:10.1101/213892.
2. Miotto R, Li L, Kidd BA, et al. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep
3. Boland MR, Polubriaginof F, Tatonetti NP. Development of a machine learning algorithm to classify drugs of unknown fetal effect. Sci Rep
4. Fasano A, Berti I, Gerarduzzi T, et al. Prevalence of celiac disease in at-risk and not-at-risk groups in the United States: a large multicenter study. Arch Intern Med
5. Fasano A. Clinical presentation of celiac disease in the pediatric population. Gastroenterology
6. Tanpowpong P, Broder-Fingert S, Obuch JC, et al. Multicenter study on the value of ICD-9-CM codes for case identification of celiac disease. Ann Epidemiol
7. Escudié J-B, Rance B, Malamut G, et al. A novel data-driven workflow combining literature and electronic health records to estimate comorbidities burden for a specific disease: a case study on autoimmune comorbidities in patients with celiac disease. BMC Med Inform Decis Mak
8. Prinzbach AA, Moosavinasab S, Rust S, et al. Comorbidities in childhood celiac disease: a phenome wide association study using the electronic health record. J Pediatr Gastroenterol Nutr
9. Denny JC, Bastarache L, Roden DM. Phenome-wide association studies as a tool to advance precision medicine. Annu Rev Genomics Hum Genet
10. Karnes JH, Bastarache L, Shaffer CM, et al. Phenome-wide scanning identifies multiple diseases and disease severity phenotypes associated with HLA variants. Sci Transl Med
11. Gottesman O, Kuivaniemi H, Tromp G, et al. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med
12. Carey DJ, Fetterolf SN, Davis FD, et al. The Geisinger MyCode community health initiative: an electronic health record–linked biobank for precision medicine research. Genet Med