OBJECTIVE: To identify candidate genes and genetic variants for preeclampsia by using a bioinformatic approach to extract and organize them from the published literature.
METHODS: Semantic data mining and natural language processing were used to identify published articles meeting criteria for a potential genetic association with preeclampsia. Trained curators then reviewed the articles manually. Cluster analysis was used to aggregate the extracted genes into gene sets associated with preeclampsia or severe preeclampsia, early or late preeclampsia, maternal or fetal tissue sources, and concurrent conditions (ie, fetal growth restriction, gestational hypertension, or hemolysis, elevated liver enzymes, and low platelet count [HELLP] syndrome). Gene Ontology was used to organize this large group of genes into ontology groups.
RESULTS: Of more than 22 million records in PubMed, including 28,000 articles on preeclampsia, our data-mining tool identified 2,300 articles with potential genetic associations with preeclampsia-related phenotypes. After curation, 729 articles were “accepted” as reporting “statistically significant” associations with a total of 535 genes. We observed distinct segregation of these genes by severity and timing of preeclampsia, by maternal or fetal source, and by associated condition (eg, gestational hypertension, fetal growth restriction, or HELLP syndrome).
CONCLUSION: The gene sets and ontology groups identified through our systematic literature curation indicate that preeclampsia comprises several distinct phenotypes with both unique and overlapping maternal and fetal genetic contributions.
LEVEL OF EVIDENCE: III
Preeclampsia appears to comprise several distinct phenotypes with both unique and overlapping maternal and fetal genetic contributions.
Department of Epidemiology, Brown University School of Public Health, Women and Infants Hospital of Rhode Island, Department of Pediatrics, Brown Alpert Medical School, and the Center for Computational Molecular Biology, Providence, Rhode Island; and the Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut.
Corresponding author: Elizabeth W. Triche, PhD, Assistant Professor of Epidemiology, Brown University School of Public Health, 121 S Main Street, 2nd Floor, Box G-S121-2, Providence, RI 02912; e-mail: Elizabeth_Triche@brown.edu.
Supported by grants from the National Institutes of Health: 1R21HD070177, 5T35HL094308, P20 RR018728, and P20GM103537.
Presented in part at the 2012 Annual Meeting of the American Society of Human Genetics, November 6–10, 2012, San Francisco, California, and the 2013 Annual Meeting of the North American Society of Obstetric Medicine, September 20–21, 2013, Providence, Rhode Island.
Financial Disclosure: The authors did not report any potential conflicts of interest.