In 2007, the US Congress passed the Food and Drug Administration (FDA) Amendments Act,1 which called for the establishment of an “active postmarket risk identification and analysis system” with access to data from 100 million people by 2012. The American Recovery and Reinvestment Act2 of 2009 committed $1.1 billion to comparative effectiveness research (CER), to enable “the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat, and monitor health conditions in ‘real world’ settings.”2 Both developments rely on expanded secondary use of observational healthcare data, and have thrust the field of epidemiology into the national spotlight. In particular, these efforts require a significant evolution in the use of these data: from the customized design of an individual study of a particular product-outcome association to a broader effort that effectively uses these data for active monitoring of any medical product and any health outcome of interest across a network of disparate databases. The envisioned systems would go beyond the retrospective evaluation of hypothesized effects. Researchers would have the opportunity to proactively explore the data to generate and refine hypotheses of potential issues and benefits that warrant further scrutiny. Little research currently exists to inform the development of such systems.
Focusing just on active drug safety surveillance, open questions include the following:
Which statistical and epidemiologic methods work best for various types of drugs and outcomes?
What is the optimal way to combine information from disparate databases?
What thresholds should active surveillance systems use to identify risks?
What processes should be set in place to handle newly identified risks?
How often will active surveillance systems yield false-positives and false-negatives?
The last question is key, and no answer to it currently exists. In fact, the absence of established operating characteristics bedevils all of pharmacoepidemiology. Typical studies in observational databases attempt to limit bias through careful adjustment for confounders, restriction to certain patient subgroups, and similar design choices. Although these issues have received scrutiny through academic discourse and anecdotes in the literature, no systematic studies have been published to elucidate how they manifest themselves in real-world data. Consequently, no assurance can be provided that the relative risks, odds ratios, and other effect measures estimated by these studies are anywhere near the corresponding true values, or that standard 95% confidence intervals actually contain the truth 95% of the time.
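The notion of confidence-interval coverage invoked above can be made concrete with a small simulation. The sketch below is purely illustrative (it is not part of OMOP's methodology, and the effect sizes, standard errors, and bias values are invented): it shows that a nominal 95% interval covers the true log relative risk about 95% of the time when the estimator is unbiased, and substantially less often when residual confounding shifts the estimates.

```python
import random

def coverage(n_sims, true_log_rr, se, bias):
    """Fraction of nominal 95% CIs that contain the true log relative risk."""
    hits = 0
    for _ in range(n_sims):
        # Point estimate drawn around the truth, shifted by residual bias.
        est = random.gauss(true_log_rr + bias, se)
        lo, hi = est - 1.96 * se, est + 1.96 * se
        if lo <= true_log_rr <= hi:
            hits += 1
    return hits / n_sims

random.seed(0)
print(coverage(20000, true_log_rr=0.0, se=0.2, bias=0.0))  # close to 0.95
print(coverage(20000, true_log_rr=0.0, se=0.2, bias=0.3))  # well below 0.95
```

With a bias of 1.5 standard errors, nominal 95% coverage drops to roughly two-thirds, which is the sense in which unquantified bias undermines the usual interpretation of interval estimates.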
Answers to these questions are unlikely to arise from theoretical research, and instead must derive from extensive empirical experimentation. The Observational Medical Outcomes Partnership (OMOP), a public-private partnership chaired by the FDA and managed through the Foundation for the National Institutes of Health, represents one step in this direction. OMOP conducts methodological research to inform the national drug safety efforts by empirically measuring the performance of an array of alternative analysis methods across a network of 10 databases, covering over 200 million patients.3
OMOP studies the operating characteristics of methods that attempt to discern true drug-outcome associations from background noise. The basic approach thus far comprises 3 bodies of work:
1. Establishing a data network that contains the types of data anticipated for use in active surveillance and comparative effectiveness research (both administrative claims and electronic health records covering various populations, and converted to a common data model format).
2. Creating a research laboratory to enable methodologists to develop and implement analytical approaches for estimating the effects of medical products. To date, collaborating organizations that participated in methods development activities have produced 14 computationally feasible approaches. These include systematic implementations of traditional epidemiologic designs, such as propensity score-adjusted inception cohort designs, case-control surveillance, case-crossover studies, and self-controlled case series.
3. Applying methods to specific drug-outcome pairs across all databases within the OMOP data network. In the initial experiments, OMOP used 53 drug-outcome pairs as ground truth, with 9 true-positive drug-outcome pairs and 44 negative controls. Product labels, prior observational studies, and expert consensus informed this classification. A variety of metrics, including sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), quantified the extent to which estimates cohered with the ground truth.
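The evaluation metrics named in item 3 can be sketched as follows. This is a minimal illustration, not OMOP code: the scores (hypothetical estimated relative risks) and labels below are invented, and a real evaluation would use the 53 classified drug-outcome pairs.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity of flagging pairs with score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Probability a random true positive outscores a random negative control
    (ties count as 0.5), i.e., the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [2.1, 1.8, 0.9, 1.4, 0.7, 1.5]  # hypothetical estimated relative risks
labels = [1,   1,   0,   1,   0,   0]    # 1 = true positive, 0 = negative control
print(sens_spec(scores, labels, threshold=1.2))  # sensitivity 1.0, specificity ~0.67
print(auc(scores, labels))                       # ~0.89
```

Note that sensitivity and specificity depend on the chosen signaling threshold, whereas the AUC summarizes discrimination across all thresholds; this is why threshold selection appears separately among the open questions listed above.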
In accordance with its mission, all work products developed throughout OMOP are publicly available on the OMOP website (http://omop.fnih.org), including method descriptions and source code, data characterization tools, white papers, and public presentations. OMOP released the findings of its initial experiments at its 2011 OMOP Symposium, and presentations and audio narrations are available.
OMOP was initially conceived as a 2-year research project aimed at conducting a series of experiments to measure methods performance. The research has raised additional questions that require further investigation. As the research progressed, several groups recognized the value of sustaining a shared resource to facilitate collaborative methodological research across all stakeholders. OMOP therefore continues to serve this vision and has invited those interested in collaborating to join the broader OMOP community. We encourage interested researchers to register on the OMOP website (http://omop.fnih.org) and to contact the OMOP team about participating actively in this effort. Ongoing work includes exploring the existing results to understand why current approaches yield false-positives and false-negatives, and developing strategies to overcome these limitations.
The OMOP experiments represent an initial attempt to empirically estimate the operating characteristics of standard pharmacoepidemiology procedures, applied to observational data. However, OMOP is not alone in the conduct of methodological research to inform the appropriate use of observational databases. European efforts to assess the performance of active surveillance methods across international data sources include IMI-PROTECT4 and EU-ADR.5 The Mini-Sentinel initiative in the United States has initiated multiple workstreams examining different classes of methodology (http://mini-sentinel.org). In the context of vaccines, the Vaccine Safety Datalink project has presented initial findings6 showing that its current approach yielded one true-positive from 10 putative signals, a positive predictive value of 10%.
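A low positive predictive value of this kind is what Bayes' rule predicts whenever true signals are rare among the drug-outcome pairs being monitored, even for a method with respectable sensitivity and specificity. The sketch below is a generic illustration with invented numbers, not an analysis of the Vaccine Safety Datalink findings.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule: P(true signal | flagged)."""
    flagged_true = sensitivity * prevalence          # true signals flagged
    flagged_false = (1 - specificity) * (1 - prevalence)  # false alarms
    return flagged_true / (flagged_true + flagged_false)

# Hypothetical: 80% sensitivity, 95% specificity, but only 1% of monitored
# drug-outcome pairs are true signals.
print(ppv(0.80, 0.95, 0.01))  # ~0.14: most flagged pairs are false positives
```

The calculation makes vivid why surveillance systems monitoring many product-outcome pairs must expect most signals to be false positives unless specificity is extremely high.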
Although early results from OMOP and from these various efforts demonstrate the promise of systematic observational analyses, they highlight important methodological challenges when attempting to produce unbiased effect estimates in observational healthcare databases. Much work is needed to maximize the likelihood of observing true effects while reducing the risk of false-positive findings. Specific areas of opportunity include enhancing strategies to identify and mitigate sources of bias, improving the extraction of clinical elements from the underlying data, developing solutions for refined outcome definitions, establishing strategies to integrate observational results across disparate data sources, and designing a broader framework for how observational analyses can contribute to a causality assessment.
The empirical findings to date appear to be consistent with prior concerns about the accuracy of epidemiologic investigations7 (including the suggestion that only large effects be considered as credible from observational analyses8) and concerns about inconsistencies in various published analyses of the same drug safety concern.9,10 Until observational data and analytic methods improve to the point that operating characteristics are consistent with the public's expectations, there will continue to be a credibility gap for epidemiologists working in this area.
Observational studies should serve as the primary and, in some circumstances, best source of evidence in evaluating the comparative safety and effectiveness of medical products11—but only if epidemiologic investigations have sufficient reliability to provide stakeholders with the confidence needed to inform medical decision making. This involves demonstrating to skeptics that epidemiology can be more than an art and should be treated as a science, following systematic processes that generate reproducible and fully transparent research with well-understood operating characteristics. In particular, clinical insights that currently underlie many design decisions in pharmacoepidemiology need to be applied consistently and subjected to rigorous evaluation.
Advancing the science of active surveillance and comparative effectiveness requires the interdisciplinary collaboration of epidemiology, statistics, health services research, computer science, medical informatics, engineering, and the clinical sciences. Substantial research challenges remain, but progress can translate directly into significant benefits for patients.
ABOUT THE AUTHORS
DAVID MADIGAN is Professor and Chair of Statistics at Columbia University in the City of New York. He is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics. He works on analytic methods for drug safety and on Bayesian biostatistics. PATRICK RYAN is Associate Director, Analytical Epidemiology at Johnson & Johnson and a Research Investigator at the Observational Medical Outcomes Partnership (OMOP). His research focuses on the development and application of exploratory analysis methods to better understand the effects of medicines.
The authors gratefully acknowledge insightful comments from Bram Hartzema, Judy Racoosin, Christian Reich, Paul Stang, and Emily Welebob.