Growing public investment in national health data networks now supports the use of observational healthcare data for routine postlicensure safety surveillance of medical products. Examples include the Centers for Disease Control and Prevention’s Vaccine Safety Datalink,^{1} the Food and Drug Administration’s Mini-Sentinel Project,^{2} and the Post-licensure Rapid Immunization Safety Monitoring Initiative.^{3}

One use of these national health data networks is for “near real-time” sequential database surveillance.^{2,4,5} Sequential database surveillance assesses evidence of association with respect to a prespecified exposure–outcome pair.^{2} During surveillance, the incidence rate of the outcome among the exposed (eg, acute myocardial infarctions among users of oral antidiabetic medications) is continuously compared with that in a historical, concurrent, or self-control population as data accrue. If a statistical signal of excess risk is identified, the null hypothesis of “no excess risk” is rejected. If not, the sequential database surveillance ends at a predetermined stopping point while failing to reject the null. Any detected statistical signal is followed by confirmatory assessments to validate or refute the finding.^{5}

Postmarket sequential database surveillance is related to, but still distinct from, sequentially monitored clinical trials, for which there is a well-established statistical literature.^{6,7} Optimal design and analysis aspects of sequential database surveillance are less developed^{8,9} but are gaining in importance. The key difference is that clinical trials typically prioritize power and sample size whereas postmarket sequential database surveillance places a high priority on power and the time to detect a signal. This attention to the calendar time to detect a signal is important because of potential harm to the general population from delayed detection of a safety problem. To support quick detection, sequential database surveillance analyses depend on amassing sufficient information (ie, sample size) to reach a stopping point, either by rejecting the null (ie, detecting a safety signal) or ending surveillance (ie, failing to signal).^{10,11}

Sequential database surveillance is performed in an observational setting where the investigator has no control over information accrual within each data source. Investigators must select how many observational healthcare databases to monitor and the duration of monitoring. First, the investigator must consider each database’s contribution to sample size, which depends on the size of the population of interest and that population’s adoption of medical products. Secondary attributes, such as delays and misclassification errors in data capture, also are important. Finally, the cost must be weighed. The database-specific cost may increase considerably if medical chart validation accompanies surveillance.

To illustrate these design choices, investigators in the Mini-Sentinel pilot project wished to detect a 1.33 incidence rate ratio of acute myocardial infarctions among new users of saxagliptin as compared with new users of other oral antidiabetic agents with 80% power, which required 23,000 person-years of exposure.^{12} Hypothetically, database configuration A (eg, databases 1, 3, 6, and 8) might take twice as long as database configuration B (eg, databases 1–4) to capture 23,000 person-years, but at 75% of the cost and with greater accuracy. Which configuration is preferable?

Beyond feasibility, an important consideration is the context of safety surveillance, and specifically, what is known (if anything) about the hypothesized safety risk. Surveillance may be undertaken in circumstances when there is little expectation of a safety problem, but surveillance is performed for reassurance. Surveillance may also be undertaken when data from spontaneous reporting systems or underpowered prelicensure data suggest the possibility of a safety problem. Additionally, people external to the surveillance population are affected by the speed and confidence (ie, statistical power) with which a safety signal is detected or ruled out. The need for, and public health value of, early detection depends on many factors including the exposure prevalence, the background rate, and any existing data to suggest the true effect size.

The aim of safety surveillance for each exposure–outcome pair is specific to the context. Thus, one might aim to maximize statistical power, minimize the calendar time to detect a signal, or minimize the maximum duration of surveillance. Another objective might be performance of surveillance activities within a defined budget.

We describe here the relationships between statistical power and sample sizes for various true effect sizes using sequential database surveillance. We then present a four-step planning process to choose a database design configuration for a prespecified exposure–outcome pair. We illustrate this process using a vaccine example. We close with a discussion about the limitations and modifications of this process.

SAMPLE SIZE CALCULATION
Typical sample size calculations for nonsequential statistical methods require investigators to calculate the relationship between the prespecified upper limit for accepting false-positive results (ie, type-I error), the statistical power to detect a particular effect size (ie, 1 minus the type-II error), and the sample size. Sample size calculations for sequential statistical models are similar except they incorporate the ability to interrupt surveillance by rejecting the null hypothesis at a point earlier than the prescribed end of surveillance. Thus, in sequential surveillance, there are two sample sizes to consider: one is the sample size needed to reject the null hypothesis (ie, the time to detect a signal) and the other is the maximum sample size (ie, the maximum length of surveillance). Smaller true effect sizes require larger samples. Additionally, sequential statistical methods are concerned with the choice of the sequential boundary,^{8,9,13–15} which relates to how type-I error is apportioned among multiple hypothesis tests and also limits the statistical power that can be achieved at any interim testing point.

Therefore, we consider six variables in our sample size calculation: (1) the sequential stopping boundary, (2) type-I error, (3) true effect size, (4) statistical power, (5) maximum sample size, and (6) the time to detect a safety signal. The time-to-signal has a probabilistic distribution that we represent with a summary statistic: the median of the time until surveillance ends, irrespective of whether a signal is detected. We use this statistic only when the statistical power is at least 50%, ensuring the median reflects a time when a signal was detected. This definition assures an accurate comparison of the time-to-signal for different systems with different statistical power. Detailed information on these calculations is presented in eAppendix (https://links.lww.com/EDE/A694). Henceforth, we refer to this value as the median sample size, analogous to the average or expected sample size in the group sequential trials literature.^{14} Smaller median sample sizes are preferred because we need less information to detect a safety problem.

Figure 1 shows the relationship between the six quantities described above when using a particular statistical model (the Poisson maximized sequential probability ratio test [MaxSPRT]).^{16} We use this statistical model to illustrate our sample size calculation because it has been extensively employed in vaccine and drug safety surveillance.^{4,17–25} In Figure 1, we fix the first four variables (they act as independent, given quantities) and we examine their effect on the latter two variables, which are depicted in two sets of isolines.

FIGURE 1: The relationship between statistical power, median sample size, maximum sample size, and true effect size using the Poisson MaxSPRT model with a minimum of four events. Statistical power isolines travel from northwest to southeast. Median sample size isolines travel from southwest to northeast. Overall type-I error set to 0.05. The star represents the starting point of the simplified example. P-Y indicates person-years; Pr, power.

First, the sequential stopping boundary is flat, meaning that the null hypothesis is rejected when the test statistic exceeds a set critical value.^{16} This boundary is commonly used with the Poisson MaxSPRT and is why we choose to use it here, although the Poisson MaxSPRT can support other boundaries.^{9} Second, the prespecified type-I error is set to 0.05 for all analyses. Third, the true effect sizes, illustrated along the x-axis, are given in two scales. The upper scale is an absolute risk measure corresponding to a specified comparator group event rate of 1 event/10,000 person-years. The lower scale is defined using the equivalent relative risk measure. Fourth, the y-axis shows the maximum sample size. Again, we set these variables in order to understand their effect on statistical power and median sample size.
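The link between the two x-axis scales can be written out as a short worked conversion. This is a sketch that assumes the absolute measure is an incidence rate difference; the symbols \lambda_0 (comparator event rate), RD (rate difference), and RR (rate ratio) are introduced here for illustration only.

```latex
\[
\lambda_0 = \frac{1}{10{,}000\ \text{person-years}}, \qquad
RR = \frac{\lambda_0 + RD}{\lambda_0} = 1 + \frac{RD}{\lambda_0}
\]
\[
RD = \frac{2}{10{,}000\ \text{person-years}}
\;\Longrightarrow\;
RR = 1 + \frac{2/10{,}000}{1/10{,}000} = 3
\]
```

Under this reading, a rate difference of 2 excess events per 10,000 person-years on the upper scale lines up with a relative risk of 3 on the lower scale.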

Statistical power is depicted in the first set of isolines that travels from northwest to southeast. These isolines are downward sloping because the same statistical power can be attained with smaller sample sizes when the true effect sizes are larger. Statistical power increases with maximum sample size because there are more opportunities to detect a signal.

Median sample size is depicted in the second set of isolines that travels from southwest to northeast. For a given effect size, there are minimal increases in the median sample size by increasing the maximum sample size. Vertically asymptotic behavior dominates as statistical power approaches unity. The values of the median sample size isolines become smaller as the true effect size increases because smaller sample sizes will signal under conditions of greater risk.

Figure 1 can be used in the following way. We assume the investigator plans surveillance to detect a prespecified effect size with a desired statistical power, for example, an incidence rate difference of 2 events per 10,000 person-years at 90% power with an overall type-I error of 0.05, shown as the star. Given these starting criteria, the investigator will need a maximum sample size of 50,000 exposed person-years (as depicted on the y-axis) and a median sample size of 16,500 exposed person-years to detect a signal at that effect size, as depicted on the median sample size isoline.

Let us assume the investigator believes these sample sizes will not be attainable within the calendar time allotted for surveillance. Remedying this situation requires either increasing the number of databases in the configuration or extending the calendar duration of surveillance. If neither of these options is possible, the investigator could decide to relax the criterion for the statistical power while maintaining the overall type-I error and effect size. By moving straight down, parallel to the y-axis (ie, holding the desired effect size constant), from the 90% to the 80% power isoline, the new maximum sample size is 33,333 exposed person-years and the new median sample size is 14,690 exposed person-years. When the maximum sample size is reduced to 67% of the original value, statistical power is lost but the median sample size is smaller. How do we understand this trade-off? Is the modest decrease in median sample size worth the loss in power?
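The size of this trade-off can be checked with a few lines of arithmetic. The sketch below only restates the sample sizes quoted in the text for the 90% and 80% power designs; the function name `tradeoff` is hypothetical and introduced for illustration.

```python
# Sample sizes quoted in the text for Figure 1 (exposed person-years).
max_py_90, median_py_90 = 50_000, 16_500   # design with 90% power
max_py_80, median_py_80 = 33_333, 14_690   # design with 80% power


def tradeoff(max_a, med_a, max_b, med_b):
    """Return (ratio of maximum sample sizes, reduction in median sample size)."""
    return max_b / max_a, med_a - med_b


ratio, median_saving = tradeoff(max_py_90, median_py_90, max_py_80, median_py_80)
print(f"maximum sample size ratio: {ratio:.0%}")
print(f"median sample size reduction: {median_saving} person-years")
```

The reduction in median sample size is the quantity that the planning process later translates into calendar time.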

FOUR-STEP PLANNING PROCESS WITH VACCINE EXAMPLE
Continuing with the example above, we give these tradeoffs more concrete meaning by translating the information time concepts into calendar time. An investigator may value the change in median sample size (ie, the additional 1810 exposed person-years) differently depending on whether that change translates into an additional 2- or 10-week difference in calendar time. From a public health perspective, when the possibility of real harm is present in a population larger than the one under observation, these calendar time differences provide a meaningful way to assess sample size. To illustrate how these calendar differences are established, we provide a simple example.

The four steps of the planning process are as follows: (1) perform a sample size calculation to detect a prespecified effect size with a desired statistical power using Figure 1, (2) estimate database-specific adoption and exposure patterns of the medical product, (3) aggregate the exposures across the database configurations and identify the two stopping points of surveillance in calendar time, and (4) choose the best database configuration using decision analysis.

Inputs to the Planning Process: Surveillance Specifics
The investigator begins with the following surveillance-specific inputs to the planning process: the exposure–outcome pair to be evaluated, an epidemiologic design, and the corresponding sequential statistical method for analysis. First, the description of the exposure–outcome pair must specify the “at risk” period for the outcome. Second, an epidemiologic design indicates the way the population of interest and the comparison population will be sampled for statistical inference. For sample size calculations, the investigator must specify the expected incidence rate of the outcome in the comparison population, which is also the expected incidence rate under the null hypothesis. Third, the sequential statistical method must specify the sequential stopping boundary.

We plan surveillance for a single dose, newly available live attenuated childhood vaccine suspected of elevating the risk of encephalitis/meningitis. Table 1 lists relevant investigator-specified surveillance parameters. We assume encephalitis/meningitis in infants occurs at a rate of 1 event per 10,000 person-years in a clinically relevant historical comparator group (ie, the background rate). Based on the incubation period of the live virus, we assume that encephalitis/meningitis could be plausibly associated with the vaccine only if the infection occurs between 5 and 15 days postvaccination. Thus, each dose contributes 10 person-days of exposure. We choose this example because exposure is discrete, the risk window is defined, and the adoption pattern is relatively simple (ie, children receive routine vaccinations during well-baby visits).

TABLE 1: Investigator-specified Example Parameters

Our design specifies a historical comparison cohort of infants exposed to other vaccines and thus we select the Poisson MaxSPRT^{16} to analyze our data, with a requirement that a minimum of four events is needed to signal. Again, we chose this particular combination to mimic prior analyses.^{4,17–25} We use a flat boundary and set the overall type-I error to 0.05.

Inputs to the Planning Process: Database Specifics
The following inputs are necessary for each database that may contribute to surveillance: the size of the subpopulation of interest, an estimated mathematical function that describes the adoption/uptake pattern, the data refresh delay time, the data processing delay time, and any exposure and outcome misclassification estimates. First, subpopulation size can be ascertained with simple database queries.^{26} Second, the estimated adoption function may be based on historical data, limited uptake data, or no data at all. Together, these two inputs specify how information time (ie, exposed person-time) is translated into calendar time.

In our example, we assume we have two databases, A and B, and we identify our subpopulation of interest (N) as a cohort of 1-year-olds. Database A has 500,000 infants and database B has 1,000,000 infants. Table 1 lists database-specific parameters used in the example. For simplicity, we assume a linear adoption function coincident with the timing of annual well visits (N/52 visits per week), and we allow for a proportion of parents (5%) to refuse vaccination for their infant.

The remaining three inputs (the refresh delay time, the processing delay time, and exposure and outcome misclassification estimates) allow the investigator to model how exposures and outcomes appear to the investigator conducting surveillance, mimicking the near real-time aspects of surveillance. The refresh delay time is the interval at which a participating data partner renews its dataset and makes it available for analysis. The processing delay time is the time that elapses between when an exposure or outcome occurs and when it is recorded and available for analysis. The misclassification estimates (data on the sensitivity and specificity of the exposure and outcome classifications) allow the investigator to model data noise.^{27–29} We hold the delay parameters constant across the databases, assuming a uniform 1-week refresh delay and an 8-week processing delay. For simplicity, we assume no known or estimated misclassification.

Step 1: Performing the Sample Size Calculation
We return with our vaccine example to Figure 1 and assume that we are interested in detecting effect sizes of 2 excess events/10,000 person-years (ie, 1 excess event/182,500 doses). Initially, we select a desired power of 0.90 to detect this effect size (depicted as a star in Figure 1), and therefore, we set the maximum sample size to 50,000 exposed person-years. Should the true effect size be equal to the effect size we seek to detect, we expect to end surveillance by signaling at the median sample size of ∼16,500 exposed person-years. If there is no excess risk, we expect to end surveillance when 50,000 exposed person-years are observed.
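The per-dose figure and the expected event counts follow from the example parameters. The sketch below assumes a 365-day year and a 10-day risk window per dose (days 5 to 15 postvaccination); the variable names are illustrative and not taken from the original analysis.

```python
DAYS_PER_YEAR = 365                      # assumption: 365-day year

risk_window_days = 15 - 5                # each dose contributes 10 person-days at risk
background_rate = 1 / 10_000             # comparator events per person-year
excess_rate = 2 / 10_000                 # excess events per person-year to detect
max_sample_py = 50_000                   # maximum sample size, exposed person-years

py_per_dose = risk_window_days / DAYS_PER_YEAR

# Doses needed, on average, to produce one excess event at the effect size of interest.
doses_per_excess_event = 1 / (excess_rate * py_per_dose)

# Doses needed to accumulate the maximum sample size.
doses_at_max = max_sample_py / py_per_dose

# Expected events at the maximum sample size under the null and the alternative.
expected_null = background_rate * max_sample_py
expected_alt = (background_rate + excess_rate) * max_sample_py

print(f"doses per excess event: {doses_per_excess_event:,.0f}")
print(f"doses at maximum sample size: {doses_at_max:,.0f}")
print(f"expected events (null, alternative): {expected_null:.1f}, {expected_alt:.1f}")
```

The small expected counts illustrate why a minimum number of events is required before a signal can be declared.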

Step 2: Estimate Database-specific Adoption and Exposure Patterns of the Medical Product
In our example, (0.95 × N)/52 infants are vaccinated each week. After an infant has been vaccinated and the 5-day induction period has elapsed, the exposed infant becomes “at risk” for vaccine-associated encephalitis/meningitis. The exposed infant remains in the risk window until: (1) the infant experiences the event, (2) the risk window elapses at 15 days postvaccination, or (3) the surveillance ends. As expected, because database B is twice as large as database A, database B accumulates twice the exposures at any given time during surveillance.

Once we estimate the pattern of information accrual in calendar time, we must alter this pattern to reflect when the data become accessible for analysis by incorporating the refresh delay time, processing delay time, and exposure misclassification parameters. The output of this step is a complete translation of information time into calendar time for each database.
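The translation from information time to calendar time for one database can be sketched as a week-by-week curve. This is a simplified illustration, not the authors' model: it assumes a dose's person-time is credited in full once its risk window has closed and both delays have passed, that the delays are additive, and that risk windows cut short by events are rare enough to ignore. The function name and its parameters are hypothetical.

```python
DAYS_PER_YEAR = 365  # assumption: 365-day year


def observable_person_years(n_infants, n_weeks, refusal=0.05,
                            window_start_day=5, window_end_day=15,
                            refresh_delay_weeks=1, processing_delay_weeks=8):
    """Cumulative exposed person-years visible to the analyst at the end of each week."""
    doses_per_week = (1 - refusal) * n_infants / 52          # linear adoption
    py_per_dose = (window_end_day - window_start_day) / DAYS_PER_YEAR
    # Assumed lag: risk window must close, then processing and refresh delays apply.
    lag_weeks = window_end_day / 7 + processing_delay_weeks + refresh_delay_weeks
    curve = []
    for week in range(1, n_weeks + 1):
        weeks_of_doses_visible = max(0.0, week - lag_weeks)
        curve.append(doses_per_week * py_per_dose * weeks_of_doses_visible)
    return curve


curve_a = observable_person_years(500_000, 52)     # database A
curve_b = observable_person_years(1_000_000, 52)   # database B
print(f"week 52: A = {curve_a[-1]:,.0f} PY, B = {curve_b[-1]:,.0f} PY")
```

Under these assumptions nothing is observable during the first weeks of surveillance, and database B's curve is twice database A's at every later week.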

Step 3: Aggregate Exposures and Identify the Two Stopping Points of Surveillance
Given the database-specific inputs, we aggregate these datasets to consider each potential configuration of databases. Using these aggregated datasets, we identify two calendar times of interest based on the information time concepts in Step 1: the calendar time to reach the maximum sample size and the median sample size.

As expected, the largest database configuration (ie, A+B) ends surveillance most quickly. If we reach the maximum sample size, then surveillance ends after 211 weeks, 110 weeks, or 77 weeks, respectively, for A only, B only, and A+B. These are the calendar times required for each configuration to accumulate 50,000 exposed person-years. If the true effect size is equal to 2 events/10,000 person-years, then the current database configurations may signal at median times of 77 weeks, 44 weeks, and 33 weeks, respectively. These are the calendar times required for each configuration to accumulate 16,500 person-years.
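A rough closed-form version of this step is sketched below. It assumes linear accrual, a 365-day year, and a single additive lag made up of the 15-day risk window, the 8-week processing delay, and the 1-week refresh delay; that lag handling is an assumption of this sketch, and the helper names are hypothetical. With these assumptions the results land within about a week of the calendar times reported above, but the sketch is not expected to reproduce them exactly.

```python
DAYS_PER_YEAR = 365                 # assumption: 365-day year
LAG_WEEKS = 15 / 7 + 8 + 1          # assumed additive lag: risk window + processing + refresh


def weekly_person_years(n_infants, refusal=0.05, risk_days=10):
    """Exposed person-years contributed per calendar week by one database."""
    return (1 - refusal) * n_infants / 52 * risk_days / DAYS_PER_YEAR


def weeks_to_reach(target_py, database_sizes):
    """Calendar weeks until a configuration has target_py observable person-years."""
    rate = sum(weekly_person_years(n) for n in database_sizes)
    return target_py / rate + LAG_WEEKS


configs = {"A": [500_000], "B": [1_000_000], "A+B": [500_000, 1_000_000]}
for name, sizes in configs.items():
    to_max = weeks_to_reach(50_000, sizes)      # maximum sample size
    to_median = weeks_to_reach(16_500, sizes)   # median sample size
    print(f"{name}: maximum {to_max:.1f} weeks, median {to_median:.1f} weeks")
```

Because accrual is linear, aggregating databases simply adds their weekly rates, while the lag stays fixed regardless of configuration size.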

Based on these calendar times, we eliminate database A alone as a candidate configuration and consider whether the remaining options (ie, B alone and A+B) will meet our needs. Specifically, can we accept a 2-year window for surveillance when there is no association and a 1-year window to detect a moderate-to-severe elevated risk? Initially, we presume the answer is yes and proceed to step 4. Should no configurations prove usable—perhaps due to the inability to generate the required sample sizes—it is possible to return to step 1 and reiterate with a new sample size target.

Step 4: Decision Analysis with Uncertainty
The goal of the decision analysis is to choose the optimal database configuration among the remaining candidates. We do this by estimating the surveillance costs—in public health dollars and in public health outcomes—under the two configurations. For these purposes, we assume each database is associated with a fixed cost, regardless of the length of surveillance. We also consider how uncertainty about the true effect size affects these outcomes. Thus, for each configuration, we consider the costs under two scenarios: when there is an elevated risk and when there is not.

If there is an elevated risk, then the delay in signal detection from using a smaller configuration results in avoidable excess events. We project the impact of this delay on a national scale, using the median sample size estimates in step 3, and calculate the additional doses that might occur nationwide during the delay. We assume 10,000 children are vaccinated daily based on a population of ∼4 million children. An 11-week delay results in 770,000 additional doses and 4.2 excess events at the effect size of interest (1 excess event/182,500 doses). Thus, when an elevated risk exists, the larger configuration is more expensive but eliminates avoidable excess events. The smaller configuration saves money at the cost of avoidable excess events.
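The excess-event projection is a direct calculation from the numbers in this paragraph. The following sketch restates it; the function name is hypothetical.

```python
def avoidable_excess_events(delay_weeks, doses_per_day, doses_per_excess_event):
    """Return (additional doses, excess events) incurred nationally during a delay."""
    additional_doses = delay_weeks * 7 * doses_per_day
    return additional_doses, additional_doses / doses_per_excess_event


# Median time-to-signal: 44 weeks for database B versus 33 weeks for A+B.
delay_weeks = 44 - 33
doses, events = avoidable_excess_events(delay_weeks,
                                        doses_per_day=10_000,
                                        doses_per_excess_event=182_500)
print(f"{delay_weeks}-week delay: {doses:,} additional doses, {events:.1f} excess events")
```

The same function can be reused for other effect sizes by changing `doses_per_excess_event`.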

When there is no excess risk, there are no avoidable excess events under any database configuration. A smaller configuration saves money but requires an additional 33 weeks to end surveillance. Mathematically, a multicriteria decision analysis under uncertainty^{30,31} can be used to weight these outcomes, and the optimal decision can be identified based on the investigator’s objectives. (An in-depth examination of these decision analytic models is beyond the scope of this article.)
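One very simple form such an analysis could take is a weighted expected score. The sketch below is illustrative only: the calendar times and excess events come from the example, but the relative costs, the prior probability of an elevated risk, and the criterion weights are hypothetical placeholders that an investigator would need to supply.

```python
# Outcomes from the example: weeks until surveillance ends and avoidable excess
# events (relative to the larger configuration) under each scenario.
OUTCOMES = {
    "B":   {"risk": {"weeks": 44, "excess_events": 4.2},
            "no_risk": {"weeks": 110, "excess_events": 0.0}},
    "A+B": {"risk": {"weeks": 33, "excess_events": 0.0},
            "no_risk": {"weeks": 77, "excess_events": 0.0}},
}

# HYPOTHETICAL inputs, chosen only to make the sketch runnable.
FIXED_COST = {"B": 1.0, "A+B": 1.5}     # relative cost units per configuration
P_RISK = 0.2                            # prior probability of an elevated risk
WEIGHTS = {"cost": 1.0, "weeks": 0.01, "excess_events": 0.5}


def expected_score(config):
    """Weighted expected penalty for a configuration (lower is better)."""
    score = WEIGHTS["cost"] * FIXED_COST[config]
    for scenario, prob in (("risk", P_RISK), ("no_risk", 1 - P_RISK)):
        outcome = OUTCOMES[config][scenario]
        score += prob * (WEIGHTS["weeks"] * outcome["weeks"]
                         + WEIGHTS["excess_events"] * outcome["excess_events"])
    return score


best = min(OUTCOMES, key=expected_score)
for config in OUTCOMES:
    print(f"{config}: expected score {expected_score(config):.3f}")
print("preferred under these placeholder weights:", best)
```

The ranking depends entirely on the placeholder weights and prior; different investigator objectives can reverse it.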

ENHANCED TIME-TO-SIGNAL ANALYSIS
In the planning process, we used median sample size as a summary statistic to describe a complete distribution. Investigators may prefer a different summary statistic (such as the mean or 80th percentile) to plan for a calendar time-to-signal or may wish to use the complete probabilistic distributions to weigh competing database configurations. To generate complete distributions, we simulate the arrival of outcomes based on the exposure patterns established in step 2. We perform this simulation for a range of true effect sizes and perform sequential database surveillance on these simulated datasets, resulting in one distribution of the “calendar time to end surveillance” for each effect size. Figure 2 depicts the distribution in which the true effect size is equal to 2 excess events per 10,000 person-years. The sawtooth pattern occurs because the Poisson model requires an end to surveillance coincident with the arrival of an observed event, and each hill represents a discrete number of events. The mistaken acceptance of the null (ie, type-II error) is depicted by the rightmost column and occurs when the maximum sample size of 50,000 exposed person-years is accumulated.
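A simulation of this kind can be sketched in information time (exposed person-years). The log-likelihood ratio below is the standard Poisson MaxSPRT statistic, evaluated at each event arrival. The critical value used here is a placeholder, not the exact value for an overall type-I error of 0.05; exact values should come from published tables or dedicated software. Function names are hypothetical, and conversion to calendar time would use the exposure patterns from step 2.

```python
import math
import random
import statistics


def poisson_maxsprt_llr(observed, expected):
    """Poisson MaxSPRT log-likelihood ratio for excess risk (0 when no excess)."""
    if expected <= 0 or observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


def simulate_stop(rate_ratio, background_rate, max_py, critical_value,
                  min_events, rng):
    """Simulate one surveillance run; return (person-years at stop, signaled)."""
    event_rate = rate_ratio * background_rate      # true events per person-year
    elapsed_py, events = 0.0, 0
    while True:
        elapsed_py += rng.expovariate(event_rate)  # person-time to next event
        if elapsed_py >= max_py:
            return max_py, False                   # maximum sample size reached
        events += 1
        expected = background_rate * elapsed_py    # expected events under the null
        if (events >= min_events
                and poisson_maxsprt_llr(events, expected) >= critical_value):
            return elapsed_py, True


PLACEHOLDER_CRITICAL_VALUE = 3.0   # HYPOTHETICAL; not calibrated to alpha = 0.05
rng = random.Random(2024)
runs = [simulate_stop(3.0, 1 / 10_000, 50_000, PLACEHOLDER_CRITICAL_VALUE, 4, rng)
        for _ in range(2_000)]
stops = [py for py, _ in runs]
power = sum(signaled for _, signaled in runs) / len(runs)

print(f"estimated power: {power:.2f}")
print(f"median stop: {statistics.median(stops):,.0f} PY")
print(f"mean stop: {statistics.mean(stops):,.0f} PY")
print(f"80th percentile stop: {statistics.quantiles(stops, n=5)[3]:,.0f} PY")
```

Testing only at event arrivals is sufficient here because, for a fixed event count, the statistic can only fall as expected events accumulate.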

FIGURE 2: Histogram of calendar time to end surveillance when incidence rate difference = 2 excess events/10,000 person-years. Number of simulations = 100,000. Top panel is configuration 1 (database B); bottom panel is configuration 2 (database A+B). “o” is the median, “x” is the mean, “*” is the 80th percentile. The large rightmost column represents scenarios in which the maximum sample size has accrued and the null hypothesis was not rejected. Overall type-I error: 0.05.

For comparison, Table 2 depicts alternative summary statistics of these complete distributions over a range of effect sizes, specifically the mean and 80th percentile. The italicized values in Table 2 indicate instances in which the maximum sample size has been reached, and we have failed to reject the null.

TABLE 2: Descriptive Statistics of the Calendar Time to End Surveillance for Two Database Configurations (B and A+B) for Various True Risks^{a}

DISCUSSION
Public health investigators can optimize resources by using a structured framework to consider multiple options for implementation of sequential database surveillance. First, we present a visual tool to examine tradeoffs in sample size calculations for sequential statistical models (Figure 1). Second, we suggest a four-step planning process to illustrate investigator choices among database configurations. The process and its underlying model components are flexible enough to apply generally to sequential database surveillance activities. Still, there is significant additional modeling work to be done for more complex problems. For example, sample size calculations may be produced with other sequential statistical models including observational group sequential models,^{32–35} which imply different sequential boundaries. The tradeoffs of these models and their associated boundaries are discussed elsewhere.^{8,9,32,33} Logistically, it is important to consider whether a database’s refresh delay is compatible with the assumptions of the sequential statistical model. For example, continuous sequential models^{16} would be poorly matched with long refresh delays in which many outcomes were expected after each refresh.

Also, there is substantial uncertainty associated with parameters describing adoption (ie, both the percentages of nonadopters and the function that defines adoption). In the example of routine childhood vaccination, a linear adoption function is not unreasonable because of requirements for school attendance, daycare, etc. However, adoption of new drugs is influenced by more factors (eg, formulary policy, copayments, the availability of substitute therapies, treatment guidelines, etc.), making adoption patterns considerably more complex and unstable. Also, we chose a deterministic model to develop exposure patterns because of the large sample sizes being considered and the relative unimportance of heterogeneity. A stochastic model with more individual-level detail could be used.

We greatly simplified the concept of processing delay. Exposure data are sometimes available sooner than outcome data, and outcome data may have differing lag times based on their origin (eg, ambulatory encounter data may become available more quickly than inpatient data).^{20} These incoming datastreams may be modeled explicitly, and the additional modeling efforts are likely worthwhile when calendar time for surveillance is very short (eg, influenza vaccination surveillance). Alternatively, one can use the maximal processing delay across all incoming datastreams as the input parameter. Allowing this “data settling” period also may mitigate issues associated with latent data correction and potential instability in sequential database surveillance.

We have not considered measurement error in our example. Perfect measurement is unrealistic when using electronic administrative data.^{36–40} Well-established corrective procedures exist when exposure and outcome misclassification are nondifferential,^{27–29} and investigators may adjust their effect sizes of interest accordingly. Differential misclassification requires more modeling effort and prior knowledge.

To illustrate our process, we made conventional choices regarding the permissible type-I error, and we set the null hypothesis to be equivalent to “no excess risk.” Investigators may wish to customize these settings, particularly in light of the context of surveillance. For example, in some surveillance settings the baseline might be set at 10% excess risk (ie, a signal occurs only if there is evidence of more than 10% excess risk).

To compare systems, we chose to compare the median sample size (or time to detect a signal) and then, in a second step, we focused on the number of excess events that occur as a result of this detection time. Others^{41,42} have incorporated these ideas into a single unifying metric, the event-based performance, which accounts for tradeoffs between type-I and type-II error but uses exposed events rather than excess events. The event-based performance metric cannot currently be calculated with our simulation strategy, but may be possible with further development.

In summary, we present a four-step planning process for sequential surveillance of an exposure–outcome pair. Additional modeling support may be needed for sample size calculations using other sequential statistical methods, adoption functions, data delays, and misclassification errors. Such models can help to identify the most efficient use of public health dollars for medical product surveillance.

ACKNOWLEDGMENTS
We thank Richard Platt and Gerald Dal Pan for their helpful comments and suggestions.

REFERENCES
1. Baggs J, Gee J, Lewis E, et al. The Vaccine Safety Datalink: a model for monitoring immunization safety. Pediatrics. 2011;127(suppl 1):S45–S53

2. Platt R, Carnahan RM, Brown JS, et al. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):1–8

3. Nguyen M, Ball R, Midthun K, Lieu TA. The Food and Drug Administration’s Post-Licensure Rapid Immunization Safety Monitoring program: strengthening the federal vaccine safety enterprise. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):291–297

4. Yih WK, Lee GM, Lieu TA, et al. Surveillance for adverse events following receipt of pandemic 2009 H1N1 vaccine in the Post-Licensure Rapid Immunization Safety Monitoring (PRISM) System, 2009–2010. Am J Epidemiol. 2012;175:1120–1128. Available at: http://aje.oxfordjournals.org/content/early/2012/05/10/aje.kws197. Accessed 24 May 2012.

5. Yih WK, Kulldorff M, Fireman BH, et al. Active surveillance for adverse events: the experience of the Vaccine Safety Datalink project. Pediatrics. 2011;127(suppl 1):S54–S64

6. Whitehead J. The Design and Analysis of Sequential Clinical Trials. Chichester, UK: Wiley; 1997

7. Jennison C, Turnbull BW. Group Sequential Methods With Applications to Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC; 2000

8. Nelson JC, Cook AJ, Yu O, et al. Challenges in the design and analysis of sequentially monitored postmarket safety surveillance evaluations using electronic observational health care data. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):62–71

9. Kulldorff M. Sequential statistical methods for prospective postmarketing safety surveillance. In: Strom BL, Kimmel SE, Hennessy S, eds. Pharmacoepidemiology. 5th ed. John Wiley & Sons; 2011:852–867

10. Maro JC, Brown JS. Impact of exposure accrual on sequential postmarket evaluations: a simulation study. Pharmacoepidemiol Drug Saf. 2011;20:1184–1191

11. Coloma PM, Trifirò G, Schuemie MJ, et al; EU-ADR Consortium. Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiol Drug Saf. 2012;21:611–621

12. Fireman B, Toh S, Butler MG, et al. A protocol for active surveillance of acute myocardial infarction in association with the use of a new antidiabetic pharmaceutical agent. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):282–290

13. Kittelson JM, Emerson SS. A unifying family of group sequential test designs. Biometrics. 1999;55:874–882

14. Emerson SS, Kittelson JM, Gillen DL. Frequentist evaluation of group sequential clinical trial designs. Stat Med. 2007;26:5047–5080

15. Lan KKG, Demets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663

16. Kulldorff M, Davis RL, Kolczak M, Lewis E, Lieu TA, Platt R. A maximized sequential probability ratio test for drug and vaccine safety surveillance. Seq Anal. 2011;30:58–78

17. Tse A, Tseng HF, Greene SK, Vellozzi C, Lee GM. Signal identification and evaluation for risk of febrile seizures in children following trivalent inactivated influenza vaccine in the Vaccine Safety Datalink Project, 2010–2011. Vaccine. 2012;30:2024–2031

18. Gee J, Naleway A, Shui I, et al. Monitoring the safety of quadrivalent human papillomavirus vaccine: findings from the Vaccine Safety Datalink. Vaccine. 2011;29:8279–8284

19. Lee GM, Greene SK, Weintraub ES, et al; Vaccine Safety Datalink Project. H1N1 and seasonal influenza vaccine safety in the Vaccine Safety Datalink Project. Am J Prev Med. 2011;41:121–128

20. Greene SK, Kulldorff M, Yin R, et al. Near real-time vaccine safety surveillance with partially accrued data. Pharmacoepidemiol Drug Saf. 2011;20:583–590

21. Belongia EA, Irving SA, Shui IM, et al; Vaccine Safety Datalink Investigation Group. Real-time surveillance to assess risk of intussusception and other adverse events after pentavalent, bovine-derived rotavirus vaccine. Pediatr Infect Dis J. 2010;29:1–5

22. Greene SK, Kulldorff M, Lewis EM, et al. Near real-time surveillance for influenza vaccine safety: proof-of-concept in the Vaccine Safety Datalink Project. Am J Epidemiol. 2010;171:177–188

23. Yih WK, Nordin JD, Kulldorff M, et al. An assessment of the safety of adolescent and adult tetanus-diphtheria-acellular pertussis (Tdap) vaccine, using active surveillance for adverse events in the Vaccine Safety Datalink. Vaccine. 2009;27:4257–4262

24. Lieu TA, Kulldorff M, Davis RL, et al; Vaccine Safety Datalink Rapid Cycle Analysis Team. Real-time vaccine safety surveillance for the early detection of adverse events. Med Care. 2007;45(10 suppl 2):S89–S95

25. Brown JS, Kulldorff M, Petronis KR, et al. Early adverse drug event signal detection within population-based health networks using sequential methods: key methodologic considerations. Pharmacoepidemiol Drug Saf. 2009;18:226–234

26. Curtis LH, Weiner MG, Boudreau DM, et al. Design considerations, architecture, and use of the Mini-Sentinel distributed data system. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):23–31

27. Mullooly JP. Misclassification model for person-time analysis of automated medical care databases. Am J Epidemiol. 1996;144:782–792

28. Brenner H, Gefeller O. Use of the positive predictive value to correct for disease misclassification in epidemiologic studies. Am J Epidemiol. 1993;138:1007–1015

29. Green MS. Use of predictive value to adjust relative risk estimates biased by misclassification of outcome status. Am J Epidemiol. 1983;117:98–105

30. Yoon K, Hwang CL. Multiple Attribute Decision Making: An Introduction. Thousand Oaks, CA: Sage Publications; 1995

31. Keeney RL. Decision analysis: an overview. Oper Res. 1982;30:803–838

32. Cook AJ, Tiwari RC, Wellman RD, et al. Statistical approaches to group sequential monitoring of postmarket safety surveillance data: current state of the art for use in the Mini-Sentinel pilot. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):72–81

33. Nelson JC, Cook AJ, Yu O, Zhao S, Jackson LA, Psaty BM. Methods for observational post-licensure medical product safety surveillance. Stat Methods Med Res. 2011. Available at: http://www.ncbi.nlm.nih.gov/pubmed/22138688. Accessed 1 August 2012

34. Li L. A conditional sequential sampling procedure for drug safety surveillance. Stat Med. 2009;28:3124–3138

35. Zhao S, Cook A, Jackson L, Nelson J. Statistical performance of group sequential methods for observational post-licensure medical product safety surveillance: a simulation study. Stat Its Interf. 2012;5:381–390

36. Ray WA. Improving automated database studies. Epidemiology. 2011;22:302–304

37. Manuel DG, Rosella LC, Stukel TA. Importance of accurately identifying disease in studies using electronic health records. BMJ. 2010;341:c4226

38. van Walraven C, Bennett C, Forster AJ. Administrative database research infrequently used validated diagnostic or procedural codes. J Clin Epidemiol. 2011;64:1054–1059

39. Strom BL. Methodologic challenges to studying patient safety and comparative effectiveness. Med Care. 2007;45(10 suppl 2):S13–S15

40. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58:323–337

41. Gagne JJ, Rassen JA, Walker AM, Glynn RJ, Schneeweiss S. Active safety monitoring of new medical products using electronic healthcare data: selecting alerting rules. Epidemiology. 2012;23:238–246

42. Gagne JJ, Walker AM, Glynn RJ, Rassen JA, Schneeweiss S. An event-based approach for comparing the performance of methods for prospective medical product monitoring. Pharmacoepidemiol Drug Saf. 2012;21:631–639