Norovirus is a highly infectious gastrointestinal pathogen that affects all age groups.^{1} Investigations of primary point-source outbreaks therefore often focus on secondary cases.^{2,3} Households constitute a particularly important site of these secondary cases, as living in close proximity facilitates a higher effective rate of contact, particularly for diseases in which the fecal-oral route is important to transmission. This household transmission contributes to overall disease burden, and individuals infected at the household level may in turn generate infections in the community, including new point-source outbreaks that infect many people at one time.

From 1997 through 2002, norovirus was responsible for 93% of nonbacterial gastroenteritis outbreaks in the United States.^{4} The high incidence of norovirus is attributable both to its low infectious dose^{1} and to its ability to survive in the environment.^{5} As a leading cause of gastroenteritis worldwide,^{6} norovirus is an important concern for local public health departments as well as the US Environmental Protection Agency (EPA). It is important, therefore, to develop effective intervention and control strategies for norovirus and similar pathogens. These require both reliable estimates of household transmission parameters and effective analytic tools for obtaining these estimates.

Although there have been studies of community norovirus outbreaks,^{7} there are no studies that quantify transmission dynamics in the community using a dynamic model. One of the difficulties of these studies is that we often observe only the time of symptom onset for infectious cases. Unobserved events typically include infection and recovery and the times at which these occur. Properly describing the transmission dynamics in household systems necessitates the use of mechanistic models that account for unobserved state variables (eg, the number of infectious and susceptible individuals at any given time), and the more pronounced random variability in outbreaks in small populations.

In this paper, we develop tools to address these challenges and analyze household data collected subsequent to a norovirus outbreak. Götz et al^{8} followed a series of 153 households exposed to norovirus after a 1999 point-source, food-borne outbreak within a network of daycare centers in Stockholm, Sweden. For each of these households, one person (the household index case) was infectious and symptomatic due to the point-source outbreak, and the time of symptom onset for all subsequent cases was recorded. We refer to each of these case sequences as a time series.

We analyze these outbreak data using a dynamic model, and obtain maximum likelihood estimates of the household transmission parameter, β, and the average duration of infectiousness, 1/γ, where γ is the mean daily rate of recovery from infectiousness. We find that the observation of multiple household time-series may provide enough information to mitigate the absence of observed infection times, infectious periods and household sizes.

## METHODS

### Data

Illness data were obtained from a published study of a food-borne norovirus outbreak in 30 daycare centers in Stockholm, Sweden in 1999.^{8} The origin of this outbreak was a single food-service worker who shed norovirus while preparing lunches that were distributed from a central location to 30 daycare centers throughout Stockholm. At the time of the outbreak this worker was infectious but had no overt symptoms.

Among 775 subjects surveyed after the outbreak, 195 cases of gastroenteritis were identified, 176 of which were attributed to norovirus. Among those subjects with norovirus infections, 23 lived alone, 49 lived in households where transmission occurred, and 104 lived with one or more persons but with no observed transmission. Nineteen subjects were excluded because they lived in households with pre-existing cases of gastroenteritis at the time of the outbreak. The primary dataset used in this analysis consists of time series from the 153 exposed households with 2 or more members.

Data were collected retrospectively for the 9 days following the onset of symptoms in index cases. The data consist of the times that cases became symptomatic, reported to the nearest 12 hours and normalized (with the onset of symptoms in the index case set to time zero). Stool samples were collected from 5 symptomatic individuals, and the presence of norovirus was confirmed via electron microscopy. Remaining cases were diagnosed based on a norovirus screening interview and a confirmed exposure to a household member infected at the point-source event. Figure 1 provides a visual depiction of the household time-series data for exposed households with secondary cases (modified from the paper by Götz et al,^{8} Fig. 5).

When describing household transmission dynamics, we assume that the onset of symptoms corresponds to the beginning of the infectious period. This is supported by a controlled norovirus dosing trial in which early shedding in the absence of symptoms occurred primarily in persons who never became symptomatic.^{9} Our model also allows the infectious period to be longer than the symptomatic period, which is typical of norovirus infections.^{9,10}

In addition, we estimate the distribution of the incubation period using data reported for the Stockholm outbreak^{8} on the time lag between the point-source event and the onset of symptoms in the 153 household index cases. A gamma distribution with mean 1/ε and shape parameter ε_{s} was fit to these incubation time data by maximum likelihood (1/ε = 1.7 days; ε_{s} = 3.73 [SE = 0.048]) (Fig. 2). To fit the assumptions of the compartmental transmission model described in the following section, we round the estimated shape parameter to the nearest integer. However, our estimation approach is robust to models with arbitrarily distributed infectious periods.

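A minimal sketch of the incubation-period fit, written in standard-library Python as an illustration rather than the authors' code. It uses the mean and shape parameterization from the text: the maximum likelihood estimate of the mean 1/ε is the sample mean, and the shape ε_{s} solves log(k) - ψ(k) = log(mean) - mean(log x), found here by Newton's method from a closed-form starting value. The helper names are hypothetical, and the demonstration data are simulated with the reported values rather than taken from the Stockholm study.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def trigamma(x: float) -> float:
    """Trigamma function for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result


def fit_gamma_mean_shape(samples, tol=1e-10, max_iter=100):
    """Maximum likelihood (mean, shape) of a gamma distribution for positive data."""
    n = len(samples)
    mean = sum(samples) / n                                      # MLE of the mean
    s = math.log(mean) - sum(math.log(x) for x in samples) / n   # > 0 unless all equal
    # Closed-form starting value, then Newton steps on log(k) - digamma(k) = s.
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        step = (math.log(k) - digamma(k) - s) / (1.0 / k - trigamma(k))
        k -= step
        if abs(step) < tol * k:
            break
    return mean, k


# Simulated incubation times (days), generated with the values reported in the text.
rng = random.Random(1)
data = [rng.gammavariate(3.73, 1.7 / 3.73) for _ in range(5000)]   # (shape, scale)
mean_hat, shape_hat = fit_gamma_mean_shape(data)
print(f"1/epsilon = {mean_hat:.2f} days, epsilon_s = {shape_hat:.2f}")
```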
When estimating parameters of the infection-process model, we characterize the infectious period as gamma distributed with an unknown mean and shape parameter. Household sizes were not reported in the original outbreak dataset. To address this missing data issue, census data on the distribution of Swedish household sizes during the study period were incorporated into our analysis.

Because the Stockholm outbreak data include only the time of symptom onset, we are unable to directly estimate the rate at which asymptomatic infections were created. Accounting for asymptomatic infections is important, as they have been estimated to comprise from 12% to 50% of norovirus cases.^{11–14} Additional analysis was conducted to assess the impact of increasing levels of asymptomatic infection on our results.

### Model

We treat the household infection process as a continuous-time Markov chain, where persons can be in one of 4 states: susceptible (S), exposed/incubating (E), symptomatic/infectious (I) and recovered (R) (Fig. 3). The daily transmission rate, β, is defined as the rate of contact at time *t* multiplied by the probability that contact between a susceptible and an infected person results in transmission. We account for the baseline risk of community and environmental infection through the parameter α, which is measured in terms of the daily risk of infection per susceptible. The incubation and infectious periods are assumed to follow gamma distributions, where each is defined by a mean duration (1/ε, 1/γ) and shape parameter (ε_{s}, γ_{s}). The shape parameters for the distributions of the incubation and infectious periods can be thought of as the number of stages that persons pass through before they are either infectious or recovered, respectively. These stages are represented by the first-order compartments in Figure 3.

At any given time, *t*, the hazard, ω_{t}, to each susceptible in a household is defined by the force of infection,

$$\omega_t = \alpha + \beta I_t \tag{1}$$

where *I*_{t} denotes the total number of infectious persons in a household at time *t*. Consistent with a Poisson process, we assume that each susceptible's waiting time to infection is exponentially distributed with mean 1/ω_{t}. Under these assumptions, the probability that a susceptible is infected over an interval Δ*t* during which ω_{t} is constant is given by the exponential cumulative distribution function:

$$P(\text{infection in } \Delta t) = 1 - \exp\left(-\omega_t \, \Delta t\right) \tag{2}$$

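The per-susceptible hazard and the resulting infection probability can be sketched in a few lines of Python. This assumes the force of infection has the additive form ω_t = α + βI_t (a community baseline plus a mass-action household term) and that I_t stays constant over the interval; the function names are hypothetical. The illustrative values β = 0.14 and α = 0.001 are the Stockholm estimate of β and the simulated α of Parameter Set 2 from the Results. With one infectious housemate the one-day risk comes out near 13%, compared with about 0.1% from community exposure alone.

```python
import math


def force_of_infection(beta: float, alpha: float, n_infectious: int) -> float:
    """Hazard (per day) to one susceptible: community baseline + household mass action."""
    return alpha + beta * n_infectious


def prob_infected(beta: float, alpha: float, n_infectious: int, dt: float) -> float:
    """Exponential CDF: chance a susceptible is infected within dt days, I held fixed."""
    return 1.0 - math.exp(-force_of_infection(beta, alpha, n_infectious) * dt)


p_none = prob_infected(0.14, 0.001, 0, 1.0)  # community exposure only, one day
p_one = prob_infected(0.14, 0.001, 1, 1.0)   # one infectious housemate, one day
p_two = prob_infected(0.14, 0.001, 2, 1.0)   # two infectious housemates, one day
print(f"{p_none:.4f} {p_one:.3f} {p_two:.3f}")
```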
The classic model for infectious disease dynamics is the flow of hosts among various compartments defined as susceptible, exposed but not infectious, infectious, and recovered (SEIR). To generate sample data for evaluating the statistical method described in the next section, we use the force of infection (Eq. 1), gamma-distributed incubation and infectious periods, and household sizes drawn from the census distribution in a stochastic SEIR simulation model. Implementation details are available in the supplementary materials.

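One way to build such a simulator, sketched here in Python under stated assumptions rather than reproducing the supplementary implementation, is Gillespie's direct method with each incubation and infectious period split into ε_{s} and γ_{s} exponential stages (the first-order compartments of Fig. 3), each left at rate shape/mean, which gives Erlang (integer-shape gamma) periods. The force of infection is assumed to be α + βI_t per susceptible; the index case becomes infectious at t = 0 and all other members start susceptible. The function name is hypothetical, and the illustration fixes every household at a hypothetical size of 4 instead of drawing sizes from the census distribution.

```python
import random


def simulate_household(n, beta, alpha, inc_mean, inc_shape, inf_mean, inf_shape, t_f, rng):
    """Simulate one household; return symptom-onset times (index case at t = 0).

    Incubation and infectious periods are Erlang: inc_shape (resp. inf_shape)
    exponential stages, each left at rate shape / mean.
    """
    S = n - 1                     # everyone except the index case starts susceptible
    E = [0] * inc_shape           # occupancy of each incubation stage
    I = [0] * inf_shape           # occupancy of each infectious stage
    I[0] = 1                      # index case becomes infectious at t = 0
    onsets = [0.0]
    t = 0.0
    eps_rate = inc_shape / inc_mean
    gam_rate = inf_shape / inf_mean
    while True:
        rates = [S * (alpha + beta * sum(I))]     # new infection (S -> E)
        rates += [eps_rate * e for e in E]        # leave an incubation stage
        rates += [gam_rate * i for i in I]        # leave an infectious stage
        total = sum(rates)
        if total == 0.0:
            break                                 # nothing left can happen
        t += rng.expovariate(total)               # waiting time to the next event
        if t > t_f:
            break                                 # end of the observation window
        k = rng.choices(range(len(rates)), weights=rates)[0]
        if k == 0:
            S -= 1
            E[0] += 1
        elif k <= inc_shape:
            stage = k - 1
            E[stage] -= 1
            if stage + 1 < inc_shape:
                E[stage + 1] += 1
            else:
                I[0] += 1                         # symptom onset = start of infectiousness
                onsets.append(t)
        else:
            stage = k - 1 - inc_shape
            I[stage] -= 1
            if stage + 1 < inf_shape:
                I[stage + 1] += 1                 # otherwise the person has recovered
    return onsets


# Parameter Set 2 values over the 9-day window; household size fixed at 4 for illustration.
rng = random.Random(2024)
outbreak = [simulate_household(4, 0.14, 0.001, 1.5, 4, 1.17, 1, 9.0, rng) for _ in range(153)]
print(sum(len(h) - 1 for h in outbreak), "secondary cases in 153 households")
```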
### Data Model

First, we define a likelihood function for an infection time series when all 4 individual states (susceptible, exposed/incubating, infectious, and recovered) are observable, and only the transmission parameters β and α are unknown. We then outline a data augmentation method^{15} that allows us to extend this likelihood function to the case in which some states are unobserved (Fig. 4).

### Likelihood

The household time series is described as a series of system states, *q*_{ij} = {*S*_{ij}, *E*_{ij}, *I*_{ij}, *R*_{ij}}, for each household, *i*, and state, *j*, where N_{Q} is the number of distinct system states in a household time series and *Q*_{i} = {*q*_{i,0}, …, *q*_{i,N_{Q}}} is the entire set of states in a household in chronological order (Fig. 4). Beginning times for each system state are denoted *t*_{ij}. Three state transitions are possible: infection, onset of symptoms (and infectiousness), and recovery. The states of the system immediately before the occurrence of infection events, where infection is defined as a transition into E, are indexed by *k* and denoted as *v*_{ik} ∈ *V*_{i}, where *V*_{i} ⊂ *Q*_{i}. The number of infections in a household observation is N_{K}.

With this notation, *q*_{i,0} corresponds to the state of household *i* immediately after the onset of symptoms in the index case, and *v*_{i,0} corresponds to the state of the household immediately before the first household infection.

Assuming that the times of infection, symptom onset, and recovery are known, we can formulate the household likelihood function as the product of 2 terms: (1) the likelihood of observing no new cases during the Δ*t* between all state transitions (ℓ_{a}) and (2) the likelihood of infection at the time when new infection events are observed (ℓ_{b}).

The expected number of new infections per day for household *i* in state *j* is given by:

$$\lambda_{ij} = S_{ij}\left(\alpha + \beta I_{ij}\right) \tag{3}$$

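The first term, ℓ_{a}, is the probability of observing no infections over all of the time intervals between states:

$$\ell_a = \exp\left[-\sum_{j=0}^{N_Q} S_{ij}\left(\alpha + \beta I_{ij}\right)\left(t_{i,j+1} - t_{ij}\right)\right] \tag{4}$$

where the last state is taken to end at the close of the observation period, $t_{i,N_Q+1} \equiv t_f$.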

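The second, ℓ_{b}, describes the joint likelihood of all observed infection events, ie, the product of all instantaneous infection probabilities at times when infection events are observed:

$$\ell_b = \prod_{k=0}^{N_K - 1} S_{v_{ik}}\left(\alpha + \beta I_{v_{ik}}\right) \tag{5}$$

where $S_{v_{ik}}$ and $I_{v_{ik}}$ are the numbers of susceptible and infectious persons in state $v_{ik}$.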

Based on these definitions, the likelihood of the data for household *i*, given β and α, is:

$$\mathcal{L}(Q_i \mid \beta, \alpha) = \ell_a \, \ell_b \tag{6}$$

The product of the likelihoods for all observed households is taken to be the likelihood of the entire observed outbreak, *O*:

$$\mathcal{L}(O \mid \beta, \alpha) = \prod_{i} \mathcal{L}(Q_i \mid \beta, \alpha) \tag{7}$$

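The complete-data likelihood can be evaluated by sweeping through one household's transitions in time order. This Python sketch is illustrative rather than the authors' implementation: it assumes new household infections occur at rate S(α + βI), starts from the state just after the index case's onset at t = 0, and ignores transitions at or after the end of observation, t_f, so that the last state runs to t_f. Function names and the (time, kind) event encoding are hypothetical. Working with log-likelihoods turns the product over independent households into a sum.

```python
import math


def household_loglik(beta, alpha, n, events, t_f):
    """log(l_a * l_b) for one household whose transition times are all known.

    events: (time, kind) pairs after the index case's onset at t = 0, with kind
    "infection", "onset" or "recovery". Events at or after t_f are ignored
    (right-censoring), so the last state runs to t_f.
    """
    S, I = n - 1, 1              # q_{i,0}: just after the index case's onset
    t_prev = 0.0
    log_la = 0.0                 # log l_a: no infections between transitions
    log_lb = 0.0                 # log l_b: infection rate at each infection event
    for t, kind in sorted(ev for ev in events if ev[0] < t_f):
        lam = S * (alpha + beta * I)          # expected new infections per day
        log_la -= lam * (t - t_prev)
        if kind == "infection":
            if lam <= 0.0:
                return float("-inf")          # this infection is impossible
            log_lb += math.log(lam)
            S -= 1
        elif kind == "onset":
            I += 1
        elif kind == "recovery":
            I -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        t_prev = t
    log_la -= S * (alpha + beta * I) * (t_f - t_prev)   # final state up to t_f
    return log_la + log_lb


def outbreak_loglik(beta, alpha, households, t_f):
    """Independent households: the product of likelihoods is a sum of logs."""
    return sum(household_loglik(beta, alpha, n, events, t_f) for n, events in households)


# Two-person household: the index case recovers on day 2 and nobody else is infected.
quiet = (2, [(2.0, "recovery")])
# Three-person household with one secondary case (infected day 1, onset day 2.5).
busy = (3, [(1.0, "infection"), (2.5, "onset"), (3.0, "recovery"), (4.0, "recovery")])
print(outbreak_loglik(0.2, 0.001, [quiet, busy], t_f=9.0))
```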
### Data Augmentation

The observed data consist of the times of symptom onset in new cases, represented by increments to the household infectious-state variable *I*_{i} and, consequently, decrements to the state variable *E*_{i}. We do not observe infection events for household cases; these are represented by an increment to the household incubating state *E*_{i} and a decrement in the number of susceptibles *S*_{i}. We also do not observe recovery from infectiousness, represented by an increment to the household immune state *R*_{i} (and a decrement in *I*_{i}). Because all states are necessary to characterize the transmission dynamics of the system, but only transitions into state *I* are observed, a method is needed to evaluate the likelihood. To address this missing-data problem, we generate an augmented household time series by sampling from our incubation and infectious period distributions (mean, shape = 1/ε, ε_{s} and 1/γ, γ_{s}, respectively) for each case, as described by Cooper et al.^{15} We account for right-censoring by following the convention that all recovery times greater than the end of the observation period, *t*_{f}, are truncated to be equal to *t*_{f}. This returns the correct likelihood of the data when sampled recovery times fall outside the observation window. In this way, we create an outbreak realization with all states accounted for, from which we can calculate the likelihood. We repeat this process many times, resampling new times from the distributions and calculating a new likelihood each time. The mean of this set of sampled likelihoods approximates the true likelihood of the household time series. This procedure is equivalent to Monte Carlo numerical integration with importance sampling^{16} and is depicted visually in Figure 4. (See papers by Rampey et al^{17} and Rhodes^{18} for alternative approaches to estimating transmission parameters with this type of data.)

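The augment-and-average step for a single household might look like the following Python sketch. Beyond what the text specifies, it assumes that the index case needs no sampled household infection time, that a draw whose sampled infection time falls before t = 0 receives likelihood zero, and that the complete-data likelihood is supplied as a function of the augmented events (for instance, one implementing the likelihood defined in the previous section at fixed β and α). All names are hypothetical; the toy check plugs in a likelihood of 1 for every consistent draw, so the estimate reduces to the fraction of draws with all infection times at t ≥ 0.

```python
import random


def augment_household(onsets, inc_mean, inc_shape, inf_mean, inf_shape, t_f, rng):
    """Draw the unobserved infection and recovery times for one household.

    onsets[0] = 0.0 is the index case (infected at the point source, so no
    household infection time is drawn); onsets[1:] are secondary cases.
    Returns None when a sampled household infection precedes t = 0.
    """
    events = []
    for t_on in onsets[1:]:
        t_inf = t_on - rng.gammavariate(inc_shape, inc_mean / inc_shape)
        if t_inf < 0.0:
            return None
        events += [(t_inf, "infection"), (t_on, "onset")]
    for t_on in onsets:
        t_rec = t_on + rng.gammavariate(inf_shape, inf_mean / inf_shape)
        events.append((min(t_rec, t_f), "recovery"))   # right-censor at t_f
    return events


def mc_likelihood(onsets, complete_data_lik, inc_mean, inc_shape,
                  inf_mean, inf_shape, t_f, n_draws, rng):
    """Monte Carlo estimate: average complete-data likelihood over augmentations."""
    total = 0.0
    for _ in range(n_draws):
        events = augment_household(onsets, inc_mean, inc_shape, inf_mean, inf_shape, t_f, rng)
        if events is not None:         # rejected draws contribute zero
            total += complete_data_lik(events)
    return total / n_draws


rng = random.Random(7)
frac = mc_likelihood([0.0, 0.5], lambda ev: 1.0, 1.5, 4, 1.17, 1, 9.0, 2000, rng)
print(round(frac, 3))
```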
We obtain a likelihood estimate for an entire outbreak by augmenting all households 10^{4} times and estimating their joint likelihood (Eq. 7). Because we are sampling incubation and infectious periods proportionally from their joint distribution, the expectation of this set of likelihoods approximates the likelihood of the data, given the parameter vector θ = {α, β, 1/ε, ε_{s}, 1/γ, γ_{s}}.

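A joint likelihood over 153 households can be smaller than the smallest positive double, so averaging it over 10^{4} augmentations is safer on the log scale. The Python sketch below assumes per-household complete-data log-likelihoods are already available for each augmented draw (the nested-list layout and function name are hypothetical) and uses the log-sum-exp trick; the paper does not say how it handled this numerically.

```python
import math


def log_mean_joint_likelihood(draw_logliks):
    """log[(1/M) * sum_m prod_i L_i^(m)] from per-draw, per-household log-likelihoods.

    draw_logliks[m][i] is the log-likelihood of household i in augmentation m.
    Summing over i gives the log joint likelihood of draw m; the log-sum-exp
    trick then averages over the M draws without leaving the log scale.
    """
    joint = [sum(row) for row in draw_logliks]
    finite = [x for x in joint if x != float("-inf")]   # zero-likelihood draws add 0
    if not finite:
        return float("-inf")
    peak = max(finite)
    return peak + math.log(sum(math.exp(x - peak) for x in finite)) - math.log(len(joint))


# 2 draws x 153 households, each household log-likelihood -5: naive exp(-765) is 0.0.
print(log_mean_joint_likelihood([[-5.0] * 153, [-5.0] * 153]))
```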
In the Stockholm outbreak dataset, the number of people in each household is unobserved. We account for these missing data with household size data obtained from a national census^{19} and combine this with information from the household observations; the number of household members must be equal to or greater than the number of observed cases. We combine the census distribution with this lower bound on size for each household to construct a conditional distribution of sizes for each household. When an augmented household time series is generated, a size is sampled from this distribution, allowing us to incorporate and bound our uncertainty regarding household sizes when estimating the likelihood. In the following section we will demonstrate that this does not have a significant negative impact on our results. For details on the implementation of the data augmentation procedure, see the eAppendix (http://links.lww.com/EDE/A400).

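The household-size step amounts to truncating the census distribution below at the observed case count, renormalizing, and drawing a size for each augmented household. In this Python sketch the weights are made-up placeholders rather than the Swedish census figures, the lower bound is also held at 2 because only households with 2 or more members were analyzed, and the function names are hypothetical.

```python
import random

# Made-up household-size weights (size -> relative frequency), NOT census data.
HYPOTHETICAL_CENSUS = {1: 4.0, 2: 3.0, 3: 1.0, 4: 1.0, 5: 1.0}


def conditional_size_distribution(census, n_cases):
    """Census weights truncated below at max(2, n_cases), then renormalized."""
    lower = max(2, n_cases)       # only households with 2+ members were analyzed
    kept = {size: w for size, w in census.items() if size >= lower}
    total = sum(kept.values())
    if total <= 0.0:
        raise ValueError("no census mass at or above the lower bound")
    return {size: w / total for size, w in kept.items()}


def sample_household_size(census, n_cases, rng):
    """Draw one size for an augmented household from the conditional distribution."""
    dist = conditional_size_distribution(census, n_cases)
    sizes = sorted(dist)
    return rng.choices(sizes, weights=[dist[s] for s in sizes])[0]


rng = random.Random(11)
print(conditional_size_distribution(HYPOTHETICAL_CENSUS, 3))
print([sample_household_size(HYPOTHETICAL_CENSUS, 3, rng) for _ in range(10)])
```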
The Table lists the 2 parameterizations used in the analysis. Parameter set 1 uses case and incubation-period data from the Stockholm outbreak. We estimate the transmission parameter, β, as well as the mean, 1/γ, and shape parameter γ_{s} of the distribution of the infectious period. We constrain our parameter search to values of 1/γ > 1 day, as durations of symptomatic shedding less than 1 day are biologically implausible.^{10,11} Parameter set 2 consists of the population parameter values of a single 153-household outbreak realization from the stochastic model, with household sizes drawn from the census distribution. With these simulated data, we estimate β and 1/γ under 2 conditions: known household sizes and unknown household sizes.

## RESULTS

Figure 5 contains the maximum likelihood estimates and confidence intervals of both the main transmission parameter (β = 0.14 [95% confidence interval {CI} = 0.08–0.24]; Fig. 5A) and the average duration of infectiousness (1/γ = 1.17 days [1.00–1.88]; Fig. 5B) for the Stockholm outbreak. We also estimated the shape parameter for the duration of infectiousness (γ_{s} = 1.0 [1.0–2.0]; not pictured). Figure 6 is a contour plot showing a 2-dimensional likelihood profile with respect to β and 1/γ; each cell contains the likelihood corresponding to the optimized value of γ_{s} for each (β, 1/γ) pair. We also estimated the parameters with α = 0.01 and obtained similar results (β = 0.13 [0.07–0.22]; 1/γ = 1.0 days [1.0–1.33]; γ_{s} = 1.0 [1.0–2.0]; not pictured). Thus there is likely some bias in our estimate of β due to environmental infection, but this bias is small.

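The paper does not state how its confidence intervals were computed. If they are likelihood-ratio (profile) intervals, which would fit the profile shown in Figure 6, a grid-based version could be sketched in Python as below: nuisance parameters are profiled out by maximization, and the 95% interval keeps every value whose profile log-likelihood lies within χ²_{1,0.95}/2 ≈ 1.92 of the maximum. The function names and the toy quadratic surface are hypothetical, the interval can be no finer than the grid, and the set of retained values is assumed to be contiguous (a unimodal profile).

```python
CHI2_1_95 = 3.841458820694124   # 95th percentile of the chi-square distribution, 1 df


def profile(loglik_grid, axis):
    """Profile log-likelihood on a grid: maximize over every other coordinate."""
    prof = {}
    for point, ll in loglik_grid.items():
        key = point[axis]
        if key not in prof or ll > prof[key]:
            prof[key] = ll
    return prof


def lr_interval(prof):
    """Grid MLE and 95% likelihood-ratio interval (assumes a unimodal profile)."""
    best = max(prof, key=prof.get)
    cutoff = prof[best] - CHI2_1_95 / 2.0
    inside = [x for x, ll in prof.items() if ll >= cutoff]
    return best, min(inside), max(inside)


# Toy log-likelihood surface over (beta, 1/gamma); not the Stockholm surface.
betas = [round(0.01 * k, 2) for k in range(41)]            # 0.00 .. 0.40
durations = [round(1.0 + 0.05 * k, 2) for k in range(21)]  # 1.00 .. 2.00 days
grid = {(b, d): -0.5 * ((b - 0.14) / 0.04) ** 2 - 0.5 * ((d - 1.2) / 0.3) ** 2
        for b in betas for d in durations}
print(lr_interval(profile(grid, axis=0)))
```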
To examine the impact of unknown household sizes, we created a simulated dataset with parameters β = 0.14 (transmission rate), α = 0.001 (background transmission rate), 1/ε = 1.5 days and ε_{s} = 4.0 (incubation period), and 1/γ = 1.17 days and γ_{s} = 1.0 (duration of infectiousness) (Table, Parameter Set 2). We then estimated 2 of these parameters, the transmission rate and the average duration of infectiousness, under 2 conditions: (1) where actual household sizes are explicitly included in the estimation (dashed line: β_{knownHH} = 0.139 [95% CI = 0.087–0.273], 1/γ_{knownHH} = 1.21 days [0.625–1.88], Fig. 7A); and (2) where household sizes are drawn from the census distribution (solid line: β_{unknownHH} = 0.133 [0.079–0.259], 1/γ_{unknownHH} = 1.21 days [0.63–1.88], Fig. 7B).

### Asymptomatic Infection

To understand the impact of unobserved asymptomatic infections, we performed a simulation-based sensitivity analysis that allows us to predict the value of the transmission parameter, β, for varying proportions of asymptomatic infections, τ.

We find that, starting from our maximum likelihood estimate of β = 0.14 when τ = 0, the predicted value of β increases linearly by approximately 0.035 units for each 10% increase in τ (Fig. 8). For further details on the design and implementation of this analysis, see the eAppendix (http://links.lww.com/EDE/A400).

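As a rough worked example, assuming the approximately linear relationship in Figure 8 extends across the 12% to 50% range of asymptomatic proportions cited in the Methods, β would range from about 0.18 to 0.32, up to roughly 2.25 times the estimate obtained when asymptomatic infection is ignored:

$$\beta(\tau) \approx 0.14 + 0.035 \times \frac{\tau}{0.10} = 0.14 + 0.35\,\tau$$

$$\beta(0.12) \approx 0.14 + 0.042 = 0.182, \qquad \beta(0.50) \approx 0.14 + 0.175 = 0.315$$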
## DISCUSSION

Using a collection of household-exposure and illness-onset time series, we have obtained estimates (and their confidence intervals) for the household person-to-person infection rate and average infectious period for norovirus. We also predict the value of the transmission parameter β as a function of the proportion of asymptomatic infections. We obtained these estimates despite the absence of potentially important data, including infection times, recovery times, and household sizes. The inclusion of census data with household-specific lower bounds (due to the number of observed cases) allowed us to obtain an accurate estimate of household force of infection in the absence of directly observed household sizes.

Although the pattern of contact in households tends to fit the standard mass-action assumption in susceptible-infected-removed models,^{20} the typically small size of households requires careful consideration of the influence of random variability on results, precluding the use of deterministic models.^{21,22} This is a topic that has received considerable attention, and there is an extensive literature on techniques for fitting stochastic models to outbreak data^{18,23,24} in a variety of settings (eg, communities,^{25} schools^{26} and households^{27}). Using household-level infection data at the end of an outbreak, Longini et al^{24} generated estimates of household and community parameters for the distribution of final household outbreak sizes. However, because their method was developed to explain final-size data from public health reports and does not use temporal information, it provides only limited insights regarding the interaction between infectivity and the durations of the incubation and recovery periods in outbreak time-series.

Hohle et al^{28} present a technique that could be useful with household time-series data. They use Bayesian inference to estimate transmission parameters in spatially heterogeneous SEIR models, and innovate on previous Markov-chain-Monte-Carlo-based techniques by allowing variability in the incubation period. Two significant drawbacks of Bayesian approaches are that (1) even when care is taken to use noninformative prior distributions, these priors can condition estimates,^{29} and (2) the results can be difficult to interpret, particularly with respect to reproducibility.^{30} We have presented an alternative, frequentist approach that produces maximum likelihood parameter estimates and allows a straightforward exploration of the likelihood surface.

Community transmission is undoubtedly more complicated than our representation. Fixing the community transmission parameter, α, to a value 2 orders of magnitude smaller than the household transmission parameter, β, makes the strong assumption that the within-household transmission process is dominant. We show that our results are not very sensitive to this assumption, and we argue that the assumption is reasonable with respect to our data because all households in the Stockholm dataset had a known source of exposure—an index case infected by the point-source outbreak—and all secondary cases identified in households occurred in a plausible temporal sequence. A better estimate of the rate of community transmission requires focused attention on the mechanisms behind this process, which is outside of the scope of both our dataset and this paper. This is an important focus for future research. In addition, the data used in this analysis come from only 9 days of observation, resulting in right-censoring. While our inferences for the transmission rate and effective duration of infectiousness in the course of a household outbreak are valid, they are not generalizable to community or regional scales.

Reliable transmission parameter estimates are critical to risk assessments and exploratory modeling for public health policy. The impact of interventions on norovirus prevalence and persistence can be better assessed in a model such as ours that includes realistic feedback in the transmission process and empirically derived transmission parameters.

Although the analysis presented here focuses on the transmission of an infectious pathogen in a specific epidemiologic and social context, the methods employed are relevant to other problems in epidemiology and medicine, in which unobserved variables strongly affect outcomes. We have focused on unobserved within-host disease states and household sizes, but other important variables, including contact structures and environmental reservoirs, are often difficult to observe or missing from otherwise-useful public-health surveillance data.

For example, social and economic factors are likely to increase within-household transmission of pathogens such as tuberculosis and shigellosis,^{31} by increasing host susceptibility to physical and social stress via mechanisms such as allostatic load and household overcrowding.^{32} Administrative records often include important information on the timing, geographic distribution, and infectious contacts of cases,^{33} but because of their focus on immediate control, they often lack direct observations of contacts that do not result in infections. Consequently, we lack information on how those who become ill and those who escape infection differ in contact patterns and other factors important in transmission. Our work suggests that case data lacking such information can be combined with reasonable, empirically grounded models of contact structures to yield important and useful insights even in the absence of a full dataset. The next step is to apply this approach to different pathogens in more complicated social settings.

## ACKNOWLEDGMENTS

We thank Meghan Milbrath for helpful input over many drafts as well as Rick Riolo, Michael Bommarito and the University of Michigan Center for the Study of Complex Systems for technical assistance and the use of computational resources.

## REFERENCES

1.Teunis P, Moe CL, Liu P, et al. Norwalk virus: How infectious is it? *J Med Virol*. 2008;80:1468–1476.

2.Widdowson MA, Glass R, Monroe S, et al. Probable transmission of norovirus on an airplane. *JAMA*. 2005;293:1859–1860.

3.Tsang O, Wong A, Chow C, et al. Clinical characteristics of nosocomial norovirus outbreaks in Hong Kong. *J Hosp Infect*. 2008;69:135–140.

4.Atmar RL, Estes MK. The epidemiologic and clinical importance of norovirus infection. *Gastroenterol Clin North Am*. 2006;35:275–290.

5.Patel M, Hall A, Vinje J, et al. Noroviruses: A comprehensive review. *J Clin Virol*. 2009;44:1–8.

6.Caul EO. Viral gastroenteritis: small round structured viruses, caliciviruses and astroviruses. Part I. The clinical and diagnostic perspective. *J Clin Pathol*. 1996;49:874–880.

7.Lopman BA, Adak GK, Reacher MH, et al. Two epidemiologic patterns of norovirus outbreaks: surveillance in England and Wales, 1992–2000. *Emerg Infect Dis*. 2003;9:71–77.

8.Götz H, Ekdahl K, Lindbäck J, et al. Clinical spectrum and transmission characteristics of infection with Norwalk-like virus: findings from a large community outbreak in Sweden. *Clin Infect Dis*. 2001;33:622–628.

9.Atmar RL, Opekun AR, Gilger MA, et al. Norwalk virus shedding after experimental human infection. *Emerg Infect Dis*. 2008;14:1553–1557.

10.Kirkwood C, Streitberg R. Calicivirus shedding in children after recovery from diarrhoeal disease. *J Clin Virol*. 2008;43:346–348.

11.Gallimore CI, Cubitt D, du Pleiss N, et al. Asymptomatic and symptomatic excretion of noroviruses during a hospital outbreak of gastroenteritis. *J Clin Microbiol*. 2004;42:2271–2274.

12.Ozawa K, Oka T, Takeda N, et al. Norovirus infections in symptomatic and asymptomatic food handlers in Japan. *J Clin Microbiol*. 2007;45:3996–4005.

13.Goller J, Dimitriadis A, Tan A, et al. Long-term features of norovirus gastroenteritis in the elderly. *J Hosp Infect*. 2004;58:286–291.

14.Parashar UD, Dow L, Fankhauser RL, et al. An outbreak of viral gastroenteritis associated with consumption of sandwiches: implications for the control of transmission by food handlers. *Epidemiol Infect*. 1998;121:615–621.

15.Cooper B, Medley G, Bradley S, et al. An augmented data method for the analysis of nosocomial infection data. *Am J Epidemiol*. 2008;168:548–557.

16.Robert CP, Casella G. *Monte Carlo Statistical Methods*. New York: Springer; 2004.

17.Rampey AH, Longini IM, Haber M, et al. A discrete-time model for the statistical analysis of infectious disease incidence data. *Biometrics*. 1992;48:117–128.

18.Rhodes PH. Counting process models for infectious disease data: distinguishing exposure to infection from susceptibility. *J R Stat Soc Series B Stat Methodol*. 1996;58:751–761.

20.Anderson RM, May RM. *Infectious Diseases of Humans: Dynamics and Control*. Oxford: Oxford Science Publications; 1992.

21.Koopman JS. Modeling infection transmission. *Annu Rev Public Health*. 2004;25:303–326.

22.Matthews L, Woolhouse M. New approaches to quantifying the spread of infection. *Nat Rev Microbiol*. 2005;3:529–536.

23.Ionides EL, Breto C, King A. Inference for nonlinear dynamical systems. *Proc Natl Acad Sci USA*. 2006;103:18438–18443.

24.Longini IM, Koopman JS, Monto A, et al. Estimating household and community transmission parameters for influenza. *Am J Epidemiol*. 1982;115:736–751.

25.King AA, Ionides EL, Pascual M, Bouma MJ. Inapparent infections and cholera dynamics. *Nature*. 2008;454:877–880.

26.O'Neill PD, Marks PJ. Bayesian model choice and infection route modelling in an outbreak of Norovirus. *Stat Med*. 2005;24:2011–2024.

27.Longini IM, Koopman JS. Household and community transmission parameters from final distributions of infections in households. *Biometrics*. 1982;38:115–126.

28.Hohle M, Jorgenson E, O'Neill PD. Inference in disease transmission experiments by using stochastic epidemic models. *Appl Stat*. 2005;54:349–366.

29.Press SJ. *Subjective and Objective Bayesian Statistics*. New York: Wiley; 2003.

30.Lele S, Dennis B, Lutscher F. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. *Ecol Lett*. 2007;10:551–563.

31.Wallace R. A synergism of plagues: “planned shrinkage,” contagious housing destruction, and AIDS in the Bronx. *Environ Res*. 1988;47:1–33.

32.House JS, Landis KR, Umberson D. Social relationships and health. *Science*. 1988;241:540–545.

33.Jones RC, Liberatore M, Fernandez JR, et al. Use of a prospective space-time scan statistic to prioritize shigellosis case investigations in an urban jurisdiction. *Public Health Rep*. 2006;121:131–139.