# Identification of the Asymptomatic Ratio

From the ^{a}Department of Epidemiology, Johns Hopkins University, Baltimore, MD; ^{b}Division of Biostatistics, University of Minnesota, Minneapolis, MN; and ^{c}Department of Epidemiology, The University of North Carolina at Chapel Hill, Chapel Hill, NC.

Correspondence: Nicholas G. Reich, Department of Epidemiology, W6501, Johns Hopkins University, 615 N Wolfe St, Baltimore, MD 21205. E-mail: nreich@jhsph.edu.

An inapparent infection occurs when a person is infected with a pathogen but remains asymptomatic.^{1} The proportion of inapparent infections among all infections with a given pathogen is commonly called the asymptomatic ratio. Knowing the asymptomatic ratio for a disease allows researchers to convert the observed incidence of symptomatic disease into estimates of the actual incidence of infection. This process can yield post hoc estimates of how many individuals were infectious during (and immune after) an epidemic. This information, in turn, can guide public health control measures and assist in modeling disease spread. However, the asymptomatic ratio is difficult to estimate.
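As a minimal illustration of this conversion, assume that every symptomatic infection is detected and that the asymptomatic ratio is known without error. Only the fraction 1 minus the asymptomatic ratio of infections appears as symptomatic cases, so dividing the symptomatic count by that fraction recovers the total number of infections. The helper name and the counts below are hypothetical.

```python
def estimated_infections(symptomatic_cases: int, asymptomatic_ratio: float) -> float:
    """Scale observed symptomatic cases up to total infections (hypothetical helper).

    If a fraction `asymptomatic_ratio` of infections is inapparent, the observed
    symptomatic cases are the remaining fraction (1 - asymptomatic_ratio) of all
    infections. Assumes complete ascertainment of symptomatic cases.
    """
    if not 0.0 <= asymptomatic_ratio < 1.0:
        raise ValueError("asymptomatic_ratio must lie in [0, 1)")
    return symptomatic_cases / (1.0 - asymptomatic_ratio)


# Hypothetical inputs: 300 observed symptomatic cases, asymptomatic ratio of 0.6.
infections = estimated_infections(300, 0.6)
print(f"estimated infections: {infections:.0f}, inapparent: {infections - 300:.0f}")
```

With these inputs, the 300 symptomatic cases correspond to roughly 750 infections, 450 of them inapparent.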

In a recent report, Wang et al^{2} discuss some of the hurdles to estimating the asymptomatic ratio. They propose a method to account for some complexity in the data, using a generalized linear model. However, the model proposed by Wang and colleagues makes assumptions that are difficult to justify in practice and could yield misleading results. In reviewing their method carefully, we hope to deepen the understanding of this potentially useful and important approach while raising cautions about a few easily overlooked assumptions.

Wang and colleagues^{2} specifically address 2 complications in the data typically used to estimate asymptomatic ratios. First, some cases may develop symptoms that are unrelated to the infection. Second, other cases may be infected with several pathogens or multiple strains of the same pathogen, so that determining which pathogen or strain caused the symptoms is infeasible. We raise 2 specific critiques of their model. First, they assume that, within an individual, developing symptoms from one pathogen is independent of developing symptoms from a second pathogen. This assumption is hard to justify in practice and can yield biased estimates of the true asymptomatic ratios. Furthermore, their proposed model is identified only because of this assumption of independence. (A model is not identified, or is unidentifiable, when more than one set of parameter values is equally supported by the observed data.) Second, even when this first assumption is met, the proposed model may provide estimates that are incompatible with reality. Specifically, the model does not impose necessary constraints on the parameters. Therefore, model-estimated asymptomatic ratios and confidence limits could lie outside the natural limits of a proportion (ie, less than 0 or greater than 1). These 2 issues are discussed in detail below.

## IDENTIFICATION BY INDEPENDENCE ASSUMPTION

Wang and colleagues^{2} make 2 crucial assumptions of independence. They assume first “that symptoms caused by infection and symptoms caused by background factors are independent events,” and second “that any 2 kinds of pathogens showing symptoms are independent events.” Both are strong assumptions that are unlikely to hold in practice, particularly the second. For example, if a person develops symptomatic illness from one pathogen, that illness may trigger short-term protection, either immunologic^{3,4} or behavioral,^{5} against other pathogens. Furthermore, independence is assumed conditional on confounders, which may not always be easily measured.^{6}

We propose a general framework that clarifies the implications of these 2 independence assumptions (see Appendix for details). Throughout this paper, we use the word infection to indicate an exposure to a pathogen resulting in an immune response that can be measured at a later time.

Under independence assumptions about the risks of symptom development, the model of Wang et al^{2} will yield identified estimates of the asymptomatic ratio for different pathogens. The identifiability hinges on 2 consequences of the independence assumption. First, the simple multiplicative structure allows the probabilities to be separated and modeled linearly on the log scale. Second, the asymptomatic probabilities for a given individual-risk pairing depend only on the particular pathogen, and are free from interactions with other pathogens.
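Written out, the multiplicative structure looks as follows. This is a sketch in the notation of the Appendix, assuming $\varphi_1$ is the background asymptomatic probability, $\varphi_r$ ($r \ge 2$) is the asymptomatic probability attributable to pathogen $r$, and $X_{ir} = 1$ if individual $i$ was infected with pathogen $r$ (0 otherwise):

$$
\Pr(Y_{i\bullet} = 1) = \varphi_1 \prod_{r=2}^{r^*} \varphi_r^{X_{ir}}
\quad\Longrightarrow\quad
\log \Pr(Y_{i\bullet} = 1) = \log \varphi_1 + \sum_{r=2}^{r^*} X_{ir} \log \varphi_r .
$$

The right-hand side is linear in the infection indicators, so each $\log \varphi_r$ can be estimated as a coefficient of a log-binomial regression. Because each $\varphi_r$ enters in the same way regardless of which other pathogens are present, no pathogen-by-pathogen interaction terms need to be estimated.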

We describe a simple 2-pathogen scenario to illustrate a limitation of that model. Say that the probability that a person infected with pathogen A will not develop symptoms is 0.5. Similarly, the asymptomatic probability for pathogen B is 0.8. Finally, the probability of not developing symptoms from background factors is 0.9. We assume that the overall probability of not developing symptoms is the minimum of the asymptomatic probabilities—a scientifically plausible alternative to the independence model.
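To make the contrast concrete, the sketch below computes, for each of the 4 infection groups, the probability of remaining asymptomatic when the risk-specific probabilities above are combined by multiplication (the independence model) or by taking their minimum. The dictionary and function names are illustrative only and do not come from Wang et al.

```python
from itertools import product
from math import prod

# Risk-specific asymptomatic probabilities from the 2-pathogen scenario.
PHI = {"background": 0.9, "A": 0.5, "B": 0.8}


def group_asymptomatic_prob(pathogens, combine):
    """Probability that a member of an infection group stays asymptomatic.

    Everyone is exposed to the background risk; `pathogens` lists the pathogens
    the group was infected with. `combine` merges the risk-specific probabilities:
    math.prod for the independence model, min for the minimum model.
    """
    return combine(PHI[r] for r in ("background", *pathogens))


for flags in product([False, True], repeat=2):
    pathogens = tuple(name for name, flag in zip("AB", flags) if flag)
    label = "+".join(pathogens) or "neither"
    print(f"{label:8s} independence={group_asymptomatic_prob(pathogens, prod):.2f}"
          f"  minimum={group_asymptomatic_prob(pathogens, min):.2f}")
```

For the coinfected group, the minimum model gives 0.5, whereas multiplying the 3 probabilities gives 0.9 × 0.5 × 0.8 = 0.36; the single-infection groups differ as well (0.5 vs 0.45 for A, 0.8 vs 0.72 for B).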

Using this minimum model, we construct a deterministic example to illustrate the sensitivity of Wang et al's^{2} methods to the true underlying model. A population of 4000 individuals is spread evenly among 4 groups: infected with neither pathogen, infected with pathogen A only, infected with pathogen B only, and infected with both. In each group, the proportion that remains asymptomatic is equal to the minimum of the asymptomatic probabilities of the risks in that group. For example, half of the individuals in the group infected with both pathogens A and B are asymptomatic because min(0.9, 0.5, 0.8) = 0.5. Using the SAS code (SAS Institute Inc., Cary, NC) provided by Wang et al,^{2} we analyzed the symptom data from this population to estimate the asymptomatic ratios for each pathogen. The estimate of the asymptomatic ratio for pathogen A was 0.59 (95% confidence interval = 0.56–0.61) and for pathogen B was 0.90 (0.87–0.94). Both estimates exceed the true asymptomatic ratios (by 17% and 13%, respectively, relative to the true values). The estimate of the background probability of not developing symptoms was virtually unbiased (0.89 [0.88–0.91]).
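Wang et al's SAS code is not reproduced here. As an independent cross-check of the point estimates, the sketch below assumes that, for this balanced 2 × 2 design, their model amounts to a log-binomial regression with an intercept (the log background asymptomatic probability) and one indicator per pathogen. Setting the score equations to zero forces the standardized residuals (o - μ)/(1 - μ) to equal +r in the "neither" and "both" groups and -r in the single-infection groups, so the maximum-likelihood fit reduces to a one-dimensional root search for r, done here by bisection. The function name is hypothetical, and confidence intervals are not computed.

```python
def fit_two_pathogen_log_binomial(obs, tol=1e-12):
    """Maximum-likelihood fit of P(asymptomatic) = bg * phi_A**x_A * phi_B**x_B.

    obs holds the observed asymptomatic proportions for the groups "none",
    "A", "B", and "AB" (each strictly between 0 and 1). Equal group sizes are
    assumed, so the sizes cancel out of the score equations.
    """
    o0, oa, ob, oab = obs["none"], obs["A"], obs["B"], obs["AB"]

    def fitted(r):
        # Solve (o - mu) / (1 - mu) = s for mu, with s = +r in the "none" and
        # "AB" groups and s = -r in the single-infection groups.
        return ((o0 - r) / (1 - r), (oa + r) / (1 + r),
                (ob + r) / (1 + r), (oab - r) / (1 - r))

    def gap(r):
        # The multiplicative model needs mu_none * mu_AB == mu_A * mu_B; this gap
        # increases strictly in r, is negative at lo, and is positive at hi.
        m0, ma, mb, mab = fitted(r)
        return ma * mb - m0 * mab

    lo, hi = -min(oa, ob), min(o0, oab)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    m0, ma, mb, _ = fitted((lo + hi) / 2)
    return {"background": m0, "A": ma / m0, "B": mb / m0}


# Deterministic population from the text (1000 per group): the asymptomatic
# share in each group is the minimum of that group's risk-specific probabilities.
observed = {"none": 0.9, "A": 0.5, "B": 0.8, "AB": 0.5}
truth = {"background": 0.9, "A": 0.5, "B": 0.8}
estimates = fit_two_pathogen_log_binomial(observed)
for name, est in estimates.items():
    print(f"{name:10s} estimate={est:.3f}  truth={truth[name]:.1f}  "
          f"relative bias={est / truth[name] - 1:+.1%}")
```

The point estimates it returns round to the values reported above (0.89, 0.59, and 0.90), with relative biases of about -0.6%, +17%, and +13%. When the group proportions are instead generated multiplicatively, the same fit recovers the true probabilities.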

## LINEAR CONSTRAINTS ON THE MODEL PARAMETERS

Because the asymptomatic ratio is a probability, predicted values and confidence limits should lie in the range 0–1. To ensure that fitted values meet these logical constraints, either parameter constraints must be implemented to force the model to be compatible with reality, or parameters must be estimated after an appropriate transformation. Often, point estimates of probabilities are consistent even when parameter constraints are not implemented,^{7} but the limits of the 95% confidence interval may be of greater concern.^{8} In the setting of log-binomial models, Chu and Cole provide guidance on implementing these constraints using a Bayesian approach.^{8}
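A small numerical sketch of the concern about confidence limits: if an asymptomatic ratio close to 1 is estimated on the log scale, as in a log-binomial model, exponentiating a symmetric Wald interval can give an upper limit above 1. For contrast, the sketch also builds the interval on the logit scale, one example of estimating after an appropriate transformation. This logit-scale interval is a swapped-in illustration, not the Bayesian approach of Chu and Cole, and the estimate and standard error are hypothetical.

```python
from math import exp, log

Z = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def expit(x):
    """Inverse logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def log_scale_interval(estimate, se_log):
    """Wald interval on log(p), exponentiated back as in a log-binomial model."""
    center = log(estimate)
    return exp(center - Z * se_log), exp(center + Z * se_log)


def logit_scale_interval(estimate, se_log):
    """Wald interval on logit(p), with the standard error moved by the delta method.

    d logit(p) / d log(p) = 1 / (1 - p), so se_logit = se_log / (1 - p).
    """
    se_logit = se_log / (1.0 - estimate)
    center = log(estimate / (1.0 - estimate))
    return expit(center - Z * se_logit), expit(center + Z * se_logit)


# Hypothetical asymptomatic ratio near 1, with its standard error on the log scale.
estimate, se_log = 0.98, 0.02
print("log scale:  ", log_scale_interval(estimate, se_log))
print("logit scale:", logit_scale_interval(estimate, se_log))
```

With these inputs, the log-scale upper limit is about 1.02, outside the range of a proportion, whereas the logit-scale interval (about 0.87 to 0.997) stays inside (0, 1) by construction.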

## CONCLUSIONS

Knowing the asymptomatic ratio for a set of pathogens can help public health officials and researchers better understand infectious disease dynamics. First, knowledge of the asymptomatic ratio allows more accurate estimation of how many persons in a population were infectious during an outbreak, which in turn informs estimates of the force of infection experienced by susceptible persons. Second, it helps establish the level of immunity to disease in a given population. The methods proposed by Wang et al^{2} provide a fertile starting point for obtaining estimates of the asymptomatic ratio while controlling for the presence of multiple other diseases or strains. However, further development is needed to make the model applicable to a wider range of realistic settings and to investigate the sensitivity of the results to the key independence assumptions. Our counterexample illustrates how sensitive the results can be to these assumptions.

The estimated asymptomatic ratio could depend on the interaction between population demographics and disease dynamics over the study period. Consider, for example, a situation in which some persons have underlying frailties that make them both more likely to have an immune response to infection and more likely to develop symptoms once infected. Further, assume that, in a 2-pathogen scenario, one pathogen circulates mainly before the other, and the first pathogen offers cross-protection against the second. Even if the strains were identical in their ability to cause symptoms in a given person, the first strain would appear more severe because it would infect a disproportionate number of the frail individuals. This hypothetical example raises concerns about the generalizability of analyses of the asymptomatic ratio. Knowing the asymptomatic ratio in a specific population may be helpful for predicting future susceptibility and disease dynamics within that population, but extrapolating those results to other time periods or populations may lead to erroneous conclusions.
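The frailty argument can be checked with a deterministic calculation. In the sketch below, all shares and probabilities are hypothetical; frailty raises both the chance of infection and the chance of symptoms, cross-protection from strain 1 is assumed to be complete, and the 2 strains have identical symptom probabilities within each frailty group.

```python
# Hypothetical inputs for the frailty scenario (all values are assumptions).
FRAIL_SHARE = 0.2
P_INFECT = {"frail": 0.6, "robust": 0.3}    # chance of infection in each epidemic wave
P_SYMPTOMS = {"frail": 0.7, "robust": 0.2}  # identical for both strains


def apparent_asymptomatic_ratios():
    """Return the crude asymptomatic ratio observed for strain 1 and strain 2.

    Strain 1 circulates first; anyone it infects is fully cross-protected
    against strain 2, which then circulates among those left uninfected.
    """
    susceptible = {"frail": FRAIL_SHARE, "robust": 1.0 - FRAIL_SHARE}
    ratios = []
    for _strain in (1, 2):
        infected = {k: susceptible[k] * P_INFECT[k] for k in susceptible}
        asymptomatic = sum(infected[k] * (1.0 - P_SYMPTOMS[k]) for k in infected)
        ratios.append(asymptomatic / sum(infected.values()))
        # Complete cross-protection: only the uninfected remain susceptible.
        susceptible = {k: susceptible[k] - infected[k] for k in susceptible}
    return tuple(ratios)


strain1, strain2 = apparent_asymptomatic_ratios()
print(f"strain 1: {strain1:.3f}  strain 2: {strain2:.3f}")
```

With these inputs, strain 1 shows the lower crude asymptomatic ratio (about 0.63 vs 0.69) because a third of its infections occur in frail persons (0.12 of 0.36), compared with about 22% for strain 2 (0.048 of 0.216).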

Furthermore, the importance and meaning of the asymptomatic ratio may vary across infectious-disease settings. The definitions adopted by Wang et al and extended in this paper are appropriate for acute viral infections occurring in a time-limited epidemic. However, they fit less well for chronic infectious conditions such as bacterial colonization. In that setting, a person may be asymptomatically colonized, and may transmit the organism, without mounting an immune response. Therefore, it is important to define what is meant by terms like “infection” and “symptoms,” because their meanings may vary across different areas of infectious disease research.

This line of inquiry into asymptomatic or inapparent infection provides several fertile areas for future applied and methodological research and discussion.

## APPENDIX

Assume that we have $n$ individuals, indexed by $i = 1, \ldots, n$. For each individual, we have observed whether or not they developed symptoms of a particular disease and with which strains of this disease, if any, they were infected. Because we have observed the strain-specific infection information, we say that each individual is characterized by a fixed number, $r^*$, of known risk factors that may have caused them to develop symptoms. This set of risk factors always includes a “background” rate of developing symptoms (denoted by Wang et al^{2} as $p$).

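Let $Y_{i\bullet}$ be the indicator that individual $i$ remains asymptomatic: $Y_{i\bullet} = 1$ if individual $i$ is asymptomatic and $Y_{i\bullet} = 0$ if individual $i$ is symptomatic. Say that

$$
Y_{i\bullet} \sim \text{Bernoulli}(\varphi_{i\bullet}),
$$

where $\varphi_{i\bullet}$ is the marginal probability that the $i$th individual is asymptomatic. This marginal probability is itself a function of all the risk-specific asymptomatic probabilities:

$$
\varphi_{i\bullet} = g(\varphi_{i1}, \varphi_{i2}, \ldots, \varphi_{ir^*}),
$$
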
where $\varphi_{ir}$ is the probability that person $i$ does not develop symptoms due to risk $r$, and $g(\cdot)$ is some function of all of the risk-specific asymptomatic probabilities. For example, $g(\cdot)$ could be $\min_r \varphi_{ir}$. Finally, the $\varphi_{ir}$ may be generated as

$$
\varphi_{ir} = h(X_{i\bullet}, Z_{ir}),
$$

where $X_{i\bullet}$ is a set of $r^*$ indicator variables denoting whether person $i$ was exposed to risk $r$ (the background risk is indexed by $r = 1$, and $X_{i1} = 1$ for all $i$ because everyone is exposed to it), and $Z_{ir}$ is a set of possible covariates that may be tied to a specific individual-risk pair, for example, an individual's vaccination status against a particular pathogen. Also, $h(\cdot)$ is an unspecified link function. This model is related to the model of multiple possible infectives for a given symptomatic individual, described by Chu et al.^{9}
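The data-generating structure above can be sketched in a few lines. All names are hypothetical. `h_exposure_only` assumes the simplest link, in which a risk contributes its asymptomatic probability only if the person was exposed to it ($\varphi_{ir} = \varphi_r^{X_{ir}}$, ignoring $Z_{ir}$); `h_vaccine` is an invented example of a link that uses $Z_{ir}$; and `g` can be swapped between the product (independence) and the minimum.

```python
import random
from math import prod

# Risk-specific asymptomatic probabilities, 0-based here: index 0 is the
# background risk (r = 1 in the text), 1 is pathogen A, 2 is pathogen B.
PHI = [0.9, 0.5, 0.8]


def h_exposure_only(x_i, r, z_ir):
    """phi_ir = phi_r ** x_ir: an unexposed risk has phi_ir = 1 (z_ir is ignored)."""
    return PHI[r] ** x_i[r]


def h_vaccine(x_i, r, z_ir, efficacy=0.5):
    """Hypothetical h: vaccination against risk r (z_ir = 1) scales its symptom risk."""
    if not x_i[r]:
        return 1.0
    return 1.0 - (1.0 - PHI[r]) * (1.0 - efficacy * z_ir)


def phi_marginal(x_i, z_i, g, h):
    """phi_i. = g(phi_i1, ..., phi_ir*), with each phi_ir = h(X_i., Z_ir)."""
    return g(h(x_i, r, z_i[r]) for r in range(len(PHI)))


def simulate_share_asymptomatic(x_i, z_i, g, h, n, rng):
    """Draw Y_i. ~ Bernoulli(phi_i.) for n people sharing X_i. and Z_i.; return the mean."""
    p = phi_marginal(x_i, z_i, g, h)
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(2010)
coinfected, unvaccinated = [1, 1, 1], [0, 0, 0]
for label, g in [("product", prod), ("min", min)]:
    p = phi_marginal(coinfected, unvaccinated, g, h_exposure_only)
    share = simulate_share_asymptomatic(
        coinfected, unvaccinated, g, h_exposure_only, 20_000, rng)
    print(f"g = {label:7s} phi = {p:.3f}  simulated share asymptomatic = {share:.3f}")
```

Passing the whole exposure vector `x_i` to `h` mirrors $h(X_{i\bullet}, Z_{ir})$, so a link with pathogen interactions could be dropped in without changing `phi_marginal`.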

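The model that Wang et al propose may be seen as a special case of the more general structure just given. Specifically,

$$
\varphi_{i\bullet} = g(\varphi_{i1}, \ldots, \varphi_{ir^*}) = \prod_{r=1}^{r^*} \varphi_{ir}
$$

and

$$
\varphi_{ir} = h(X_{i\bullet}, Z_{ir}) = \varphi_r^{X_{ir}} .
$$
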
However, with unspecified $g(\cdot)$ and $h(\cdot)$, this model is not identified. The independence assumption provides one path to identifying the general asymptomatic ratio model, although other specifications may work. For example, if $h(\cdot)$ follows the form assumed by Wang et al but $g(\cdot)$ is specified as the mean function, the model is also identified. When data are specific to a subject-pathogen pair (for instance, a dependence on vaccination status or known cross-reactivity between pathogens), a more complicated functional form for $h(\cdot)$ could be assumed. In that case, the choice of $g(\cdot)$ and $h(\cdot)$ determines whether the model is identifiable. In general, careful attention is needed to ensure that models of this type are identifiable.
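A minimal sketch of how a richer $h(\cdot)$ can break identification, assuming $g(\cdot)$ is the product and a hypothetical $h(\cdot)$ in which each pathogen's asymptomatic probability takes a different value when the 2 pathogens co-occur. The coinfected group then reveals only the product of the 2 coinfection-specific probabilities, so swapping them leaves every group's probability, and therefore the likelihood, unchanged. The parameter names and values are illustrative.

```python
from math import isclose, log


def group_probs(bg, a, b, a_co, b_co):
    """Probability of staying asymptomatic in each infection group.

    g = product; hypothetical h: pathogen A contributes `a` alone but `a_co`
    when co-infected with B (likewise `b` and `b_co` for pathogen B).
    """
    return {"none": bg, "A": bg * a, "B": bg * b, "AB": bg * a_co * b_co}


def log_likelihood(params, asymptomatic_counts, n_per_group):
    """Binomial log-likelihood, up to a constant, summed over the 4 groups."""
    probs = group_probs(*params)
    return sum(
        asymptomatic_counts[grp] * log(p)
        + (n_per_group - asymptomatic_counts[grp]) * log(1.0 - p)
        for grp, p in probs.items()
    )


# Deterministic population from the main text: 1000 individuals per group.
counts = {"none": 900, "A": 500, "B": 800, "AB": 500}
# Two parameter vectors (bg, a, b, a_co, b_co) differing only by swapping the
# coinfection-specific values; both reproduce every group proportion exactly.
theta_1 = (0.9, 0.5 / 0.9, 0.8 / 0.9, 0.6, 0.5 / (0.9 * 0.6))
theta_2 = (0.9, 0.5 / 0.9, 0.8 / 0.9, 0.5 / (0.9 * 0.6), 0.6)

ll_1 = log_likelihood(theta_1, counts, 1000)
ll_2 = log_likelihood(theta_2, counts, 1000)
print(ll_1, ll_2, isclose(ll_1, ll_2))
```

Any pair with a_co × b_co = 0.5/0.9 fits these data equally well. Restricting a_co = a and b_co = b, as the independence assumption does, removes the ambiguity.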

## REFERENCES

1. *Infectious Disease Epidemiology: Theory and Practice.* 2nd ed. Sudbury, MA: Jones & Bartlett Publishers, Inc.; 2007:25–62.
2. *Epidemiology*. 2010;21:726–728.
3. *Science*. 2004;305:371–376.
4. *J Math Biol*. 1997;35:825–842.
5. *Proc Biol Sci*. 1998;265:2033–2041.
6. *J Gerontol A Biol Sci Med Sci*. 2009;64:272–279.
7. *Int J Epidemiol*. 1998;27:91–95.
8. *Epidemiology*. 2010;21:855–862.