# Estimates of Sexually Transmitted Infection Prevalence and Incidence in the United States: Time to Embrace Uncertainty

From the *Division of Infectious Diseases, Department of Medicine, School of Medicine, and †Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC

Correspondence: William C. Miller, MD, PhD, MPH, Division of Infectious Diseases, CB#7030 University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7030. E-mail: bill_miller@unc.edu.

Received for publication January 7, 2013, and accepted January 8, 2013.

What is the prevalence of a specific sexually transmitted infection (STI) in the United States? How many prevalent cases of an STI are there in the United States? What is the incidence rate of an STI in the United States? How many incident cases of an STI occur each year in the United States? These seemingly basic epidemiological questions are, in practice, exceedingly difficult to answer. In this issue of *Sexually Transmitted Diseases*, Satterwhite and colleagues^{1} from the Centers for Disease Control and Prevention attempt to answer these questions for 8 common STIs for the year 2008, updating previous estimates for the year 2000.^{2} In this editorial, we address why the results of this ambitious undertaking must be interpreted with both caution and skepticism.

To estimate prevalences and incidence rates, Satterwhite and colleagues^{1} use several sources of available data and a variety of estimation methods. When possible, prevalence estimates were obtained from the National Health and Nutrition Examination Survey (NHANES), a nationally representative population-based survey. Incidence rate estimates were obtained from a simple formula (incidence rate = prevalence/duration), basic mathematical models, case reports, or practice-based administrative data. Each data source and estimation method has its own set of assumptions, biases, and associated uncertainty. However, Satterwhite et al. present a single-point estimate for each STI with little consideration of the uncertainty, that is, the potential error, in these estimates.

In the decade since the estimates of STI cases and incidence for 2000,^{2} the field of epidemiology has embraced uncertainty. Uncertainty in epidemiological estimates of prevalence, incidence, or number of cases arises from 2 fundamental sources: random error and systematic error (bias).^{3} Random error is a function of sampling variability, or chance, and reflects the precision of an estimate, assuming no systematic error. For example, the 95% confidence interval provided for the prevalence of chlamydial infection among women 15 to 24 years old (2.26%–4.52%) indicates that if NHANES were repeated an infinite number of times on the same population, 95% of the confidence intervals so constructed would contain the true prevalence. To address systematic error, epidemiologists routinely perform sensitivity analyses to evaluate the impact of assumptions, missing data, and other input parameters. Through careful assessment of both random and systematic errors, we can begin to address the critical question: how wrong might our estimates be? This approach acknowledges that no study is perfect, no estimate is exact, and an explicit statement of uncertainty is better than a false sense of accuracy.

The National Health and Nutrition Examination Survey is the basis for 7 of the 8 prevalence estimates.^{1} This national survey is undoubtedly one of the best sources of information we have regarding the prevalence of STIs. However, the pertinent population sample is relatively small, especially when stratified by age. Consequently, many of the prevalence estimates are imprecise. For example, if we consider the prevalence of gonorrhea among men aged 15 to 24 years, the point estimate is 0.32% with a 95% confidence interval of 0.12% to 0.84%. The associated number of prevalent cases was estimated at 67,300. Extrapolating the prevalence confidence interval to the prevalent cases, we can approximate the lower and upper bounds as 25,200 and 177,000, a spread of more than 150,000 that is more than twice the point estimate. This presentation of confidence limits for prevalent cases conveys the uncertainty attributable only to random error expressed in the original prevalence estimate.
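The extrapolation in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes that the number of prevalent cases scales linearly with prevalence (that is, the population denominator is treated as fixed); the function name `scale_cases` is illustrative and is not taken from Satterwhite et al.

```python
def scale_cases(point_cases, point_prev, bound_prev):
    """Rescale a case count from the point prevalence to a confidence limit.

    Assumes a fixed population denominator, so cases are proportional
    to prevalence.
    """
    return point_cases * bound_prev / point_prev


# Values quoted in the text: gonorrhea among men aged 15 to 24 years.
POINT_PREV = 0.32              # percent
CI_LOW, CI_HIGH = 0.12, 0.84   # percent, 95% confidence limits
POINT_CASES = 67_300           # estimated prevalent cases

low_cases = scale_cases(POINT_CASES, POINT_PREV, CI_LOW)
high_cases = scale_cases(POINT_CASES, POINT_PREV, CI_HIGH)
spread = high_cases - low_cases

print(f"lower bound of prevalent cases: {low_cases:,.0f}")
print(f"upper bound of prevalent cases: {high_cases:,.0f}")
print(f"spread relative to point estimate: {spread / POINT_CASES:.2f}")
```

The bounds come out near 25,200 and 177,000, matching the figures above. Like those figures, they reflect only the random error in the prevalence estimate.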

Estimates from NHANES are also potentially susceptible to systematic error. Surveys are subject to nonresponse, higher-risk populations are often missed with standard sampling methods, and laboratory assays for the STIs are imperfect. The impact of these potential biases could be evaluated with sensitivity analyses, as has been done with other population-based surveys.^{4–7} Whether the concern is random error or systematic error, single-point estimates of prevalence and case numbers may obscure the uncertainty inherent in any survey.

What about the estimates of incidence? The incidence estimates for chlamydial infection and gonorrhea are based on the simple formula: incidence rate = prevalence/duration of disease (a back-calculation from the more common form: prevalence = incidence rate × duration [*P* = IR × *D*]). Is this appropriate? We will consider 3 specific issues: uncertainty and its impact, underlying assumptions for the formula, and appropriateness of the formula for infectious diseases.

When 2 parameters are multiplied (or divided) together, the uncertainty surrounding the estimate is amplified. As noted earlier, the prevalence estimates suffer from considerable uncertainty caused by random and, potentially, systematic error. Is the duration estimate known with precision? Clearly not. Average durations of curable STIs are very difficult to measure. To measure duration, we must know the time of onset of the infection and when the infection either spontaneously clears or is adequately treated. The duration of symptomatic and asymptomatic infections will differ because symptoms will prompt treatment, altering the natural course of the infection. Screening programs will shorten the duration of asymptomatic infections, but by how much is unclear. Even the timing of symptoms during the course of certain STIs is difficult to precisely ascertain.^{8,9} To estimate the duration of chlamydial infection, Satterwhite et al. used a series of calculations (see Supplemental Digital Content http://links.lww.com/OLQ/A59), each with considerable, but unaccounted for, uncertainty. Accounting for the limited precision of duration, in addition to that of the prevalence estimates, is necessary to reflect the true uncertainty of the incidence rate estimates.
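To illustrate how uncertainty in duration compounds uncertainty in prevalence, the following Monte Carlo sketch divides draws of prevalence by draws of duration. It is not the method of Satterwhite et al. The prevalence interval is the chlamydial infection interval quoted earlier. The duration range of 0.5 to 2 years is purely hypothetical, as are the log-normal distributions and the assumption that the two inputs are independent.

```python
import math
import random


def lognormal_from_interval(low, high, z=1.96):
    """Return (mu, sigma) of a log-normal whose central 95% range is [low, high]."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


def simulate_incidence(prev_interval, dur_interval, n_draws=20_000, seed=1):
    """Draw prevalence and duration independently and return the 2.5th,
    50th, and 97.5th percentiles of prevalence / duration."""
    rng = random.Random(seed)
    p_mu, p_sigma = lognormal_from_interval(*prev_interval)
    d_mu, d_sigma = lognormal_from_interval(*dur_interval)
    draws = sorted(
        rng.lognormvariate(p_mu, p_sigma) / rng.lognormvariate(d_mu, d_sigma)
        for _ in range(n_draws)
    )
    return percentile(draws, 0.025), percentile(draws, 0.5), percentile(draws, 0.975)


PREVALENCE_CI = (0.0226, 0.0452)  # from the text: chlamydia, women 15 to 24 years
DURATION_RANGE = (0.5, 2.0)       # years; HYPOTHETICAL, for illustration only

low, median, high = simulate_incidence(PREVALENCE_CI, DURATION_RANGE)
# Same prevalence uncertainty, but duration treated as known exactly (1 year).
fixed_low, _, fixed_high = simulate_incidence(PREVALENCE_CI, (1.0, 1.0))

print(f"duration uncertain: {low:.4f} to {high:.4f} per person-year")
print(f"duration fixed:     {fixed_low:.4f} to {fixed_high:.4f} per person-year")
```

Under these assumed inputs, the interval for the incidence rate is considerably wider when duration is uncertain than when it is treated as known. A point estimate of incidence conceals this widening.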

Although the *P* = IR × *D* formula is well known, it also has limited application.^{10} The formula requires an assumption of a population at steady state. In other words, the population is assumed to have an equal number of people entering as exiting in a given unit of time. Application of the formula in age strata is more problematic because people constantly age into and out of any given stratum. Use of this formula within a specific age stratum will typically violate the steady-state assumption, although the magnitude of the associated bias would require further analyses.^{10}

Sexually transmitted infections, like all infectious diseases, are “dependent happenings.”^{11} In chronic diseases for which *P* = IR × *D* is more commonly used, the prevalence has no influence on the incidence rate. In contrast, the dependent nature of infectious diseases leads to a dependence of the incidence rate on the prevalence. If sexual behaviors and networks are similar in 2 populations, a population with a higher prevalence will experience a higher incidence rate than a population with a lower prevalence. An alternative formula for infectious diseases may be expressed as follows: incidence rate = average contact rate × per-contact transmission probability × prevalence.^{11}

This formula captures the dependence of incidence on prevalence, but also reflects the dependence on the contact rate between partners and the probability of transmission in a given sexual encounter. Although theoretically preferable, this formula is challenging to use in practice because estimating contact rates and transmission probabilities is difficult.
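A minimal sketch of the alternative formula makes the dependence on prevalence concrete. All input values below are hypothetical and chosen only for illustration; they are not estimates for any STI.

```python
def incidence_rate(contact_rate, transmission_prob, prevalence):
    """Incidence rate = contact rate * per-contact transmission probability * prevalence."""
    return contact_rate * transmission_prob * prevalence


# HYPOTHETICAL inputs shared by both populations (same behaviors and networks).
CONTACTS_PER_YEAR = 50      # average sexual contacts per person per year
TRANSMISSION_PROB = 0.1     # probability of transmission per contact

# Two populations that differ only in prevalence (also HYPOTHETICAL).
rate_low_prev = incidence_rate(CONTACTS_PER_YEAR, TRANSMISSION_PROB, 0.01)
rate_high_prev = incidence_rate(CONTACTS_PER_YEAR, TRANSMISSION_PROB, 0.03)

print(f"incidence rate at 1% prevalence: {rate_low_prev:.3f} per person-year")
print(f"incidence rate at 3% prevalence: {rate_high_prev:.3f} per person-year")
print(f"ratio: {rate_high_prev / rate_low_prev:.1f}")
```

With identical behavior, tripling the prevalence triples the incidence rate. The *P* = IR × *D* formula cannot represent this relationship, because it treats the incidence rate as independent of prevalence.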

Although we have focused the discussion earlier on chlamydial infection and gonorrhea, the estimates for each of the other STIs must also be considered carefully. In at least 1 case, HIV infection, the estimate is provided from a previous publication that notably includes an assessment of uncertainty.^{12} The estimates for the other STIs depend on several assumptions or input parameters that are uncertain at best and questionable at worst. Examples of some of these assumptions include the following: the rate of unreported syphilis cases is 20%, HPV prevalence in men and women is identical, 50% of hepatitis B is sexually transmitted, all diagnosed trichomoniasis is symptomatic, and 30% of women with trichomoniasis are symptomatic, among others.^{1} In addition, several estimates are derived from relatively old data or reflect a period other than the target year of 2008. In reading these estimates, we encourage the reader to identify every source of uncertainty in each assumption or input parameter and “loosen” the authors’ estimate of prevalence, incidence, or number of cases accordingly.

Given the challenges in calculating the number of cases and incidence rates, what could be done to improve these estimates? First, as noted throughout this editorial, an explicit acknowledgement of the uncertainty surrounding input parameters and, consequently, the estimates themselves would improve their overall usefulness. The uncertainty could be incorporated into the calculations to enable creation of reasonable bounds for each estimate. Second, alternative methods of calculation could be used to triangulate the estimates. For example, a relatively simple calculation could be made based on reported cases for reportable diseases, taking into account estimates of underreporting, repeat infections, and duplication. This calculation could be enhanced if better data were available to estimate underreporting and repeat infections. Finally, mathematical transmission models could be developed for several of the STIs using available data to improve model fit. Key advantages of these models include the ability to incorporate uncertainty in each of the input parameters and graphical representation of the uncertainty surrounding model output.
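One way the reported-case calculation might look is sketched below. It is an illustration under stated assumptions, not an established method. The adjustment formula, the reported case count, and the low and high values for duplication and reporting completeness are all hypothetical. Repeat infections are left out for brevity; they would matter when converting infections to the number of persons infected.

```python
from itertools import product


def estimated_infections(reported, duplicate_fraction, reporting_fraction):
    """Remove duplicate reports, then scale up for infections never reported."""
    return reported * (1 - duplicate_fraction) / reporting_fraction


REPORTED_CASES = 100_000             # HYPOTHETICAL annual case reports
DUPLICATE_FRACTIONS = (0.02, 0.05)   # HYPOTHETICAL low and high values
REPORTING_FRACTIONS = (0.4, 0.6)     # HYPOTHETICAL share of infections reported

# Evaluate every combination of low and high inputs to obtain simple bounds.
scenarios = [
    estimated_infections(REPORTED_CASES, dup, rep)
    for dup, rep in product(DUPLICATE_FRACTIONS, REPORTING_FRACTIONS)
]

print(f"plausible range: {min(scenarios):,.0f} to {max(scenarios):,.0f} infections")
```

Reporting the minimum and maximum across scenarios is a crude form of sensitivity analysis. It yields a range rather than a single number, which can then be compared with estimates obtained by other methods.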

Estimating the number of STIs in the United States is a useful and important undertaking. However, this exercise must be done recognizing that such estimates are inherently noisy. Rather than ignoring this uncertainty, we should embrace it and attempt to quantify it. With efforts to improve data quality, over time this uncertainty can be reduced but never eliminated. Given the substantial, unquantified uncertainty and strong assumptions in the estimates of Satterwhite et al.,^{1} their absolute numbers must be interpreted cautiously. We encourage readers to consider carefully the error around each estimate before using these numbers in their own work.

## REFERENCES

*Chlamydia trachomatis* infection to pelvic inflammatory disease: A mathematical modelling study. BMC Infect Dis 2012; 12: 187.

*Chlamydia trachomatis* and *Neisseria gonorrhoeae* to pelvic inflammatory disease: Systematic review of mathematical modeling studies. Sex Transm Dis 2012; 39: 628–637.