THE RAPID BUT UNEVEN GLOBAL spread of human immunodeficiency virus (HIV) has challenged researchers to better understand sexually transmitted infection (STI) transmission dynamics and their implications for prevention. The basic components of transmission are captured by the well-known population summary measure, “*R*_{0},” the reproduction rate or reproduction ratio of infection. The reproduction rate is the average number of secondary cases generated by a single infective case introduced into a fully susceptible population.

$$R_{0} = \tau \, c \, \delta$$

where τ = the probability of transmission per contact; *c* = the contact rate; and δ = the duration of infection.
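As a quick numerical illustration of the homogeneous expression, the sketch below computes *R*_{0} from τ, *c*, and δ and checks it against the threshold of 1. The function names and parameter values are hypothetical, chosen only for arithmetic convenience; the units of *c* and δ must match (here, contacts per month and months).

```python
def basic_reproduction_number(tau: float, c: float, delta: float) -> float:
    """Homogeneous-mixing R0 = tau * c * delta.

    tau   -- probability of transmission per contact, in [0, 1]
    c     -- contact rate (contacts per unit time)
    delta -- duration of infection (same time unit as c)
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be a probability in [0, 1]")
    if c < 0 or delta < 0:
        raise ValueError("c and delta must be non-negative")
    return tau * c * delta


def above_threshold(r0: float) -> bool:
    """True when an infective case generates more than 1 secondary case on average."""
    return r0 > 1.0


# Hypothetical values: 5% per contact, 2 contacts per month, infectious for 6 months.
r0 = basic_reproduction_number(tau=0.05, c=2.0, delta=6.0)
print(f"R0 = {r0:.2f}; above threshold: {above_threshold(r0)}")
```

Doubling the contact rate in this hypothetical example gives *R*_{0} = 1.2, which crosses the threshold. The single term *c* says nothing about how contacts are distributed across persons, which is the gap the network formulation addresses.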

If *R*_{0} < 1, the initial infected case produces fewer than 1 new infection on average; transmission is then below the reproductive threshold, and the disease is unlikely to persist.

Early work in the population dynamics of STI focused on the apparent paradox that transmission persisted despite contact rates in the general population that suggested an *R*_{0} below the epidemic threshold. The result was the development of the core group concept: the theory that a small group of people with large numbers of partners and repeated infections can maintain a reservoir of infection within a population even when the average reproduction rate for the population is below threshold.^{1,2} Such insights directed both policy makers and health-care providers to focus on identifying and treating individuals with repeat infections to reduce *R*_{0} within this highly active core group, a strategy that seeks to curtail disease spread without the cost and coordination challenges of broad population-based screening and intervention. Central to these prevention efforts was the ability to use repeat infection as a reliable marker for core group membership, which allowed interventions to be targeted accurately. Treatment-seeking behavior by those with repeat infections also created a natural synergy between identification and treatment efforts.

HIV does not fit into the core group framework, in part because there is no reinfection (so no simple marker for risk and no readily identifiable core group), and in part because of the generalized epidemics of HIV in many countries (which suggests transmission is above the reproductive threshold in large sections of the population).

An epidemic is formally defined as generalized once 1% of the general population is infected,^{3} i.e., once it has spread broadly beyond groups with traditionally defined risk factors such as injection drug use, commercial sex, other sexually transmitted infections, or large numbers of sex partners. These markers of risk have typically been used to target prevention strategies, but with HIV prevalence above 30% in some countries (not just their capital cities), the concept of a “core group” begins to lose meaning and relevance. Prevention strategies are therefore being rethought in the context of HIV. If a small core group is not responsible for persistence, then first principles dictate going back to think about the mechanism: the transmission network. The integration of network analytic theory and methods into HIV prevention research has provided a number of new insights into STI population dynamics.^{4}

In the network context, it is also possible to directly calculate *R*_{0} under some circumstances. Under the assumption of random partnership formation given the heterogeneity in contact rates across persons^{5}:

$$R_{0} = \frac{\bar{\tau}}{\tau_{c}}$$

where τ̄ = the average integrated probability of transmission given a partnership; and 1/τ_{c} = the contact network epidemic potential.

The “average integrated transmissibility” is a partnership level measure that represents the likelihood of transmission in a discordant pair, averaged across all such pairs in the population. It is determined by the characteristics of partnerships (e.g., duration, coital frequency), the characteristics of the pathogen (transmissibility), and the characteristics of the host (infectivity and susceptibility). We will refer to it below by the simpler term “transmissibility.” The “contact network epidemic potential” is a population level measure that summarizes the level of connectivity in the contact network. It is determined by the network structure (e.g., the density, distribution, and clustering of partnerships), and its form depends on the model for the network.^{6} As the form of the equation suggests, τ_{c} can also be interpreted as the threshold transmissibility: the transmissibility needed to sustain an epidemic in this particular network. We will refer to these below as “epidemic potential” (1/τ_{c}) and “threshold transmissibility” (τ_{c}).
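To make the roles of transmissibility and epidemic potential concrete, the sketch below computes 1/τ_{c} from a degree distribution and then *R*_{0} = τ̄/τ_{c}. It assumes a single (unipartite) network with random mixing given degree, for which a standard result (the mean excess degree of the configuration model) is 1/τ_{c} = (E[*K*^{2}] − E[*K*])/E[*K*]. That specific form is our assumption: the text notes only that the form depends on the network model, and a heterosexual (bipartite) network would combine male and female moments differently. The degree distribution and function names are hypothetical.

```python
from typing import Mapping


def moments(pmf: Mapping[int, float]) -> tuple[float, float]:
    """First and second moments of a degree distribution given as {k: P(K = k)}."""
    total = sum(pmf.values())  # renormalize in case the shares do not sum exactly to 1
    m1 = sum(k * p for k, p in pmf.items()) / total
    m2 = sum(k * k * p for k, p in pmf.items()) / total
    return m1, m2


def epidemic_potential(pmf: Mapping[int, float]) -> float:
    """1/tau_c = (E[K^2] - E[K]) / E[K]: assumed unipartite configuration-model form."""
    m1, m2 = moments(pmf)
    if m1 <= 0:
        raise ValueError("mean degree must be positive")
    return (m2 - m1) / m1


def network_r0(tau_bar: float, pmf: Mapping[int, float]) -> float:
    """R0 = tau_bar / tau_c, i.e. transmissibility times epidemic potential."""
    return tau_bar * epidemic_potential(pmf)


# Hypothetical last-year degree distribution with a thin upper tail at 10 partners.
pmf = {0: 0.10, 1: 0.70, 2: 0.10, 3: 0.05, 10: 0.05}
potential = epidemic_potential(pmf)
print(f"1/tau_c = {potential:.2f}, threshold tau_c = {1 / potential:.2f}")
print(f"R0 at tau_bar = 0.1: {network_r0(0.1, pmf):.2f}")
```

In this example the 5% reporting 10 partners contribute most of E[*K*^{2}]; moving them to 1 partner drops 1/τ_{c} from about 3.2 to about 0.45, so the same τ̄ yields a very different *R*_{0}.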

The key feature of this generalized *R*_{0} is that neither transmissibility nor the contact network alone determines the likelihood of an epidemic; it is the relationship between the 2 that is important. Because of this interdependence, an extremely high level of transmissibility may not induce an epidemic if the epidemic potential of the contact network is sufficiently low. By the same token, very low levels of transmissibility may induce an epidemic if the potential is sufficiently high. An extreme case that has garnered a great deal of attention in the recent literature is the case of so-called “scale-free” networks. In this case, the theory implies τ_{c} = 0, so there is no epidemic threshold. Under these conditions any level of transmissibility is sufficient to induce an epidemic with very high probability, and the speed and magnitude of the epidemic are determined by the level of transmissibility. The scale-free case is, however, a single extreme state of the degree distribution; the theory of network-mediated transmission is considerably more general, and more empirically useful.

In this article, we present a statistical framework that allows us to test the fit of alternative models for the degree distribution, estimate the resulting epidemic potential of the network, and derive the epidemic predictions of the models in terms of *R*_{0}. We use these methods to analyze 5 different surveys of the US population. Using multiple surveys of the same population helps to highlight how robust the statistical estimates and epidemic predictions are to subtle changes in the survey sample, questionnaire, and data collection method.

The original expression for *R*_{0} represents the impact of sexual activity via the single homogeneous term, *c*. The simplest generalization of this term is to allow for heterogeneity in contact rates across persons: the population “degree distribution.”^{5,7,8} In a sexual network, the degree, *K*, is the number of sexual partners of a randomly chosen member of the network and the distribution of *K* is referred to as the *degree distribution*. Empirical evidence shows that degree distributions for sexual partnership networks are highly right skewed.^{5} Although the long upper tail is important for transmission, the other feature of these skewed distributions is a very dense lower tail: the modal degree is 1 partner in the last 12 months for nearly all large representative surveys (e.g., Refs. ^{9–16}). In the 5 survey data sets that will be used in our analysis, 64% to 76% of respondents report exactly 1 partner in the last year.

The extreme right skew of these degree distributions has led researchers to explore whether these distributions can be represented by a “power-law” scaling function of the kind found in many other physical systems. The probability mass function *P*(*K* = *k*) of a degree distribution has power law behavior with scaling exponent ρ if there exist finite constants *c*_{1}, *c*_{2}, and *M* such that 0 < *c*_{1} < *P*(*K* = *k*)*k*^{ρ} < *c*_{2} for *k* > *M*. The relevant characteristic of these models for understanding disease transmission is the scaling parameter (ρ). If this parameter is between 2 and 3, the variance of the degree distribution is theoretically infinite and the distribution is said to be “scale free.” With infinite variance the epidemic potential, 1/τ_{c}, becomes infinite, so the transmissibility τ̄ needed for an epidemic approaches zero. That is, an epidemic is a virtual certainty for any pathogen that has a nonzero probability of transmission. Correspondingly, the critical vaccination fraction under such circumstances is unity. There is no herd immunity: all members of the population must be successfully vaccinated to eradicate the disease. Under these conditions, only interventions that successfully shut down the high-degree “hub” nodes will restore the epidemic threshold and curtail the diffusion process.^{17}
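The divergence argument can be checked numerically. The sketch below truncates a discrete power law *P*(*K* = *k*) ∝ *k*^{−ρ} at an upper degree *k*_max and shows that, for ρ between 2 and 3, the mean settles down while the second moment keeps growing with *k*_max. So does the epidemic potential, computed here with the form (E[*K*^{2}] − E[*K*])/E[*K*], which assumes a unipartite network with random mixing. The truncation points, ρ = 2.5, and the function name are illustrative choices.

```python
def truncated_moments(rho: float, k_max: int) -> tuple[float, float]:
    """Mean and second moment of P(K = k) proportional to k**-rho on k = 1..k_max."""
    ks = range(1, k_max + 1)
    weights = [k ** -rho for k in ks]
    z = sum(weights)  # normalizing constant of the truncated distribution
    mean = sum(k * w for k, w in zip(ks, weights)) / z
    second = sum(k * k * w for k, w in zip(ks, weights)) / z
    return mean, second


# rho = 2.5 lies in the "scale-free" range: the mean converges, E[K^2] does not.
for k_max in (10, 100, 1_000, 10_000, 100_000):
    mean, second = truncated_moments(2.5, k_max)
    potential = (second - mean) / mean  # 1/tau_c under the assumed random-mixing form
    print(f"k_max={k_max:>6}  mean={mean:.3f}  E[K^2]={second:9.2f}  1/tau_c={potential:9.2f}")
```

Rerunning with ρ = 3.5 shows E[*K*^{2}] converging instead, which is why the value of the fitted exponent, and its uncertainty, matters so much for the epidemic predictions.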

The scaling parameters have been estimated by several researchers across a range of populations including Uganda, the United States, Sweden, Britain, and Zimbabwe. The results have been somewhat inconsistent and often counterintuitive: they suggest there should be large STI and HIV epidemics in Britain, Zimbabwe, and potentially the United States, but not in Sweden or Uganda.^{5,7,8,18,19} The results are summarized in Table 1. The discontinuity between the model predictions and the empirical data serves as the motivation for the current investigation.

Why do these model estimates fail to predict results consistent with the evidence? It is our contention that the methods employed in these studies have been inadequate in 1 or more of the following ways. First, the amount of error in the tail of the distribution has been underestimated. Second, models for the social process have been bypassed in favor of simple curve fitting. Finally, models that focus exclusively on degree distributions ignore heterogeneous population mixing patterns, so are unlikely to provide informative predictions of epidemic potential. As a consequence, the evidence that power law models adequately fit the degree distributions of empirical sexual networks has been overstated.

The typical approach to estimating the scaling parameter ρ of a power-law model is to fit a regression line through the apparently linear region of a plot of the survival function of the degree distribution against degree on double logarithmic axes,^{7,18} and to use the standard error of the estimated slope as the measure of model uncertainty. The appeal of this approach lies in its familiarity and simplicity. There is, however, an implicit assumption that measurement uncertainty is not correlated with degree. The data suggest that this assumption is not met, for 2 reasons. First, there is little information in the tail of the distribution, since so few people report large numbers of partners. In the data sets we analyze here, for example, <1% of respondents report >10 partners. Second, responses in the tail are rounded. Above 10 partners there is a tendency for respondents to report in round numbers, and above 20 partners over 80% of the reports are in large round numbers.^{20,21} As a consequence, the model approximation depends increasingly on less accurate data as it fits the higher degree values. And because the distribution tail has a disproportionate effect on the slope of any line fit, the scaling parameter estimates tend to be very unstable. Approaches that use data only from the upper tail of the distribution^{8} simply exacerbate the problem. In this context, the simple regression methodology is inappropriate: it yields biased estimates of the scaling parameter and greatly underestimates model uncertainty.^{22} A more sophisticated form of curve fitting is used in Ref. ^{8}, which fits a curve of the form *ck*^{−ρ} to *P*(*K* = *k*) using a form of maximum likelihood. That approach, however, neither uses a social process model nor adjusts for the selection of the upper tail, as is done here.
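The following sketch reproduces the naive estimator criticized above on a small hypothetical sample of 100 respondents, to show how strongly a single high report moves the slope. It regresses log *S*(*k*) on log *k*, where *S*(*k*) = *P*(*K* ≥ *k*) is the empirical survival function, and converts the slope to ρ̂ = 1 − slope (for a pure power law, *S*(*k*) scales as *k*^{−(ρ−1)}). The data and function name are hypothetical, and this is an illustration of the flawed method, not a recommended estimator.

```python
import math
from collections import Counter
from typing import Sequence


def naive_loglog_rho(degrees: Sequence[int], k_min: int = 1) -> float:
    """Least-squares fit of log S(k) on log k for observed k >= k_min; returns 1 - slope."""
    n = len(degrees)
    counts = Counter(degrees)
    xs, ys = [], []
    for k in sorted(d for d in counts if d >= k_min):
        surv = sum(c for d, c in counts.items() if d >= k) / n  # empirical P(K >= k)
        xs.append(math.log(k))
        ys.append(math.log(surv))
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return 1.0 - sxy / sxx


# Hypothetical sample: modal degree 1, thin tail.
base = [1] * 70 + [2] * 15 + [3] * 7 + [4] * 3 + [5] * 2 + [10] * 2 + [20]
rho_base = naive_loglog_rho(base)
rho_shifted = naive_loglog_rho(base + [100])  # one extra respondent reporting 100 partners
print(f"rho_hat = {rho_base:.2f}; with one report of 100 partners: {rho_shifted:.2f}")
```

In this hypothetical sample the estimate is about 2.5, inside the “scale-free” range, and a single additional report of 100 partners pulls it just below 2. The cumulative points are also strongly correlated, so the least-squares standard error understates the uncertainty.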

The curve fit approach raises another problem as well: the assessment of alternative hypotheses regarding the underlying process that gives rise to the observed degree distribution. This kind of statistical evaluation requires a theory of the mechanism of partner acquisition, represented in the form of a stochastic model that can then be fit to the data and compared to alternatives. As others have noted, power law behavior is consistent with a “preferential attachment” mechanism.^{23–26} In the context of sexual partner selection, this would imply that people choose partners with a probability proportional to the partner’s degree: the more partners that person has, the more likely you would be to choose them. Although this may be a plausible mechanism for link formation in the World Wide Web, where the goal is to minimize steps to all other nodes, it is a questionable representation of how people choose their sexual partners. Indeed, most social norms would suggest the opposite: that having many partners makes one socially undesirable. In any case, to move this discussion from speculation to empirical assessment, it is necessary to define and test alternative models of the process.

We identify several plausible alternative models and compare how well they fit the data from 5 surveys. In testing several alternative models on several different data sets, we hope to gain some leverage on identifying the range of processes that describe partnership distributions. If one model fits better than the others in a wide variety of cases, this suggests a commonality of processes. If the best-fit model varies from one data set to the next, this suggests that the partnership acquisition process is too complex to capture with a simple model based exclusively on degree. In that case, the reported fits of individual models would most likely be idiosyncratic and not particularly informative. In addition, if different models fit the different measures from the same data set, this would suggest the results are very sensitive to the measure used, and this should limit the inferences made. Finally, the sample data themselves may be inadequate and fail to accurately capture the true degree distribution in the population. Previous research, however, has shown remarkable consistency in reported sexual behavior,^{27} and there is no a priori reason to believe the data are not representative.

Regardless of which model fits the data best, a good model must also predict the epidemic behavior we empirically observe. The ultimate goal of analyzing degree distributions is to develop a methodology that sheds light on disease transmission dynamics in different populations. So in addition to comparing the fit of each model, we also examine the epidemic potentials predicted by each.^{5} If the best fitting model for the degree distribution still produces nonsensical epidemic predictions, this again implies that a degree-based model is too simple to capture the true underlying transmission networks.

#### Materials and Methods

##### Data

The degree distribution measures we examine are drawn from questions about the number of partners in the last year. The rationale for that choice deserves brief discussion. The appropriate time slice to use for estimating network epidemic potential is not obvious, an issue often ignored in the literature. Because partnerships have duration and sequence, the true transmission network does not exist during a single time period, but is instead a concatenation of periods that defines the possible infection paths, given the duration (and/or stages) of infection. Outside of computer simulations, we generally cannot observe this network. Our goal here, however, is to capture the degree distribution this transmission network would have if we could see it. Consider the degree distributions based on lifetime partners and on current partners. Both of these choices are useful in certain contexts, but neither is likely to provide a good representation of the underlying transmission network. The use of lifetime partnerships ignores the impact of partnership sequence and dramatically overestimates the density of the transmission network. Current partnerships select disproportionately on long relationships and are likely to underestimate the density of the transmission network.

Using partners over the last year as a proxy for the transmission network is not perfect, but it has some merits. The longer-term partnerships are balanced by observation of some shorter-term partnerships, and the time window is a reasonable compromise for a range of STI infection durations. Treated STIs probably have shorter infection windows than a year, but this may be balanced by some level of asymptomatic and thus longer infections. Untreatable infections often have periods of peak infectivity, like recent infection with HIV,^{28} and the flare-ups of herpes simplex virus. Finally, the yearly degree distribution allows for comparability with previous literature, and it is a time period commonly used in sexual behavior surveys. Further discussion of this issue is given in Ref. ^{59}.

We analyze data from 5 different sexual behavior surveys with probability samples collected in the United States during the 1990s: the National Health and Social Life Survey (NHSLS), the General Social Survey (GSS), the National Survey of Men (NSM), the National Survey of Women (NSW), and the Behavioral Risk Factor Surveillance Survey (BRFS). A brief overview of the data is provided in Table 2.

The GSS is conducted by the National Opinion Research Center (NORC) and is designed as part of an ongoing program of social indicator research to gather repeated measures on a broad range of topics. The GSS uses the NORC national probability sample, which includes all noninstitutionalized English-speaking persons 18 years of age or older living in the United States. The samples are designed to give each English-speaking household in the United States an equal probability of inclusion. The GSS asked questions about the number of sexual partners in the last year in 1988–1991, 1993, 1994, 1996, 1998, and 2000. Pooling across years, there is a total of 16,159 respondents who reported the number of sexual partners in the last year.^{14} GSS response codes for the number of sex partners in the last year are categorical and top-coded (1, 2, 3, 4, 5–10, 11–20, 21–100, 100+).

The NHSLS is a comprehensive survey of the sexual behavior of US adults 18 to 59 conducted in 1992 by NORC.^{29} The survey employed a multistage area stratified probability sample designed to give each household an equal probability of inclusion. The sample size is 3332. Several different measures of the number of partners in the last year are available in this survey: direct response to an interviewer, response to a self-administered questionnaire, and a constructed variable provided by the study investigators that is based on consistency checks across several measures. These 3 measures will allow us to determine whether model fits are robust to small changes in reporting within the same survey.

The NSM was conducted in 1991 and was designed to examine sexual behavior and condom use among young adult men 20 to 39 years old. The survey was based on a multistage, stratified, clustered, disproportionate-area probability sample of households within the contiguous United States.^{15} The sample includes 3321 noninstitutionalized men. Respondents are asked to report the number of vaginal sex partners and anal sex partners in separate questions. There is no way to ascertain how many partners are represented in both categories, so we define the number of partners as the maximum of the 2 categories, which may be lower than the actual number of unique partners. A total of 586 (19%) of the men reported anal sex. Of these, 18 report no vaginal sex partners, and 35 report more anal than vaginal sex partners.

The NSW was also conducted in 1991 and was designed to examine sexual, contraceptive, and fertility behaviors among young adult women 20 to 29 years old. The sample includes 1669 respondents from 2 subsamples. The first subsample (n = 929) consisted of follow-up cases from the 1983 National Survey of Unmarried Women, which surveyed 1314 never-married women between 20 and 29 years of age. The second subsample (n = 740) is from a different probability sample of 20 to 27 year old women of unspecified marital status selected in 1991.^{16} The NSW uses a very similar instrument to the NSM, so the same adjustment strategy is used.

The BRFS is part of the state-based Behavioral Risk Factor Surveillance System, initiated in 1984 by the Centers for Disease Control to collect data on risk behaviors and preventive health practices of the US population aged 18 and older.^{14} The BRFS includes questions about sexual behavior as part of a supplement started in 1996. States decide whether to include the supplement in each year. We use data for the 5 years from 1996 to 2000. The number of states electing to include the supplement during this period varied from a low of 2 in 1996 to a high of 25 in 1997. This variation makes it impossible to aggregate these data into a true national probability sample. We did not want to exclude the BRFS, since it has a large sample size (n = 72,280), but the results should be interpreted with care. The variable for the number of sex partners in the last year is top-coded at 76+. Only 3 respondents reported the top-coded value.

In order to make these data comparable, we restrict our focus to the age range common to all surveys, 20 to 39. In addition, each of the data sets is resampled with an adjusted weight that combines both the post stratification weight and a renormalizing factor so that all of the data sets have the same age distribution in this range as the United States in 2000. Since the variable of interest here is partners in the last year, rather than partners over the lifetime, the restriction to the 20 to 39 range is likely to understate the fraction of the overall population with 0 or 1 partner. Data from the GSS, for example, indicate that only 8% of the population in this age range had no partners in the last year compared to 30% of respondents outside this range. Similarly, 20% of respondents aged 20 to 39 reported >1 partner in the last year compared to <10% of respondents outside this age range.
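The reweighting step described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the age groups, and the target shares in the example are hypothetical placeholders, not the survey weights or the 2000 US age distribution used in the analysis. Each respondent’s post-stratification weight is multiplied by the ratio of the target share of their age group to that group’s weighted share in the sample, so the adjusted weighted age distribution matches the target; a weighted resample is then drawn.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def renormalized_weights(groups: Sequence[str], weights: Sequence[float],
                         target_shares: Dict[str, float]) -> List[float]:
    """Rescale post-stratification weights so weighted group shares equal target_shares."""
    totals: Dict[str, float] = defaultdict(float)
    for g, w in zip(groups, weights):
        totals[g] += w
    grand = sum(totals.values())
    # ratio of target share to current weighted share, per age group
    factor = {g: target_shares[g] / (t / grand) for g, t in totals.items()}
    return [w * factor[g] for g, w in zip(groups, weights)]


def weighted_resample(records: Sequence, weights: Sequence[float], n: int, seed: int = 0) -> list:
    """Draw n records with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(records, weights=weights, k=n)


# Hypothetical example: 3 respondents, 2 age groups, equal target shares.
groups = ["20-29", "20-29", "30-39"]
adjusted = renormalized_weights(groups, [1.0, 1.0, 1.0], {"20-29": 0.5, "30-39": 0.5})
print(adjusted)
sample = weighted_resample(groups, adjusted, n=10, seed=1)
```

If the target shares sum to 1, the rescaled weights keep the original weight total, so only the age composition changes.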

Once the data are adjusted to make them more comparable we have 12 different distributions to analyze. The NHSLS has 3 different measures of partners in the last year, so contributes a total of 6 distributions (3 measures by 2 sexes). The remaining surveys have 1 measure. GSS and BRFS each provide 2 distributions, 1 for each sex, and the NSM and NSW each provide 1 distribution for males and females respectively.

##### Social Process Models for Partnership Formation

We examine 4 general types of mechanisms that may describe the process of partnership formation: fixed rate, a search process with a stopping rule, preferential attachment, and a two-stage vetting process. In each case, there is >1 model that can be defined for that mechanism, so in all, we examine 11 different models: the Poisson, Yule, Waring, Discrete Pareto, Negative Binomial, Geometric, Discrete Pareto Exponential, Geometric Yule, Negative Binomial Yule, Anchored Negative Binomial, and the Poisson-lognormal. These models are drawn largely from Ref. ^{5} but also include the Poisson-lognormal model described in Ref. ^{30}.

The Poisson is a fixed rate model and is the natural null hypothesis in this context. All persons are assumed to acquire partners at a fixed homogeneous rate (λ) over time. The assumption that everyone in the population has the same propensity to form ties is both strong and unrealistic. We do not expect this model to fit the data, but it is the most appropriate basis for comparison.

The Poisson-lognormal and Negative Binomial models are simple generalizations of the Poisson, retaining the assumption that people have a fixed rate of partner acquisition, but relaxing the homogeneity assumption. Individual propensities to form ties (λ_{i}) are drawn from a lognormal distribution (for the Poisson-lognormal) or a gamma distribution (for the Negative Binomial). The Negative Binomial is sometimes referred to as a Poisson-gamma mixture model and is often used in the biologic sciences. Both models are used to capture overdispersion relative to the Poisson.
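The overdispersion point can be illustrated directly. The sketch below writes the Negative Binomial in an (*r*, *p*) parameterization, which is the Poisson mixture obtained when λ_{i} follows a gamma distribution with shape *r* and scale (1 − *p*)/*p*, and compares its mean and variance with a Poisson of the same mean. The parameter values and function names are hypothetical; this is a standard textbook parameterization, not necessarily the one used in the cited models.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson P(K = k), computed on the log scale to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def neg_binomial_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) = Gamma(k + r) / (Gamma(r) k!) * p**r * (1 - p)**k, k = 0, 1, 2, ...

    Equivalent to a Poisson whose rate is gamma distributed with shape r and
    scale (1 - p) / p; mean r(1 - p)/p, variance r(1 - p)/p**2.
    """
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def mean_and_variance(pmf, k_max: int = 2_000) -> tuple[float, float]:
    """Moments by direct summation over k = 0..k_max (tail mass beyond k_max ignored)."""
    probs = [pmf(k) for k in range(k_max + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


# Hypothetical parameters chosen so both models have mean 1.5.
nb_mean, nb_var = mean_and_variance(lambda k: neg_binomial_pmf(k, r=0.5, p=0.25))
po_mean, po_var = mean_and_variance(lambda k: poisson_pmf(k, lam=1.5))
print(f"Poisson:           mean={po_mean:.3f} variance={po_var:.3f}")
print(f"Negative Binomial: mean={nb_mean:.3f} variance={nb_var:.3f}")
```

With the same mean of 1.5, the gamma mixing inflates the variance from 1.5 to 6, which is the kind of extra spread in partner counts the Poisson cannot produce.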

The Negative Binomial and Geometric can also be interpreted as search process (or waiting time) models with stopping rules. For both, the process is represented as a sequential search with parameters (*n*, *r*, *p*): over *n* trials, partners are acquired with probability *p* until the search is stopped when *r* suitable partners have been found. For the Geometric model, *r* = 1, and the search stops after the first suitable partner is found. For the Negative Binomial model, the search stops after the specified number (*r*) of suitable partners has been found.

The Yule^{5,23} and the Waring^{24} are degree distributions that result from preferential attachment models.^{25} Under these models the degree distributions are generated as the accumulation of partners over time, where the probability that a contact is made with any particular individual is a function of that individual’s current degree. These preferential attachment models are often referred to as “the rich get richer” models. The Yule model also has an alternative interpretation, as an exponential mixture of geometric distributions. The Waring is a generalization of the Yule that includes an additional parameter for the probability of recruiting an individual with no partners into the network. We include the Discrete Pareto here for comparisons with previous literature. The Discrete Pareto is the power law model with *P*(*K* = *k*) ∝ *k*^{−ρ}, often found in the physics literature. It appears to lack a plausible stochastic mechanism for partnership formation, so it is perhaps best understood as an approximation to the Yule. The “power law” specifications used in the prior literature are typically defined over the positive integers. They therefore exclude zero, although zero is a valid response and a common one in the data.
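For concreteness, the sketch below evaluates a Yule (Yule-Simon) mass function parameterized by its scaling exponent ρ > 1, *P*(*K* = *k*) = (ρ − 1)B(*k*, ρ) for *k* ≥ 1, whose tail behaves like *k*^{−ρ}. This is a standard reparameterization that we assume for illustration; it is not necessarily the exact form used in the cited work, and the function names are hypothetical.

```python
import math


def yule_pmf(k: int, rho: float) -> float:
    """Yule (Yule-Simon) pmf on k >= 1, parameterized by its scaling exponent rho > 1:

        P(K = k) = (rho - 1) * B(k, rho),  so  P(K = k) ~ const * k**-rho  as k grows.
    """
    if rho <= 1:
        raise ValueError("rho must exceed 1")
    if k < 1:
        return 0.0  # like the other power-law forms, no prediction at k = 0
    log_beta = math.lgamma(k) + math.lgamma(rho) - math.lgamma(k + rho)
    return (rho - 1.0) * math.exp(log_beta)


def local_slope(k: int, rho: float) -> float:
    """Slope of log P(K = k) between k and 2k on log-log axes; tends to -rho."""
    return math.log(yule_pmf(2 * k, rho) / yule_pmf(k, rho)) / math.log(2)


for k in (1, 2, 3, 10, 100):
    print(k, f"{yule_pmf(k, 2.5):.5f}")
print("local slope at k=1000:", round(local_slope(1000, 2.5), 3))
```

The Yule mass function bends at low degree and is only asymptotically straight on log-log axes, which is consistent with reading the Discrete Pareto as an approximation to its tail.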

The Negative Binomial-Yule, Geometric-Yule and Discrete Pareto Exponential are vetting models that represent partnership formation as a two-stage process. First, a person generates an acquaintance list that serves as the eligible partner pool. People then choose their sexual partners from the acquaintance list. This class of model is extremely flexible in that practically any probability distribution can be specified for each stage of the process. The vetting process may be used to represent a latent clustering of potential partners due to social networking, geographic, temporal, or other factors. The mechanism can also be used to represent a selection process designed to satisfy multiple criteria. Individuals may choose partners with a fixed probability from acquaintances that independently satisfy some criteria, for example, sex, age, marital status, and sexual preference. The accumulation process continues until they meet a quota of people that satisfy the criteria. The Yule-vetting models can also be interpreted as generalizations of the Yule distribution that recognize that the formation of sexual partnerships is not costless.

All of these models can be generalized to allow for the possibility that the process may provide poor predictions for the lower degrees, but still accurately predict the tail behavior. To isolate the upper regions of the degree distribution we allow separate parameters to fit the probabilities of lower degrees (below a given cutoff value) and fit the parametric model only to those values at or above this cutoff. This is particularly useful for comparisons to the power-law models, since they do not make predictions for *k* = 0.

If the cutoff is above zero, the Geometric, Negative Binomial, and Poisson models start their search process after the cutoff partner. As an alternative, the last model, the Anchored Negative Binomial, fits the values below the cutoff with additional parameters but assumes that the search process started with the first partner, even when the cutoff is above 1. For a more in-depth description of all of these models, see Refs. ^{5} and ^{30}.
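The cutoff construction can be sketched as a wrapper around any base mass function. Below the cutoff, each degree gets its own free probability, so the cutoff adds that many parameters. At and above the cutoff, the remaining mass is spread either by restarting the base process at the cutoff (`shift=True`, as described for the Geometric, Negative Binomial, and Poisson) or by truncating a base process that starts at zero (`shift=False`, the anchored version). The helper names, the Poisson base, and the numbers are hypothetical illustrations of the idea, not the fitted models.

```python
import math
from typing import Callable, Sequence

Pmf = Callable[[int], float]


def cutoff_pmf(base: Pmf, low_probs: Sequence[float], shift: bool) -> Pmf:
    """Cutoff-augmented pmf; cutoff = len(low_probs).

    k < cutoff : free parameters low_probs[k]
    k >= cutoff: remaining mass spread by the parametric base model, either
      shift=True  -> base(k - cutoff): the process restarts at the cutoff
      shift=False -> base(k) / P_base(K >= cutoff): anchored, base truncated at the cutoff
    """
    cutoff = len(low_probs)
    if any(p < 0 for p in low_probs) or sum(low_probs) > 1.0:
        raise ValueError("low_probs must be non-negative and sum to at most 1")
    rest = 1.0 - sum(low_probs)
    base_tail = 1.0 - sum(base(j) for j in range(cutoff))  # P_base(K >= cutoff)

    def pmf(k: int) -> float:
        if k < 0:
            return 0.0
        if k < cutoff:
            return low_probs[k]
        if shift:
            return rest * base(k - cutoff)
        return rest * base(k) / base_tail

    return pmf


def poisson(lam: float) -> Pmf:
    """Hypothetical base model on k = 0, 1, 2, ..."""
    return lambda k: math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) if k >= 0 else 0.0


shifted = cutoff_pmf(poisson(2.0), [0.08, 0.64], shift=True)
anchored = cutoff_pmf(poisson(2.0), [0.08, 0.64], shift=False)
for k in range(6):
    print(k, round(shifted(k), 4), round(anchored(k), 4))
```

For a memoryless Geometric base the two constructions coincide; they differ for the Poisson and Negative Binomial, which is why the anchored variant is a distinct model.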

##### Model Estimation

We use maximum likelihood to estimate the parameters of these models from the 5 different survey data sets. Maximum likelihood estimation (MLE) has many desirable statistical properties in this context.^{31} The likelihood functions are computed from the probability mass functions of the models. If the observations are categorical or top-coded, the likelihood is modified to reflect this. For example, if the degree of an individual is only known to be between 5 and 10, then the likelihood of this observation is the sum of the probabilities of observing a degree of exactly 5, a degree of exactly 6, …, and a degree of exactly 10. In this way, the probabilities of categorical or top-coded observations can be derived from the probability mass functions of each degree. The resulting likelihood function is referred to as a grouped likelihood, as the individual probabilities are grouped together.^{22}
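A minimal sketch of the grouped likelihood follows. Each response is stored as an interval (low, high): an exact report *k* is (*k*, *k*), a categorical response such as “5 to 10” is (5, 10), and a top-coded response such as “76+” is (76, None). The interval’s probability is the sum of the model’s point probabilities. The Geometric model and the observations in the example are hypothetical stand-ins for the fitted models and survey data.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

Pmf = Callable[[int], float]
# (low, high) with high=None for a top-coded "low or more"; an exact report k is (k, k).
Observation = Tuple[int, Optional[int]]


def interval_prob(pmf: Pmf, low: int, high: Optional[int]) -> float:
    """P(low <= K <= high), or P(K >= low) when high is None."""
    if high is None:
        # complement of P(K <= low - 1); loses precision far out in a thin tail
        return 1.0 - sum(pmf(k) for k in range(low))
    return sum(pmf(k) for k in range(low, high + 1))


def grouped_log_likelihood(pmf: Pmf, observations: Iterable[Observation]) -> float:
    """Sum of log interval probabilities over all respondents."""
    return sum(math.log(interval_prob(pmf, lo, hi)) for lo, hi in observations)


def geometric(p: float) -> Pmf:
    """Hypothetical model: P(K = k) = p (1 - p)**k on k = 0, 1, 2, ..."""
    return lambda k: p * (1 - p) ** k if k >= 0 else 0.0


obs = [(1, 1), (1, 1), (2, 2), (5, 10), (20, None)]
print(grouped_log_likelihood(geometric(0.3), obs))
```

Maximizing this function over the model parameters (for example, over a grid of *p*) gives the grouped MLE. Exact and grouped reports enter the same sum, so the categorical GSS codes and the BRFS top code require no special estimator.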

Since the power-law models do not provide predictions for *k* = 0, model comparisons can only be based on the fits for degrees above zero. We use 3 cutoff values in our tests: 1, 2, and 3. This produces a total of 33 = 11 × 3 models per distribution. The number of parameters for a model, *d*, is the number of parameters when the cutoff is zero plus the cutoff value. With MLE, the natural comparison between 2 nested models is a likelihood ratio test. For nonnested models, as most of our comparisons are, there are several alternative methods for comparing goodness of fit. We use 2: (1) the Akaike Information Criterion (AIC)^{32,33} and (2) the Bayesian Information Criterion (BIC).^{34} For a simple random sample of *n* people with data *K*_{1}, …, *K*_{n}, these are defined as

$$\mathrm{AIC} = -2 \log L(K_{1} = k_{1}, \ldots, K_{n} = k_{n}) + 2d, \qquad \mathrm{BIC} = -2 \log L(K_{1} = k_{1}, \ldots, K_{n} = k_{n}) + d \log n,$$

where *L* is the likelihood. The 2 approaches are similar, but the BIC has the benefit of incorporating model uncertainty and sample size into the decision, while the AIC has the advantage of efficiency.^{33} If the complexity of the true model does not increase with the size of the data set, the BIC is usually preferred; otherwise the AIC is preferred. For both, a well-fitting model is indicated by a low value. When comparing models using BIC, the general rule is that a difference of 2 to 6 is moderate evidence of superior fit, 6 to 10 is strong evidence, and 10+ is very strong.
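The comparison criteria above reduce to a few lines. The sketch below computes AIC and BIC from a maximized log-likelihood, counts parameters as the base count plus the cutoff, and maps a BIC difference onto the verbal scale in the text. The text does not label differences below 2, so “weak” is our assumption. The log-likelihoods, parameter counts, and sample size in the example are hypothetical.

```python
import math


def aic(log_lik: float, d: int) -> float:
    """AIC = -2 log L + 2d."""
    return -2.0 * log_lik + 2.0 * d


def bic(log_lik: float, d: int, n: int) -> float:
    """BIC = -2 log L + d log n (natural log, n = sample size)."""
    return -2.0 * log_lik + d * math.log(n)


def n_params(base_params: int, cutoff: int) -> int:
    """d = parameters of the model at cutoff 0, plus one free probability per degree below the cutoff."""
    return base_params + cutoff


def bic_evidence(delta_bic: float) -> str:
    """Verbal scale from the text; differences below 2 are labeled 'weak' here (our assumption)."""
    delta = abs(delta_bic)
    if delta < 2:
        return "weak"
    if delta < 6:
        return "moderate"
    if delta < 10:
        return "strong"
    return "very strong"


# Hypothetical fits of two models to the same n = 1500 respondents at cutoff 1.
n = 1500
bic_a = bic(log_lik=-1510.2, d=n_params(1, 1), n=n)
bic_b = bic(log_lik=-1502.9, d=n_params(2, 1), n=n)
print(f"BIC A={bic_a:.1f}, BIC B={bic_b:.1f}, evidence for lower: {bic_evidence(bic_a - bic_b)}")
```

Here the extra parameter of model B costs about 7.3 BIC units, but its log-likelihood gain more than pays for it, leaving a difference of about 7.3 in B’s favor: strong evidence on the scale above.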

#### Results

We start by employing the simple curve-fitting approach used in the previous literature to estimate the scaling parameter for a power-law model. This provides a helpful benchmark, establishing what a naïve analysis of the degree distributions from the 5 surveys would produce. For samples with degree >1, the estimates of the scaling parameter are below 3 in 10 of the 12 fits, implying that the distributions are scale free. In the other 2 cases, the confidence interval generated by the standard error includes values that indicate a scale-free distribution. When we restrict the sample to degree >2, all but 1 of the scaling parameter estimates are below 3. This approach consistently produces estimates that predict a generalized HIV/AIDS epidemic in the United States, as well as generalized epidemics of all other STI.

##### Comparisons Based on Goodness of Fit

Using MLE to fit the 11 social process models to these data, no one model emerges as the best fit across the different surveys. Table 3 shows the top 3 model fits to each data set as ranked by BIC. The top-ranked model reports the associated BIC; the second and third best fits show the sequential differences in the BIC. Scanning across the different samples in Table 3, the variation in model ranking is clear. Four of the 11 proposed models rank as the best fit in at least 1 instance across the 12 distributions. Of these 12 best fits, 5 are variants of the power law models, 5 are a search/stop rule model (the Negative Binomial), and 2 are a vetting model (the Negative Binomial Yule). Only 6 of the 11 models ever appear in the top 3 ranking. Of the 36 top-3 placements, the majority (19) are power law models; the remainder are nearly evenly split between vetting (10) and search/stop rule (7) models. One model that never appears in the top 3 ranking for any data set is the null: the Poisson model of a single homogeneous rate of partnership formation over time.

The difference in BIC across the top 3 ranked models for any survey is typically large enough to indicate strong or very strong evidence of superior fit at each rank. Figure 1 gives an example of the fits, showing the predicted distributions under each model plotted over the observed data from the NSM and NSW. The fits of several different models to the same data are almost indistinguishable. This suggests that the information in the distributions cannot discriminate among these models.

The variations in fit display some systematic patterns by sex. For the male degree distributions, the top-ranked models are mostly search/stop rule and vetting models, though power-law models account for half of the top-3 placements (9 of 18). For the female distributions, by contrast, the power-law models are slightly more likely to provide the best fit: they are top-ranked for 4 of the 6 distributions and account for 10 of the 18 top-3 placements.

There are also systematic differences in model fits by survey. The GSS and 2 of the 3 distributions from the NHSLS (categorical and constructed) produce identical model rankings for men and nearly identical rankings for women. The ranking differs somewhat between men and women, but both have the Discrete Pareto and Waring power law models in the top 3. The similarity in the fits is not altogether surprising: The 2 NHSLS distributions come from different measures in the same survey, and both the GSS and the NHSLS use a true nationally representative sampling frame drawn by the same organization, NORC. What is surprising is that the continuous measure from the NHSLS is best fit by a different set of models. This indicates that the model fits are not robust to changes in question design, even within the same survey. The BRFS and NSM have identical model rankings for men, though not for women, and again share a number of models in the top 3.

How robust are these findings to fits based on the upper tail of the survey distributions? The AIC, BIC, and log-likelihood for the 3 best-fit models to all 12 distributions at the 3 different cutoffs are shown in Appendix 1. Here again, only a few models reach the top rank. Five of the 11 proposed models rank as best at least once among the 24 distributions defined by the 2 upper-tail cutoffs; half of these 24 best fits are power-law models, and most of the rest are search/stop rule models. Nine of the 11 models show up in the top 3 at least once; half of these top-3 placements are power-law models, and the other half are split between search/stop rule and vetting models, with a few Poisson-lognormal (PLN) mixtures.

There is little consistency in the specific model rankings within a survey across the cutoff values, but there is some consistency in the types of models that fit across cutoffs. If a power law model provided the best fit to the full distribution, a power law model also tended to provide the best fit at cutoffs 2 and 3. Likewise, full distributions best fit by search/stop rule models were also best fit by these models at cutoffs 2 and 3. The NSM was an exception: For this survey, there was almost no consistency at any level.

##### Validation with Epidemic Potential Predictions

The goodness-of-fit tests do not identify any specific model, or even a class of model, as the best all-around fit to the data. A comparison of the epidemic potential predicted by each model offers a different perspective. The predicted threshold transmissibility τ_{c}, together with its 95% confidence interval, gives us a direct estimate of the probability of an epidemic for any given level of integrated transmissibility τ̄. We know that there is no generalized HIV/AIDS epidemic in the United States,^{35} while the prevalence of other STI varies substantially: from under 1% for gonorrhea^{36} and about 4% for chlamydia^{36} to over 20% for human papilloma virus (HPV)^{37} and herpes simplex virus-2 (HSV-2).^{38} This allows us to evaluate our estimates of τ_{c} in 2 ways. First, if the predicted transmissibility threshold is zero, so that any nonzero τ̄ is likely to produce a generalized epidemic, we can conclude that the predictive model fails. Second, if the predicted transmissibility threshold is greater than zero, the potential for a generalized epidemic depends on τ̄. The value of τ̄ varies by pathogen, so, in principle, if the estimated τ_{c} lies below τ̄ for a given pathogen, we would expect generalized spread of that pathogen in the United States, and if it lies above τ̄, we would not. Our estimates of τ_{c} can be evaluated accordingly.
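This section does not restate how τ_{c} is computed. As an illustration only, the sketch below assumes the standard configuration-model result for a unipartite network, τ_{c} = ⟨*k*⟩/(⟨*k*^{2}⟩ − ⟨*k*⟩), applied to an observed degree sample, and uses a percentile bootstrap for the 95% interval. The study's estimates come from the fitted models on a two-sex network, so both the formula and the interval method here are assumptions; the function names and the degree sample are hypothetical.

```python
import random
import statistics


def threshold_transmissibility(degrees):
    """Assumed configuration-model threshold tau_c = <k> / (<k^2> - <k>).

    Returns inf when no respondent has more than one partner, since the
    excess degree <k(k-1)> is then zero and onward spread is impossible.
    """
    mean_k = statistics.fmean(degrees)
    mean_k2 = statistics.fmean([k * k for k in degrees])
    excess = mean_k2 - mean_k  # equals <k(k - 1)>
    if excess <= 0:
        return float("inf")
    return mean_k / excess


def bootstrap_interval(degrees, reps=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for tau_c, resampling respondents."""
    rng = random.Random(seed)
    n = len(degrees)
    draws = sorted(
        threshold_transmissibility(rng.choices(degrees, k=n)) for _ in range(reps)
    )
    lower = draws[int((1 - level) / 2 * reps)]
    upper = draws[int((1 + level) / 2 * reps) - 1]
    return lower, upper


# Hypothetical annual partner counts for 1,000 respondents (not survey data).
degrees = [0] * 200 + [1] * 600 + [2] * 120 + [3] * 50 + [5] * 20 + [10] * 10
tau_c = threshold_transmissibility(degrees)
low, high = bootstrap_interval(degrees)
print(f"tau_c = {tau_c:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

The few respondents with many partners dominate ⟨*k*^{2}⟩, which is why the threshold and its interval are so sensitive to the shape of the upper tail.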

There are a variety of transmissibility estimates in the literature for different pathogens, although there is considerable uncertainty about most of them and considerable variability in what they represent (e.g., per coital act vs. per partnership, and varying time periods for the partnership estimates). For reference, we use a generally accepted estimate of τ̄_{HIV} = 0.162, drawn from a study of discordant couples in Uganda.^{28} For a number of reasons reviewed in the discussion, this is likely to be an overestimate for the United States, but it is not out of line with older US estimates of τ̄_{HIV} based on transmission from transfusion recipients to their partners.^{39} The analysis and presentation of the results allow the reader to make comparisons with alternative values.
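Part of this variability is simply the unit of measurement. If, purely as a simplifying assumption, every coital act transmits independently with the same probability β, then over *m* acts the per-partnership transmissibility is τ̄ = 1 − (1 − β)^{*m*}. The hypothetical helpers below implement that conversion and its inverse. Real partnerships violate the independence assumption (for example, infectiousness varies over the course of infection), so treat the output as a rough translation between units rather than an estimate.

```python
def per_partnership_transmissibility(per_act, acts):
    """P(at least one transmission) over `acts` independent acts.

    Assumes every act transmits independently with the same probability.
    """
    if not 0.0 <= per_act <= 1.0:
        raise ValueError("per_act must be a probability")
    return 1.0 - (1.0 - per_act) ** acts


def implied_per_act(per_partnership, acts):
    """Invert the formula above: the constant per-act probability implied."""
    if not 0.0 <= per_partnership <= 1.0:
        raise ValueError("per_partnership must be a probability")
    return 1.0 - (1.0 - per_partnership) ** (1.0 / acts)


# Hypothetical: what per-act probability would a per-partnership value of
# 0.162 imply if a partnership involved 100 acts? (The act count is invented.)
beta = implied_per_act(0.162, 100)
print(f"{beta:.5f}", f"{per_partnership_transmissibility(beta, 100):.3f}")
```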

The results can be seen in Figure 2. For the power law models (top panel) the predicted epidemic potential of these networks is infinite: the transmissibility thresholds are all zero and the 95% confidence intervals do not include any nonzero values. As with the naïve estimates based on linear regression, the more accurate estimates from these models predict generalized epidemics of all STIs in the United States. The predictions from the power law models simply do not fit the evidence.

Notes: The black line indicates that no epidemic threshold exists; the grey line indicates the estimated network epidemic potential required to produce an epidemic of HIV, based on a published estimate of τ̄_{HIV} from Uganda.^{29} The grey band is defined by HIV transmissibility estimates from the United States.^{39} NHSLS-Cont denotes data from the continuous response question; -Cat, data from the categorical response question; and -Cons, data from the constructed variable.
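Why the power law fits yield a zero threshold can be seen by evaluating the same assumed configuration-model formula, τ_{c} = ⟨*k*⟩/(⟨*k*^{2}⟩ − ⟨*k*⟩), on a power law truncated at a maximum degree *K*. For a scaling parameter between 2 and 3, the mean stays finite while the second moment grows without bound as *K* increases, so τ_{c} shrinks toward zero. A Poisson degree distribution with mean λ, by contrast, has ⟨*k*^{2}⟩ − ⟨*k*⟩ = λ^{2} and hence τ_{c} = 1/λ. The sketch uses an illustrative exponent of 2.5, and the function names are hypothetical.

```python
def truncated_power_law_threshold(gamma, k_max, k_min=1):
    """tau_c = <k>/(<k^2> - <k>) for p(k) proportional to k**-gamma, k_min..k_max."""
    ks = range(k_min, k_max + 1)
    weights = [k ** -gamma for k in ks]
    total = sum(weights)
    mean_k = sum(k * w for k, w in zip(ks, weights)) / total
    mean_k2 = sum(k * k * w for k, w in zip(ks, weights)) / total
    return mean_k / (mean_k2 - mean_k)


def poisson_threshold(mean_degree):
    """Poisson degrees: <k^2> - <k> = mean**2, so tau_c = 1/mean."""
    return 1.0 / mean_degree


# Illustrative exponent of 2.5 (below 3): the threshold falls as the cap rises.
for k_max in (10, 100, 1_000, 10_000, 100_000):
    print(k_max, round(truncated_power_law_threshold(2.5, k_max), 4))
print("Poisson, mean 2:", poisson_threshold(2.0))
```

In the untruncated limit the threshold is exactly zero, which is the situation shown for the power law models in the top panel.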

For the best-fit nonpower-law models (bottom panel), the predicted epidemic potentials are much lower and the predicted transmissibility thresholds correspondingly higher. All of the threshold transmissibility estimates are above 0.2, and 3 of the 6 are above 0.5. None of the 95% confidence intervals includes zero. As a result, these models predict that an epidemic threshold does exist, and the value of *R*_{0} will depend on the transmissibility τ̄ of the STI. The implications for any specific STI can be read directly off the graph: for *R*_{0} to be at least 1, the integrated transmissibility τ̄ for that STI would have to be larger than the predicted threshold τ_{c} shown on the vertical axis. For reference, we plot as the grey line an estimate of τ̄_{HIV} drawn from a study of discordant couples in Uganda, 0.162.^{28} All of the predicted thresholds under the best-fitting nonpower-law models lie above this line, and only 1 of the confidence intervals includes it. Based on this estimate of τ̄_{HIV}, none of these models would predict a generalized epidemic of HIV in the United States. Alternative lines can be drawn for other values of transmissibility.
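Reading a prediction off the figure amounts to comparing τ̄ with the estimated threshold and its interval. The sketch below assumes, consistent with the text, that *R*_{0} ≥ 1 exactly when τ̄ ≥ τ_{c}, and further assumes that *R*_{0} is proportional to τ̄, so that *R*_{0} = τ̄/τ_{c}. Only τ̄_{HIV} = 0.162 comes from the text; the function name and the thresholds and intervals in the example are hypothetical.

```python
def epidemic_prediction(tau_bar, tau_c, ci_low, ci_high):
    """Compare a pathogen's transmissibility with a predicted threshold.

    Assumes R0 = tau_bar / tau_c, so R0 >= 1 exactly when tau_bar >= tau_c.
    A threshold of zero means any positive transmissibility is supercritical.
    """
    if tau_c == 0:
        r0 = float("inf") if tau_bar > 0 else 0.0
    else:
        r0 = tau_bar / tau_c
    if ci_low <= tau_bar <= ci_high:
        verdict = "uncertain: tau_bar lies inside the interval for tau_c"
    elif tau_bar >= tau_c:
        verdict = "generalized epidemic predicted"
    else:
        verdict = "no generalized epidemic predicted"
    return r0, verdict


TAU_HIV = 0.162  # Ugandan discordant-couples estimate quoted in the text
# Hypothetical thresholds: a power-law fit (tau_c = 0) and a nonpower-law fit.
print(epidemic_prediction(TAU_HIV, 0.0, 0.0, 0.0))
print(epidemic_prediction(TAU_HIV, 0.35, 0.21, 0.60))
```

The first call mirrors the top panel, where a zero threshold makes any pathogen epidemic; the second mirrors the bottom panel, where the threshold and its interval lie above τ̄_{HIV}.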

#### Discussion

The results of this study are threefold. First, when tested against alternatives, power law models do not enjoy consistent support from the data. Using goodness of fit to the degree distributions as the criterion, several plausible social process models, representing either a search and stop rule mechanism or a 2-stage vetting model, are as likely as the power law models to provide the best fit or to rank in the top 3. These 2 model classes are flexible and have several natural and intuitive interpretations. The search and stop rule models, for example, can be thought of as selecting new partners until some set of criteria is satisfied, such as dating until a spouse is found or dating until enough experience in the social world has been acquired. The 2-stage vetting models can be interpreted as meeting some number of people and selecting a partner from those acquaintances, or as selecting a partner from the people who live in the same area. The best overall fitting social process model was the Negative Binomial, which can be interpreted in 1 of 2 ways: Either people use a search and stop rule, or they acquire partners at a fixed but heterogeneous rate.

Second, using the prediction of the epidemic threshold in the United States as the criterion, all of the power law models fail completely: All predict a generalized epidemic of HIV, and of every other sexually transmitted infection, based on every data set examined here. It is worth emphasizing that the fitted power law models generate this prediction regardless of the level of pathogen transmissibility: Any pathogen with a nonzero probability of transmission is expected to cause an epidemic.

All of the best-fitting social process models, by contrast, predict epidemics only if transmissibility is sufficiently high, on the order of 0.3 to 0.6, with 1 outlier at 0.8. The implication is that the epidemic potential in this network is probably too low to sustain a generalized epidemic of any STI whose average integrated transmissibility per partnership is below 20% to 30%. HIV is thought to have a transmissibility below this level, so the social process model estimates are consistent with the observed low prevalence of HIV in the United States. By contrast, HPV is estimated to have a per-partnership transmissibility of about 60%,^{40} and HPV prevalence in the United States is about 25%. So here too, the social process model estimates are consistent with the data.

Many (if not all) of the curable STI (e.g., chlamydia, gonorrhea, syphilis, trichomoniasis) are estimated to have similarly high transmissibilities, 50% to 70% per partnership.^{41–45} As the social process models would predict, all of these STI are also endemic in the United States, though at relatively low prevalences, ranging from 0.05% for syphilis^{46} to 4% for chlamydia.^{36,47} The low observed prevalence, given the very high transmissibility, may seem inconsistent with our estimate of the epidemic potential, but lower prevalence would be expected given the shorter duration of infection for curable STI. For these STI, it may also be more appropriate to base the estimate of τ_{c} on partners in the last month rather than in the last year. This would lead to substantially lower estimates of epidemic potential, more consistent with the observed prevalence.

In the main, the social process model estimates seem to be consistent with what we know about STI transmissibilities and prevalence in the United States. The only exception we can find is HSV-2. HSV-2 is thought to have a low transmissibility similar to that of HIV,^{48} yet the prevalence of HSV-2 in the United States is over 20%.^{38} This is something of a puzzle. It is not simply a problem with the social process models, however: The pattern is not consistent with the basic transmission dynamic modeling framework.

The complete failure of the scale-free models to make reasonable epidemic predictions is striking, particularly in light of the very strong public policy relevance claimed for these models. For example, the authors of Ref. ^{7} write that their “… most important finding is the scale-free nature of the connectivity of an objectively defined nonprofessional social network.” The public health implications, they explain, are that:

“… the measures adopted to contain or stop the propagation of diseases in a network need to be radically different for scale-free networks. Single-scale networks are not susceptible to attack at even the most connected nodes, whereas scale-free networks are resilient to random failure but are highly susceptible to destruction of the most connected nodes.”

It is claims like these, about the deep relevance of the previous scale-free estimates for STD prevention, that we criticize here. Other examples of such claims are given in Ref. ^{6}.

The third, and perhaps more important, finding is that none of the models tested here consistently provides the best fit to the data. Model rankings varied by survey, by measure within survey, by sex, and by the upper-tail cutoff used to restrict the data used to fit the models. One interpretation of this finding is that none of these models does a good job of capturing the underlying mechanism that generates sexual partnership networks. The large confidence intervals around the predicted thresholds, even for the best-fit models, illustrate just how little leverage degree distributions alone provide in establishing the likelihood of an epidemic.

A good empirical example of the predictive limitations of degree-based models is the pervasive and long-standing racial disparity in STI prevalence. Among non-Hispanic blacks in the United States, the relative risk of infection ranges from about 6 for chlamydia^{36} to over 20 for gonorrhea^{36} and HIV.^{35} These disparities are not explained by group differences in rates of partnership acquisition,^{49} and so would not be produced by any of the simple degree-based heterogeneity models considered here. One of the most important features of STI prevalence in the United States is thus beyond the explanatory capacity of these models.

To inform and guide public health, network-based approaches to epidemic modeling must take into account the key structural features of networks that influence transmission, and the behaviors that generate these structures. This kind of research is being done, albeit with less publicity. An example is the work on “assortative mixing,” the tendency for people to choose partners like themselves in terms of age,^{50} race/ethnicity,^{51,52} sexual role,^{53} and possibly activity level.^{54} Assortative mixing can lead to patchy clusters in the network with little connection between them, and it is possible to obtain analytic solutions for the effect of such mixing on epidemic thresholds.^{55,56} Another important form of heterogeneity is in the timing and sequence of partnerships. This includes the impact of variations in monogamous partnership duration^{57} and the prevalence of concurrent partnerships.^{58} Long-term serial monogamy can dramatically reduce epidemic potential, but even small amounts of concurrency can reestablish connectivity and substantially increase the potential for spread. A good review of these patterns and their impacts on transmission can be found in Ref. ^{59}. The implication of this body of work is not that the simpler models should be abandoned, just that they should be properly evaluated and their limitations properly communicated.
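Analytic threshold results for mixing are often expressed through a next-generation matrix whose dominant eigenvalue is *R*_{0}. The sketch below is a generic two-group compartmental illustration of that idea, not the network methods of Refs. ^{55,56}: it extends *R*_{0} = τ*c*δ to a high-activity and a low-activity group under an assumed "preferred mixing" scheme, in which a fraction ε of each person's partners come from their own group and the rest are chosen in proportion to each group's share of all partnerships. The function name and all parameter values are hypothetical.

```python
import math


def two_group_r0(sizes, contact_rates, tau, duration, epsilon):
    """Dominant eigenvalue of a 2x2 next-generation matrix under preferred mixing.

    epsilon = 0 is proportionate (random) mixing; epsilon = 1 is fully
    assortative mixing (partners chosen only within one's own group).
    """
    total_activity = sum(n * c for n, c in zip(sizes, contact_rates))
    share = [n * c / total_activity for n, c in zip(sizes, contact_rates)]

    def rho(j, i):
        # Fraction of a group-j member's partners who belong to group i.
        return epsilon * (i == j) + (1 - epsilon) * share[i]

    # k[i][j]: new infections in group i caused by one infective in group j,
    # in a fully susceptible population (tau * c * delta, split by mixing).
    k = [[tau * contact_rates[j] * duration * rho(j, i) for j in range(2)]
         for i in range(2)]
    trace = k[0][0] + k[1][1]
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    # Eigenvalues of a nonnegative 2x2 matrix are real; clamp rounding noise.
    return trace / 2 + math.sqrt(max(0.0, trace * trace / 4 - det))


# Hypothetical populations: 100 people with 10 partners, 900 with 1 partner.
for eps in (0.0, 0.5, 1.0):
    r0 = two_group_r0([100, 900], [10, 1], tau=0.15, duration=1.0, epsilon=eps)
    print(eps, round(r0, 3))
```

With these invented values, *R*_{0} rises from about 0.86 under proportionate mixing to 1.5 under fully assortative mixing, because the high-activity group can then sustain transmission on its own. Mixing alone, with no change in anyone's partner count, moves the population across the threshold.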

The public health implications of our results are clear: The radical implications of the “scale-free” models are theoretically intriguing, but in all likelihood empirically irrelevant for generalized HIV epidemics. Broad population-based prevention strategies, such as education programs, risk behavior reduction efforts, condom distribution, and antiretroviral therapies, may therefore be effective in reducing the spread of HIV and other incurable STI, and should not be abandoned in favor of exclusive targeting of high-degree hubs.