Pequegnat, Willo PhD; Fishbein, Martin PhD; Celentano, David ScD; Ehrhardt, Anke PhD; Garnett, Geoffrey PhD; Holtgrave, David PhD; Jaccard, James PhD; Schachter, Julius MD; Zenilman, John MD
SELF‐REPORTED BEHAVIORS have often been the primary outcome measures in behavioral‐prevention studies designed to reduce the further transmission of HIV and AIDS. Because it is often assumed that study participants may not always provide truthful or accurate responses to sensitive questions about their sexual or drug‐use behaviors, it has been argued that behavioral self‐reports are insufficient for the evaluation of the success of an intervention. Thus, the claim has been made that results of these studies can only be accepted if there is a biologic outcome measure that can serve to corroborate self‐reported behavior change. Moreover, it is asserted that biologic outcome data provide a meaningful public health outcome, and must be assessed before prevention programs can be widely adopted.
Although most would agree that HIV seroincidence would be the strongest evidence used to assess the public health impact of HIV‐prevention studies, this assessment is often not possible if there is a low incidence of HIV in the population being studied. Thus, many have recommended that sexually transmitted diseases (STDs) should be used as surrogate markers for HIV, despite the fact that the true empirical relationship between a given STD and HIV has not been established. Indeed, the relationship between the incidence and prevalence of a given STD and HIV incidence is no less complex than the relationship between self‐reported behavior and HIV incidence.
Increasing correct and consistent condom use or decreasing the prevalence of an STD could, under the right circumstances, decrease the transmissibility of HIV. The magnitude of such a decrease will depend on several factors, including the prevalence of HIV and the sexual mixing patterns (i.e., partner selection) in the population. Given the complexity of these relationships, mathematical models of HIV transmission have been developed to help clarify the role of behavior change and STD‐control programs in the prevention of HIV transmission.
The National Institute of Mental Health and the Annenberg Public Policy Center convened the Workgroup on Behavioral and Biologic Outcomes in HIV/STD Prevention Studies to consider some of these issues and to provide guidance to the research community. The workgroup was charged to address questions about the validity of self‐reported behaviors, the sensitivity and specificity of STD diagnostic tests, and the relationship among behavioral measures, STDs, and HIV. The workgroup was also asked to consider the utility of HIV transmission models, and the possible similarities and differences among these models. Finally, members of the workgroup were asked to outline some future directions for research that would sharpen the understanding of these issues. These questions are addressed systematically in this position statement.
How Valid Are Self‐Report Data on Condom Use, Number of Partners, and Other Behavioral Measures of HIV Risk, and How Can We Maximize the Validity of These Measures?
In using self‐reports of behavior, two psychometric aspects of the measurement instrument are of greatest concern: (1) reliability (is the instrument free of random error?); and (2) validity (is the instrument measuring what we think it is measuring; i.e., is the instrument free of both random and systematic error?). Self‐reports of the frequency of sex, condom use, and the number of sexual partners typically involve characterizations by an individual of behavior during a given period. A self‐report can be incorrect either because the person does not accurately recall his or her behavior during that period, or because the person fails to truthfully report behavior that is accurately recalled.
In general, people will provide truthful responses if they are (1) assured that their responses will be anonymous or held confidential, such that their names will ultimately not be associated with their responses; (2) provided with a set of motivating instructions that stress the importance of honest responding for the scientific integrity of the research project and emphasize the scientific importance of the project in general; (3) not asked to report their behavior in a face‐to‐face context, but rather allowed to indicate their responses in a way that prevents others from directly observing their answers in the immediate situation; and (4) asked to sign a pledge of truthfulness before undertaking the interview or questionnaire. Given these conditions, most persons will provide what they believe are truthful responses, although some level of untruthfulness may still occur in certain populations for certain socially taboo or stigmatized behaviors. The workgroup recommended that the above practices be adopted where possible, and that a valid measure of social desirability response tendency be routinely included in behavioral assessments. Such a measure provides an indication of potential failures to be truthful and can be included as a covariate in statistical analyses, where appropriate.
Given truthful responding, the accuracy of recall will be affected by a number of factors, including the length of the period for which recall is requested, the question format, and individual difference variables (e.g., reports might be less accurate for persons who engage in behaviors with a high degree of frequency). It is commonly assumed that persons will have more accurate recall for shorter durations; however, recent research suggests that this may not always be the case, and that moderate durations (3‐6 months) may be preferable to shorter (1 month) or longer (1 year) durations. Another common assumption is that persons provide frequency judgments by thinking about individual episodes of the behavior during the time frame in question, and then mentally tally the behavior to report an overall frequency. Although some persons may do so, others invoke rule‐based heuristics to generate frequencies (e.g., “I typically engage in sex twice a week, so the number of times I have engaged in sex in the past three months is 2 × 12 = 24”). Certain question formats encourage episodic versus rule‐based thinking, and the use of different cognitive strategies in making frequency judgments can affect recall accuracy.
Accuracy of recall may also be affected by the person's educational level or age, the appropriateness of the methods of assessment for the question being asked (e.g., variability of individual behavior is best captured by a diary), demand characteristics of the situation, and by the use of alcohol or drugs either chronically or contemporaneously with the sexual activity. Although each of these variables can impact the accuracy of recall, the weight of the scientific evidence to date suggests that properly administered self‐reports of sexual frequency, condom‐use frequency, condom‐use consistency, and the number of sexual partners during a moderate time frame can yield reasonably accurate indices of these behaviors, which can be used for a wide range of scientific studies. However, research suggests that a small percentage of persons may still provide highly inaccurate responses. Residual analyses and other statistical procedures can be used to help identify these persons.
It is important to note that self‐reports of behavior can be useful outcome measures even if they are subject to some inaccuracy. Accuracy of self‐reports can be characterized at either the aggregate (group) level or the individual level. At the aggregate level, the concern is whether the mean or median number of reported sexual‐risk activities for a group of individuals corresponds to the true mean or median of sexual risk activities for that group. At the individual level, the interest is whether the self‐report of sexual‐risk activities for a given person corresponds to the actual sexual‐risk activities of that person. A self‐report measure can be inaccurate at the individual level, yet still provide useful data at the aggregate level (e.g., if some individuals overestimate their frequency of sex and others underestimate this frequency, it is possible for the overestimations to cancel the underestimations, thereby yielding an accurate representation of the mean or median).
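The distinction between aggregate and individual accuracy can be illustrated with a small sketch. The counts below are hypothetical and were chosen so that overestimates and underestimates cancel exactly; real data would rarely cancel this neatly.

```python
from statistics import mean

# Hypothetical data: true and self-reported numbers of sex acts for four persons.
true_counts = [4, 10, 6, 8]
reported_counts = [6, 8, 5, 9]


def aggregate_error(true_vals, reported_vals):
    """Difference between the reported group mean and the true group mean."""
    return mean(reported_vals) - mean(true_vals)


def mean_individual_error(true_vals, reported_vals):
    """Average absolute discrepancy between each person's report and true value."""
    return mean(abs(r - t) for t, r in zip(true_vals, reported_vals))


# Every individual report is wrong, yet the group mean is reproduced exactly.
print("aggregate error:", aggregate_error(true_counts, reported_counts))
print("mean individual error:", mean_individual_error(true_counts, reported_counts))
```

In this constructed case the measure would be adequate for comparing group means but not for analyses that depend on each person's own score.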
Inaccuracy in self‐reports may not pose a problem for scientific questions in which a degree of inaccuracy can be tolerated. For example, if the interest is to compare mean shifts in condom use for persons randomly assigned to an experimental or control group, and if all self‐reports tend to underestimate actual behavior by 10% to 15%, this bias will be present in both groups, and it will not affect the estimate of the mean difference in true condom use between groups. In this case, partial inaccuracy becomes problematic only if the nature of the bias differs between the experimental and control groups. A common focus of research is to identify social and psychological correlates of behavioral tendencies to engage in unprotected sex. In this case, the self‐report measures need to reasonably map onto the true score and be sufficiently free of measurement error, so that correlations with other constructs are not meaningfully biased.
It is possible to use self‐reports to accurately measure behavior change; however, there are many reasons why biologic data may not be congruent with self‐reports. Thus, failure to find a simple linear relationship between condom use and incident STDs cannot be taken as an indication of “lying” or “inaccuracy” on the part of the respondent; indeed, epidemiologic models of transmission would suggest a true nonlinear relationship. In addition, although respondents may truthfully (and accurately) report 100% condom use, they may have been using condoms incorrectly (e.g., not put on at initiation of sex, put on the wrong way and then flipped over, taken off after ejaculation but sexual activity continued), or the condom may have slipped off, leaked, or broken. Incorrect condom use with high‐risk partners could result in incident STDs. Clearly, attempts to link behavior to STD incidence cannot rely solely on a categorical “yes” or “no” response. Behavioral scientists need to develop methods to assess correct as well as consistent condom use. Despite this shortcoming, there was consensus among the members of the workgroup that when appropriate assessment conditions are established, well‐designed questions concerning sexual‐risk behaviors result in reasonably reliable and valid self‐reports.
How Good Is the Sensitivity and Specificity of Laboratory Tests of HIV and STDs, and What is Their Predictive Value When Used in Field Trials?
Sensitivity refers to the ability of a diagnostic test to identify the true positives, or persons that are actually infected, whereas specificity refers to the ability of the test to identify the true negatives, or persons that are not infected. The organisms most commonly used to evaluate the effectiveness of a behavior‐change intervention are the curable bacterial infections Neisseria gonorrhoeae and Chlamydia trachomatis. These organisms offer two advantages: they are highly prevalent in many settings, and have sufficiently high incidence to make outcome measurements more practical. Because the infections are easily curable with a single‐dose oral antibiotic regimen, determination of incidence is facilitated by a strategy of screening, treatment, and re‐screening at desired intervals.
The following STDs have also been used in cumulative incidence or prevalence studies and to verify virginity in younger populations:
Trichomonas vaginalis. This organism is common in some populations. On a practical level, infection detection is limited to women, and asymptomatic infection is common. Culture and modification of some polymerase chain reaction (PCR) tests by adding primers to other commercially available tests have been used to enhance diagnosis.
Bacterial vaginosis. This is an ecological alteration of the vaginal flora and not a true sexually transmitted infection. Because of high prevalence and high incidence of posttreatment recurrence, this condition may be a better marker of sexual activity than of incident infection.
Syphilis. Syphilis serology may be useful in some study groups, but as with the viral serologies, it requires repeated collection of a blood specimen with confirmatory testing. In most areas, even those with high STD morbidity, the incidence of syphilis is too low to be useful as an outcome measure in behavioral intervention studies.
Chronic viral infections can also be used as outcome measures, but require the identification of incident infections. Human immunodeficiency virus and many other viral STDs (e.g., herpes simplex virus, human papillomavirus) result in lifelong infection, which makes the differentiation of incidence and prevalence difficult. Incidence is typically assessed by testing paired sera to identify seroconversion, which makes assessment logistically more demanding and expensive.
Herpes simplex virus type 2 (HSV‐2). This infection can be a good outcome measure because of its high prevalence and incidence in selected populations and age groups. The highest HSV‐2 seroconversion rates occur in adolescents.
Hepatitis B. Sexual transmission is confounded by transmission from blood shared through injecting drug use. Incidence is decreasing as vaccination becomes more widely used.
Hepatitis C. Transmission of this disease is more likely due to injection practices than to sexual transmission.
Human papillomavirus (HPV). The incidence of HPV is extremely high in adolescents, and direct detection may be a marker for sexual activity in this group. However, HPV serology needs further development and evaluation before it can be reliably used as an outcome measure.
Recent advances in molecular diagnostics assure a more accurate diagnosis of chlamydia and gonococcal infection; furthermore, the technical protocols for testing and specimen collection have changed rapidly. Previous testing methods required collection of direct genital specimens by highly trained clinicians. Most specimens can now be collected using noninvasive methods such as first‐catch urine (1 to 2 ounces urine collected 2 hours after last urination) or self‐administered vaginal swabs. These methods allow convenient screening of asymptomatic populations in remote (nonclinical) settings, and make obtaining consent from larger proportions of research subjects easier.
The current generation of tests based on nucleic acid amplification is highly specific (> 99%), although other technologies may soon offer similar performance. The sensitivity of a single test is approximately 85% to 90% for C trachomatis and 90% to 95% for N gonorrhoeae. The current HIV tests used in clinical settings (i.e., enzyme‐linked immunosorbent assay and Western blot analysis) and in research settings (PCR) also have excellent sensitivity and specificity.
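Because predictive value depends on prevalence as well as on sensitivity and specificity, a short calculation may help. The sketch below applies Bayes' rule using a sensitivity of 90% and a specificity of 99%, in line with the figures quoted above; the two prevalence values are hypothetical and are not taken from any study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical prevalences: a high-prevalence clinic and a low-prevalence school.
for prevalence in (0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.3f}, NPV {npv:.4f}")
```

Under these assumptions, the same test yields a markedly lower positive predictive value in the low-prevalence setting, which is one reason prevalence matters when biologic outcomes are used in field trials.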
Because excellent tests are commercially available, and to enhance the ability of other researchers to replicate study results, the use of nonstandardized “home brew” tests is discouraged. Furthermore, treatment decisions are more easily made from the results of a licensed test. Currently available chlamydia tests include LCx (Abbott, Abbott Park, IL), PCR (Roche, Nutley, NJ), and TMA (Genprobe, San Diego, CA). Abbott manufactures an FDA‐approved gonococcus test. The Abbott LCx chlamydia test is FDA approved for both urine and genital swabs, and the Roche PCR is FDA approved for swabs. The Roche PCR for urine and the Genprobe TMA for swabs are pending FDA approval.
Despite all precautions, some false‐positive and false‐negative results will occur. Recent evaluations suggest that the specificity of these tests approaches the limit of our ability to measure specificity. The limitations imposed by < 100% sensitivity are easier to estimate, but must be taken into consideration in study design. For example, if sensitivity of the chlamydia test is 85% to 90%, and sensitivity of the gonococcal test is 90% to 95%, then after one round of testing, 5% to 15% of the initial prevalence will remain, albeit undetected. In addition, the 2% to 4% of detected and treated infections that fail treatment need to be considered; if identified after the intervention, these infections would be misclassified as incident cases. Study designs (e.g., comparable controls, randomization) must therefore control for spurious findings to prevent misinterpretations.
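The arithmetic behind these residual infections can be sketched as follows. The function is an illustration only; it assumes that every detected infection is treated and that treatment failure is independent of detection, neither of which is stated in the text.

```python
def residual_fractions(sensitivity, treatment_failure_rate):
    """Fractions of baseline infections that persist after one screen-and-treat round.

    Returns (missed, failed, total), each as a fraction of initial prevalence.
    Assumes all detected infections are treated (an illustrative simplification).
    """
    missed = 1 - sensitivity                       # infected but test-negative
    failed = sensitivity * treatment_failure_rate  # detected, treated, not cured
    return missed, failed, missed + failed


# Bounds taken from the ranges quoted in the text.
for sensitivity, failure in ((0.85, 0.04), (0.95, 0.02)):
    missed, failed, total = residual_fractions(sensitivity, failure)
    print(f"sensitivity {sensitivity:.0%}, failure {failure:.0%}: "
          f"missed {missed:.1%}, failed {failed:.1%}, total {total:.1%}")
```

Any of these residual infections detected at follow-up would look like a new infection unless the design accounts for them.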
The specificity and sensitivity results reported here are based on tests conducted in specialized laboratories. To achieve comparable results in field studies, it is necessary to exercise stringent quality control in the collection, storage, and transportation of the specimens to the laboratory. Behavioral interventions that integrate biologic outcomes into the research design require STD expertise during all phases of the study and the use of a reference laboratory. Specimens must be stored under certain conditions and must be shipped carefully. If the study is conducted in an international setting, an appropriate in‐country laboratory must be located or reliable transportation procedures established to ensure the quality of the specimens at the time the diagnostic tests are performed.
Additional concerns regarding the design of a study with HIV/STD endpoints involve the prevalence and incidence of STDs or HIV in the target population. If prevalence or incidence is low, a large sample size may be required, which will lead to expensive laboratory testing. There are algorithms based on pooling of specimens before testing that can be used to reduce the costs of diagnosis.
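The text does not name a specific pooling algorithm. As one illustration, the sketch below uses classic two-stage (Dorfman) pooling, in which each pool is tested once and every specimen in a positive pool is retested individually. It assumes a perfectly accurate test and independent specimens, and the 2% prevalence is hypothetical.

```python
def expected_tests_per_specimen(prevalence, pool_size):
    """Expected number of tests per specimen under two-stage (Dorfman) pooling."""
    if pool_size < 2:
        return 1.0  # no pooling: one test per specimen
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    # One pooled test shared by the pool, plus individual retests if it is positive.
    return 1 / pool_size + p_pool_positive


def best_pool_size(prevalence, max_pool_size=20):
    """Pool size in 2..max_pool_size that minimizes expected tests per specimen."""
    return min(range(2, max_pool_size + 1),
               key=lambda k: expected_tests_per_specimen(prevalence, k))


prevalence = 0.02  # hypothetical
k = best_pool_size(prevalence)
print("best pool size:", k)
print("tests per specimen:", round(expected_tests_per_specimen(prevalence, k), 3))
```

The savings shrink as prevalence rises, and imperfect sensitivity in diluted pools would further reduce the benefit in practice.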
There was consensus among the members of the workgroup that the specificity and sensitivity of all HIV diagnostic tests and of most STD diagnostic tests are excellent. Although large field studies pose a challenge in specimen collection, the obstacles are not insurmountable if experienced researchers and laboratories are used. Given reliable and valid self‐reports of behavior and sensitive and specific diagnostic tests for STDs, the question of the relationships between these measures must be addressed.
What Are the Relationships Among HIV and Other STDs?
The objective of an HIV‐prevention intervention is to reduce HIV incidence. Incident HIV infection is facilitated by the presence of other STDs. A consistent body of empirical data has found that genital ulcer disease (e.g., syphilis, chancroid, herpes) and exudative diseases (e.g., gonorrhea, chlamydia) increase the risk of incident HIV infection by twofold to sixfold. However, the results of large‐scale prospective intervention studies oriented toward reducing HIV infection through STD control are conflicting, which suggests that the biologic and epidemiologic relationship is extremely complex. The impact of STD control on HIV incidence may be affected by the prevalence of STDs or HIV and by the behavioral characteristics of the population (e.g., patterns of condom use).
From a population perspective, STDs play an important role in HIV incidence in settings where heterosexual transmission occurs in a population with high STD incidence. Sexually transmitted diseases play a minor role in the facilitation of HIV infection among homosexual men. Although STDs play a relatively unimportant role in HIV transmission among injecting drug users (IDUs), they may play an important role in the transmission of HIV to sexual partners of IDUs. Thus, although a reduction in STD incidence or prevalence can reduce HIV transmission, a simple relationship between STD and HIV does not exist under most conditions. Although this fact calls into question the utility of using STD data as a surrogate for HIV data, it does not refute the utility of using STD data to demonstrate that sexual transmission of a pathogen has been interrupted by an intervention.
What Are the Relationships Between Condom Use and STDs (including HIV)?
When used correctly and consistently, condoms can reduce the incidence of STDs, including HIV. Convincing data to this effect are from studies of HIV‐discordant couples that indicate a substantial reduction in HIV‐transmission rates among consistent condom users. However, as indicated previously, reports of consistent condom use do not always imply correct use. Because there are multiple opportunities for errors in condom use, study methodologies should include questions that assess the operational and technical aspects of condom use.
Just as the relationship between STD and HIV is extremely complex, so too is the relationship between behavior and STDs (including HIV). The transmission of a sexually transmitted infection is dependent on individual susceptibility, number of sexual partners, number of exposures, transmissibility of the infection, and the infection status of the partner. For example, unlimited unprotected sex with an uninfected partner will not result in the transmission of an infection.
To further complicate matters, there is increasing evidence that people tend to engage in risky behaviors with partners they perceive as “safe,” and in safer behavior with partners they perceive as “risky.” For example, persons often use condoms with casual and new partners but not with their steady or regular partners. To the extent that these perceptions are accurate, small changes in condom use with partners perceived as risky can disproportionately alter the risk of STD infection.
Therefore, incident STDs should not be considered the gold standard for “validating” self‐reports of condom use. Even in populations with high STD incidence (i.e., 15% to 30% per year), the majority of persons who do not use condoms correctly and consistently do not acquire an STD. If STD acquisition is used as the only measure of incorrect or inconsistent condom use, large‐scale misclassification of risky individuals as “safe” will occur. Although most investigators recognize that failure to use a condom correctly and consistently does not always lead to the acquisition of an STD, some have argued that the presence of an incident STD is a clear indicator that there was incorrect or inconsistent condom use. Unfortunately, the combination of using diagnostic tests with < 100% sensitivity (i.e., failure to diagnose preexisting “prevalent” infection) and the occurrence of treatment failures can result in what appear to be “new” STDs. Thus, even persons who abstain from sex or who use condoms correctly and consistently (e.g., during the past 2 months) may appear to have a new or incident infection.
How Should Behavioral and Biologic Outcome Measures be Used in HIV and STD Prevention Studies?
The effectiveness of an intervention to reduce HIV transmission should, whenever possible, be based on both behavioral and biologic outcomes. Although it is always appropriate to obtain behavioral measures, this may not be true for biologic measures. For example, STD outcome measures may be inappropriate criteria for the development and initial testing of behavior‐change interventions (e.g., Phase I and II studies) or when interventions are designed for prevention in populations with low STD prevalence (e.g., school‐based prevention programs). Requiring biologic markers in these cases will increase the cost and complexity of these studies without yielding substantive additional information.
Are Mathematical Models Useful In the Assessment of the Impact of Behavior Change or STD‐Control Strategies on STD and HIV Transmission?
Given the complexity of the relationships among behaviors, STDs, and HIV, and the questionable utility of obtaining biologic outcome measures in some settings, it has been suggested that mathematical models be used to provide estimates of the impact of behavior change (or changes in STD incidence) on HIV.
Models of HIV‐transmission dynamics can be used to direct the focus of interventions and to make predictions about their public health impact. Although the relationships among behaviors, STDs, and HIV are complex, a simple formulation that captures much of the interaction is possible. The basic reproductive number (Ro) is a measure of the potential for growth of an epidemic at the outset when there are few infected persons. This measure is defined as the average number of infections caused by one infected individual in an entirely susceptible population. The main constituents of this basic reproductive number are the transmission probability per sex partnership (β), the rate of sex‐partner change (c), and the mean duration of infectiousness (D), such that: Ro = βcD. According to this model, a percentage change in any of these three values will have the same impact on the scale of the epidemic. (The speed of the epidemic depends on the time during which the new infections in Ro are generated.) As the infection spreads, the population is no longer entirely susceptible, and the average number of new infections caused by one newly infectious person decreases. Ro is a measure of the potential for spread within a population, and reductions in Ro will reduce the population‐level scale of an epidemic. The risk of infection for a susceptible person depends upon the force of infection, which is a function of c, β, and the prevalence of infection in partners. Thus, if there is infection within the partner pool, reducing the number of partners and the transmission probability should reduce the risk of infection.
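The claim that an equal percentage change in any one of the three constituents has the same effect on Ro follows directly from the product form. The sketch below checks this numerically; the baseline parameter values are hypothetical and chosen only for illustration.

```python
import math


def basic_reproductive_number(beta, c, duration):
    """Ro as the product of beta (per-partnership transmission probability),
    c (rate of sex-partner change), and D (mean duration of infectiousness)."""
    return beta * c * duration


# Hypothetical baseline values.
BETA = 0.1   # transmission probability per partnership
C = 2.0      # new sex partners per year
D = 10.0     # mean duration of infectiousness, in years

baseline = basic_reproductive_number(BETA, C, D)

# Apply the same 20% reduction to one parameter at a time.
keep = 0.8
scenarios = {
    "beta reduced": basic_reproductive_number(BETA * keep, C, D),
    "c reduced": basic_reproductive_number(BETA, C * keep, D),
    "D reduced": basic_reproductive_number(BETA, C, D * keep),
}

print("baseline Ro:", round(baseline, 3))
for name, value in scenarios.items():
    print(name, "->", round(value, 3))
```

Each scenario gives the same reduced Ro, whether the change comes from condom use (β), partner reduction (c), or faster treatment (D).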
The complexity of the epidemiology has to be considered when extrapolating estimates of impact from behavior change or reductions in STD incidence. Sex partnerships have many different characteristics, including a range of numbers and types of sex acts. Logically, there must be a transmission probability per sex act, and the probability of transmission per partnership will increase with the number of sex acts as a binomial process (with a low transmission probability per sex act and a high number of acts, this is approximately a Poisson process). Thus, the transmission probability per partnership (βp) will be a function of the transmission probability per act (βa) and the number of unprotected acts (Na), such that: $\beta_p = 1 - (1 - \beta_a)^{N_a}$
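Under the binomial process described here, with each unprotected act treated as an independent trial, the per-partnership probability is 1 − (1 − βa)^Na, and the Poisson approximation is 1 − exp(−βa·Na). The sketch below compares the two; the per-act probability and the numbers of acts are hypothetical.

```python
import math


def per_partnership_probability(beta_a, n_acts):
    """Binomial form: probability of at least one transmission in n_acts acts."""
    return 1 - (1 - beta_a) ** n_acts


def poisson_approximation(beta_a, n_acts):
    """Approximation that holds when beta_a is small and n_acts is large."""
    return 1 - math.exp(-beta_a * n_acts)


beta_a = 0.001  # hypothetical per-act transmission probability
for n_acts in (50, 100, 1000):
    exact = per_partnership_probability(beta_a, n_acts)
    approx = poisson_approximation(beta_a, n_acts)
    print(f"Na={n_acts}: binomial {exact:.4f}, Poisson {approx:.4f}")
```

Because the relationship saturates as the number of acts grows, halving the number of unprotected acts does not necessarily halve the per-partnership risk.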
However, the transmission probability per sex act is likely to vary both within and between partnerships. This variation seems to be true for HIV, where some partnerships have a high transmission probability per sex act and others have a low transmission probability per sex act. Condom use in the former partnership will have a greater impact than use in the latter partnership; likewise, condoms used in partnerships in which one partner is more likely to be infected will have a greater impact. The impact of condom use on individual risk and its translation to population‐level risk depend on these nonlinear processes.
The relationship between a given reduction in one of the parameters (i.e., partner change or partner numbers, transmission probability through condom use or by removing STDs from the population) and the incidence or prevalence of HIV will depend on the other important variables. For populations in which there exists a large percentage of infected persons, a change in βp may have a small impact. However, near the threshold reproductive rate, Ro = 1, the same change in βp would have a substantial impact.
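One way to see this threshold effect is with a deliberately simplified model. The sketch below assumes a homogeneous susceptible-infected-susceptible (SIS) population, for which the endemic prevalence is 1 − 1/Ro when Ro > 1 and zero otherwise. This model is an assumption made purely for illustration; it is not the workgroup's model and ignores the heterogeneity discussed in the text.

```python
def endemic_prevalence(r0):
    """Equilibrium prevalence in a simple homogeneous SIS model (illustrative)."""
    if r0 <= 1:
        return 0.0  # infection cannot sustain itself below the threshold
    return 1 - 1 / r0


def prevalence_drop(r0, relative_reduction):
    """Absolute fall in endemic prevalence when Ro is cut by relative_reduction."""
    return endemic_prevalence(r0) - endemic_prevalence(r0 * (1 - relative_reduction))


# The same 10% reduction applied far from and near the threshold (hypothetical Ro values).
for r0 in (5.0, 1.1):
    before = endemic_prevalence(r0)
    after = endemic_prevalence(r0 * 0.9)
    print(f"Ro={r0}: prevalence {before:.3f} -> {after:.3f}")
```

In this toy setting, a 10% reduction barely changes prevalence when Ro is 5 but eliminates endemic infection when Ro is 1.1.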
Because there is great heterogeneity in sexual‐risk behavior, the prevalence of a given STD will saturate in a limited fraction of the population. However, the prevalence of an infection will depend on its biology (i.e., transmission likelihood and duration of infectiousness) and the pattern of choice of sexual partners within the population. There is no simple relationship among the distributions of different STDs, including HIV.
Thus, in interpreting how an intervention will work, models cannot provide simple, exact quantitative predictions with absolute certainty. Models are, by definition, tools for analysis under conditions of uncertainty. They help to identify potentially effective interventions, explain the outcomes of these interventions, and suggest designs for future interventions. Models can also be used to bound estimates of the likely effects of interventions implemented more widely in the field. The roles of models are largely qualitative, and can be greatly enhanced by using sensitivity analyses. Models can play a useful role, but they are a supplement to, rather than a replacement of, studies of the impact of interventions. Although we have only noted two general model types for illustrative purposes, different approaches in modeling have been developed and can often complement each other. Models should continue to be developed so that they capture observed complexity, and extant models should be subjected to further empirical verification. As the impact of more interventions is evaluated and longitudinal cohort data are available, models can be better validated and the reliability of the predictions can be improved so that these models may play a role in the future assessment of programs.
What Further Work Needs to Be Examined to Provide Definitive Answers to These Questions?
* Measurement of Self‐Reported Behavior Change
* Develop more appropriate measures of social desirability that have wider applicability with the experiences of a variety of populations.
* Examine the dynamics of self‐reports of behavior in dyads (e.g., men tend to overreport sexual activity, whereas women tend to underreport sexual activity).
* Develop a standard set of questions with appropriate qualifiers that can be used to assess HIV‐related and STD‐related risk behaviors.
* Develop multiple questions and alternate formats that can be used for different periods and that incorporate recall cues to improve accuracy.
* Identify the impact of using different periods to accurately capture the different patterns of risk behavior.
* Examine issues of accuracy and truthfulness of self‐report measures across populations and their relationship to the generalizability of findings.
* Identify the impact on accuracy when using contextual cues.
* Conduct intensive studies of properties of rating scales (e.g., use of adverbs, points and anchors selected for scales).
* Conduct research investigating methodologies used in areas other than sexual behavior where there is a “gold standard,” and develop ways to import these methodologies into sexual behavior research.
* Develop methods to assess correct and consistent condom use.
* Examine different data collection formats (e.g., CASI, other new technologies) and their impact on responses.
* STD and HIV
* Assess the relationship between STDs and the acquisition of HIV.
* Develop cheaper, more accurate, and more rapid STD and HIV diagnostic tests.
* Study functional relationships among behavior, STDs, and HIV; examine the difference between individual and aggregate relationships among different behaviors, types of partners, and correct and consistent condom use.
* Modeling of Epidemics
* Develop better cohort data on which to verify models of HIV and STD epidemics.
* Conduct further studies of the relationship between behaviors and STD and HIV incidence to validate and refine models of the effectiveness of interventions.
* Develop better scenario‐analysis and sensitivity‐analysis methods to assist in projecting the effects of translating effective interventions more broadly to public health settings (e.g., “What would be the impact on the epidemic if an efficacious intervention was widely adopted in a public health setting?”).