To the Editors:
In their March 2011 editorial in J Acquir Immune Defic Syndr., Padian et al1 describe the implementation science framework for research on programs under the President's Emergency Plan for AIDS Relief (PEPFAR), as proposed by the Office of the US Global AIDS Coordinator (OGAC). In their article, they call for more rigorous evaluation of interventions to inform program scale-up. We agree with the need for more and better program evaluations to inform HIV prevention efforts worldwide. However, we believe some of the evaluation approaches promoted in the OGAC framework need to be placed within a larger context.
The authors describe 3 components of implementation science: monitoring and evaluation (“for routine daily assessment”), operations research (for “increasing the efficiency of implementation”), and impact evaluation (for “causal attribution of observed changes”). In the last category, the authors promote the application of randomized experimental designs, which enable causal attribution in part because they compare results against what would have happened had the program not been implemented (the program counterfactual). They allow some modifications to strict randomized trials, including quasi-experiments that have a counterfactual in the absence of randomization, and phased implementations (also known as stepped wedge designs), which amount to randomization based on time of program implementation.
THE CHALLENGES OF RANDOMIZED EXPERIMENTS OF HIV PREVENTION
A year earlier, Padian et al2 conducted a review of the literature for randomized controlled trials (RCTs) of HIV prevention. They found 37 articles fitting their criteria. Only 6 established definitive results; of those, 5 showed a positive effect of the intervention. Although they considered interventions of all types, the definitively positive interventions were all biomedical: 3 of male circumcision, 1 of STI management, and 1 of a vaccine.
The authors note that failures to yield definitive results are most often attributable to the study design. The study characteristics they describe as favoring a successful RCT include the following: an intense application of the intervention; a new intervention that is not available to the controls; prevention services in the comparison group that have not been augmented for the trial; a community where infection rates are high and not in decline; sustained adherence to the intervention or its absence; and execution of the intervention over a long period of time, “as long as a decade.” Of course, a trial has the greatest hope for success if all of these are in place. But there are a number of reasons why they cannot or, in some situations, should not be.
The authors point out that interventions implemented with high intensity and high adherence are not sustainable and thus do not reflect their day-to-day implementation outside of a study. The interventions most able to be implemented with high intensity and adherence are biomedical. Behavioral and structural interventions are often harder to control, and their positive effects can be less dramatic than those of a biomedical intervention, though they remain critically important because they can address issues not touched by biomedicine. Structural interventions, such as laws and economic policies, affect whole populations, produce a diverse set of results, and often require long time frames to demonstrate effects.
The intensity of biomedical interventions is further challenged by ethical standards. After the outcry over placebo-controlled azidothymidine trials in Thailand, the US National Bioethics Advisory Commission recommended that clinical trials “provide members of any control group with an established effective treatment, whether or not such treatment is available in the host country”.3 In some cases, an established effective treatment will be far superior to the local standard, perhaps approaching the efficacy of the intervention under study, making statistically significant differences harder to achieve. Others point to the ubiquity of health-related interventions and the absence of communities lacking one.4 Most communities have several intervention programs underway, making it much harder to demonstrate the effectiveness or added value of another intervention.
In sum, then, randomized experiments are suitable and feasible for only a small portion of HIV interventions.
Not only do interventions seldom act alone in a community, but the desire to evaluate interventions is also not the sole agenda of PEPFAR and the US government more broadly. In 2009, the Obama administration announced the Global Health Initiative (GHI) as its overarching approach to improving global health. Projects that were funded under PEPFAR before GHI are now expected to come into line with the 7 GHI principles. Two of the principles are to increase impact through strategic coordination and integration, and to encourage country ownership and invest in country-led plans. Integrating programs with each other runs contrary to the RCT strategy of singling out an element and measuring its sole effect. If a program that integrates 2 others is evaluated and found to be effective, discerning whether the results should be attributed to just one of the programs or to both together adds a layer of complexity with associated research costs.
A country-led plan for HIV/AIDS prevention may be quite different from a plan designed by a researcher. Typically a ministry of health is trying to allocate scarce resources and make choices between a dizzying array of options and permutations in a variety of settings. A researcher, on the other hand, typically wants to assess the impact and value of a single intervention. Although a ministry of health would ideally like to know the effectiveness of each program, it is unlikely to stop all other programs in 1 or several settings to assess the effects of just one. But even if it were to do this, the results of the trial may not apply to settings where other existing programs could interfere with the mechanisms of the intervention in question.
These types of challenges in implementing evaluation designs in real-world settings are well documented, most recently by Victora et al.4 To address these challenges, they proposed an “evaluation platform design,” which uses the health district as the unit of analysis and takes into account multiple existing programs along with other contextual factors. The design is observational rather than experimental and uses multiple existing sources of data, such as monitoring systems, routine surveillance data, and population and facility-based surveys. Empirical data collection is used to fill any data gaps. In contrast to a randomized experimental design, which aims for causal attribution, the goal of the evaluation platform design is to establish plausible connections between programs and outcomes. This approach was 1 of 4 outlined by representatives from the United States Agency for International Development (USAID), OGAC, and the Centers for Disease Control and Prevention gathered for a technical consultation on innovative approaches to evaluation within the Global Health Initiative.5
A plausibility approach is also promoted by UNAIDS and is being applied in current evaluations of the President's Malaria Initiative.6,7 Such an approach is consistent with a broader vision for monitoring and evaluation that establishes an a priori approach to analyzing the outputs and outcomes of PEPFAR programs, as supported by Padian et al. Boerma and Stansfield8 took this approach one step further, arguing for a systemic enhancement of national health information systems for monitoring and evaluation. Operationalizing such an approach has its own set of challenges, however. Augmenting routine data collection to enable evaluations is an intervention in its own right that would take several years to realize and could substantially increase an already high reporting burden on country health systems. Moreover, the added work of data collection would leave less time for data analysis and use unless accompanied by additional personnel and systems. We recommend the use of existing data, progressive improvement of data collection and use depending on country context and capability, and targeted additional data collection as needed.
In recent years, USAID has given explicit attention to evaluation by creating an Office of Learning, Evaluation, and Research and publishing the Agency's evaluation policy.9 This attention is welcome and much needed. OGAC's guidance on implementation science is further evidence of the US government's commitment to evaluation and evidence-based practices and policies. Within the discipline of evaluation, there is widespread recognition of the challenges presented in “real life” by a diverse mix of simultaneous HIV prevention activities in virtually every community, multiple agendas in HIV prevention, and resource constraints. To accommodate these realities, a range of evaluation approaches has been affirmed by national and international evaluation organizations. The use of randomized experiments to assess the impact of HIV interventions, as promoted in the OGAC implementation science guidance, is part of the array of evaluation options. However, such experiments may be applicable in relatively few situations, and that number is likely decreasing because of ever-broadening coverage of populations with HIV prevention activities. Some of the authors of the OGAC guidance participated in the creation of other documents mentioned here, including the UNAIDS Strategic Guidance for Evaluating HIV Prevention Programs. Thus, they are aware of the challenges and common recommendations. Our interest, then, is to ensure that J Acquir Immune Defic Syndr. readers understand the broader context into which the OGAC guidance is introduced and recognize where its recommendations for impact evaluation fall in the range of evaluation approaches endorsed by other sources of guidance.
James C. Thomas, MPH, PhD*§
Sian Curtis, PhD†§
Jason B. Smith, PhD, MPH‡§
*Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
†Department of Maternal and Child Health, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
‡Department of Health Behavior and Health Education, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
§Carolina Population Center, University of North Carolina, Chapel Hill, NC
1. Padian NS, Holmes CB, McCoy SI, et al. Implementation science for the US President's Emergency Plan for AIDS Relief (PEPFAR). J Acquir Immune Defic Syndr
2. Padian NS, McCoy SI, Balkus JE, et al. Weighing the gold in the gold standard: challenges in HIV prevention research. AIDS
3. National Bioethics Advisory Commission (NBAC). Ethical and Policy Issues in International Research: Clinical Trials in Developing Countries. Vol I. Bethesda, MD: National Bioethics Advisory Commission; 2001:28.
4. Victora CG, Black R, Boerma JT, et al. Measuring impact in the Millennium Development Goal era and beyond: a new approach to large-scale effectiveness evaluations. Lancet
6. Rowe AK, Steketee RW, Arnold F, et al. Evaluating the impact of malaria control efforts on mortality in sub-Saharan Africa. Trop Med Int Health
8. Boerma JT, Stansfield SK. Health statistics now: are we making the right investments? Lancet