
Supplement Article

Evaluating Nonresponse Weighting Adjustment for the Population-Based HIV Impact Assessment Surveys on Incorporating Survey Outcomes

Lin, Tien-Huan MS(a,b); Cervantes, Flores Ismael PhD(a,b); Saito, Suzue PhD(a,b); Bain, Rommel PhD(a,b)

JAIDS Journal of Acquired Immune Deficiency Syndromes: August 1, 2021 - Volume 87 - Issue - p S52-S56
doi: 10.1097/QAI.0000000000002636



It is common practice in survey research to attempt to mitigate bias due to unit nonresponse by making weighting adjustments to the base weights, which account for the sampled units' unequal selection probabilities. There are various methods for developing these adjustments, all of which depend on auxiliary variables that are available for both respondents and nonrespondents. The usual approach is to base the nonresponse adjustments on models that predict response propensity.1 This form of nonresponse adjustment is a general-purpose strategy that is agnostic to the survey outcomes. However, several researchers have argued that nonresponse adjustments should be specific to each survey outcome, taking into account both the probability of response and the survey outcomes in order to reduce bias while controlling variance.2 The effectiveness of this approach depends on the availability of auxiliary variables that both explain response propensity and predict the survey outcomes.
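A compact way to write the propensity-model adjustment described above follows; the notation is ours, not the article's. For sampled unit i, let π_i be its selection probability, d_i its base weight, and x_i a vector of auxiliary variables observed for respondents and nonrespondents alike. As one common choice (assumed here), the response propensity φ_i is estimated by a logistic regression with fitted coefficients β̂:

```latex
d_i = \frac{1}{\pi_i}, \qquad
\hat{\phi}_i = \frac{\exp(\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}})}{1 + \exp(\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}})}, \qquad
w_i =
\begin{cases}
  d_i / \hat{\phi}_i, & \text{if unit } i \text{ responds},\\
  0, & \text{otherwise.}
\end{cases}
```

Grouping units with similar values of φ̂_i into classes, and using the inverse of each class's weighted response rate in place of 1/φ̂_i, gives the weighting-class form of the adjustment used in the PHIA surveys.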

In this article, we take a further step in this direction by including model predictions of the actual survey outcomes in developing the nonresponse adjustments, instead of relying on auxiliary variables that may be related to the survey outcomes. Vartivarian and Little (2003)3 and, more recently, Morral, Gore, and Schell (2014)4 and Fay and Riddles (2017)5 have applied this approach. However, little work has been done on implementing it with stratified multistage sample designs. This article empirically explores two methods of applying this approach using data collected in four African surveys that are part of the Population-based HIV Impact Assessment (PHIA) project. The PHIA surveys have several phases of data collection, with nonresponse occurring at each phase. At later phases, a large number of auxiliary variables are available from data collected in earlier phases. These variables can be used to develop prediction models for both response propensity and survey outcomes, which can then be applied to compensate for nonresponse at a later phase. The results obtained from the proposed approach are compared with those produced by the standard PHIA weighting approach, which is based only on models for response propensity.


The Population-Based HIV Impact Assessment Surveys

The expansion of antiretroviral treatment to more than 12.1 million people in sub-Saharan Africa is one of the most successful global public health programs ever undertaken.6 It is by far the largest initiative for a single disease, with the United States alone investing more than 70 billion dollars since 2002 (Avert, 2016).7 After a decade of antiretroviral therapy scale-up, the PHIA project, implemented by ICAP at Columbia University in collaboration with the Ministries of Health, the US Centers for Disease Control and Prevention (CDC), and other partners, is assessing the status of the HIV epidemic in 14 sub-Saharan African countries through nationally representative surveys that produce estimates of indicators such as HIV prevalence, HIV incidence, and viral load suppression. This article uses data from four of these countries, whose surveys were completed in 2016 and 2017.

The sample design in all four countries was a two-stage sample. In the first stage, a sample of primary sampling units (PSUs), defined as enumeration areas from the most recent population census, was selected with probability proportional to estimated size. In the second stage, a systematic sample of households was selected within each sampled PSU. All eligible adults in the selected households were included in the sample. Data on sampled persons were collected using three instruments:

  • A household interview conducted with the head of the household. The household questionnaire collected information on a range of items about the household and about each household member, such as age, sex, and relationship to the head of the household.
  • A personal interview with each eligible adult, which covered an extensive set of topics such as sexual activity, male circumcision, female reproduction, and HIV/AIDS-related knowledge and attitudes.
  • A blood sample from respondents who agreed to laboratory HIV testing at the end of the personal interview.
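The two-stage design described above implies a base (design) weight for each sampled adult equal to the inverse of the product of the stage-wise selection probabilities. The following is a minimal sketch under stated assumptions: n PSUs selected with probability proportional to an estimated size M_i, then m households selected systematically with equal probability from the L_i households listed in PSU i. The function names and all numbers are hypothetical, not taken from the PHIA surveys.

```python
def psu_inclusion_prob(n_psus, size, total_size):
    """First stage: PPS inclusion probability n * M_i / (sum of M over the frame)."""
    p = n_psus * size / total_size
    if p > 1:
        raise ValueError("n * M_i exceeds the frame total; treat the PSU as a certainty unit")
    return p


def household_prob(m_households, listed_households):
    """Second stage: systematic equal-probability sample of m of the L_i listed households."""
    return m_households / listed_households


def adult_base_weight(n_psus, size, total_size, m_households, listed_households):
    """Design weight shared by every eligible adult in a sampled household.

    All eligible adults in a selected household are taken, so an adult's
    selection probability equals the household's overall selection probability.
    """
    pi = psu_inclusion_prob(n_psus, size, total_size) * household_prob(
        m_households, listed_households
    )
    return 1.0 / pi


# Hypothetical example: 500 PSUs from a frame of 2,000,000 households; this PSU
# had an estimated size of 400, but 420 households were listed and 25 selected.
w = adult_base_weight(500, 400, 2_000_000, 25, 420)
```

Here w equals 1 / (0.1 * 25/420) = 168. Because the listed count L_i can differ from the estimated size M_i, base weights vary somewhat across PSUs even in a design that would otherwise be self-weighting.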

Nonresponse occurred at each of the three data collection stages. In this article, we focus on the nonresponse adjustment for the blood test, the last stage of data collection, among sampled adults aged 15–49 years. Countries with blood test response rates of about 90% or lower were included in this analysis.

Table 1 displays the conditional blood test response rates (conditional on interview response) for the four countries meeting the criteria for this analysis. Male and female response rates are shown separately because men and women received different questions in their interviews (eg, questions about male circumcision and female reproduction).

TABLE 1. - Conditional Blood Test Response Rates for Adults Aged 15–49 Years by Sex for Four PHIA Countries
Sex Country
Male 86.2% 86.9% 88.3% 90.0%
Female 87.8% 86.7% 90.3% 91.9%

The Weighting Procedure Used in PHIA to Compensate for Blood Test Nonresponse

In the PHIA weighting process, blood test weights were developed for each country by first adjusting the person-level design weights for interview nonresponse and then adjusting the resulting interview weights for nonresponse to the blood draw. Finally, the blood test nonresponse-adjusted weights were poststratified to national population projections by age and sex. See Lin et al.8 for additional details.
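The sequence of adjustments can be sketched as a product of factors followed by a ratio adjustment to external totals. This is a minimal illustration, not the PHIA production code: the record fields, the adjustment-factor dictionaries, and the control totals are hypothetical, and poststratification is shown as a simple ratio adjustment within age-sex cells.

```python
from collections import defaultdict


def poststratify(weights, cells, control_totals):
    """Ratio-adjust positive weights so that they sum to the control total in each cell."""
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    return [w * control_totals[c] / cell_sums[c] if w > 0 else 0.0
            for w, c in zip(weights, cells)]


def blood_test_weights(records, interview_adj, blood_adj, control_totals):
    """Design weight x interview NR factor x blood NR factor, then poststratify.

    Each record is a dict with hypothetical keys 'design_w', 'int_class',
    'blood_class', 'ps_cell', and 'tested' (True if a blood sample was obtained).
    Persons without a blood sample get weight 0 and do not affect the cell sums.
    """
    pre = []
    for r in records:
        w = r["design_w"] * interview_adj[r["int_class"]]
        pre.append(w * blood_adj[r["blood_class"]] if r["tested"] else 0.0)
    return poststratify(pre, [r["ps_cell"] for r in records], control_totals)


records = [
    {"design_w": 100.0, "int_class": 1, "blood_class": "a", "ps_cell": "M15-24", "tested": True},
    {"design_w": 100.0, "int_class": 1, "blood_class": "a", "ps_cell": "M15-24", "tested": False},
    {"design_w": 200.0, "int_class": 2, "blood_class": "b", "ps_cell": "F15-24", "tested": True},
]
final = blood_test_weights(records, {1: 1.1, 2: 1.25}, {"a": 2.0, "b": 1.0},
                           {"M15-24": 300.0, "F15-24": 260.0})
```

After poststratification, the weights in each age-sex cell sum to the hypothetical projection for that cell, whatever the earlier factors were.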

In the PHIA surveys, the nonresponse adjustment at each stage was computed as the inverse of the weighted response rate within each weighting class. The weighting classes were created through a two-stage procedure, primarily to reduce time and labor. The first stage was “feature filtering” using least absolute shrinkage and selection operator (LASSO) regression, a penalized (regularized) regression method from the field of machine learning.9 The LASSO regression was implemented with the SAS procedure PROC HPGENSELECT. The second stage used the χ2 Automatic Interaction Detector (CHAID) tree classification algorithm (Magidson, 2005)10 for the final variable selection and for creating the weighting classes, implemented in the stand-alone software SI-CHAID. Both packages accounted for unequal selection probabilities and the previous-phase nonresponse adjustment by using survey weights in the algorithms.
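Within a weighting class c, the adjustment factor is the inverse of the weighted response rate, a_c = (sum of incoming weights of all eligible cases in c) / (sum of incoming weights of respondents in c). A minimal sketch follows; the class labels stand in for whatever the LASSO and CHAID steps produce, and the data are hypothetical.

```python
from collections import defaultdict


def class_adjustments(weights, classes, responded):
    """Return {class: 1 / weighted response rate} for each weighting class."""
    eligible = defaultdict(float)
    resp = defaultdict(float)
    for w, c, r in zip(weights, classes, responded):
        eligible[c] += w
        if r:
            resp[c] += w
    factors = {}
    for c, total in eligible.items():
        if resp[c] == 0:
            raise ValueError(f"class {c!r} has no respondents; collapse it with a neighbor")
        factors[c] = total / resp[c]
    return factors


def adjust_weights(weights, classes, responded):
    """Respondents get w * a_c, nonrespondents get 0; class weight totals are preserved."""
    factors = class_adjustments(weights, classes, responded)
    return [w * factors[c] if r else 0.0 for w, c, r in zip(weights, classes, responded)]


# Hypothetical interview-adjusted weights, class labels, and blood test response.
wts = [1.0, 1.0, 2.0, 2.0]
cls = ["a", "a", "b", "b"]
tested = [True, False, True, True]
factors = class_adjustments(wts, cls, tested)
adjusted = adjust_weights(wts, cls, tested)
```

Here 1 / factors[c] is the weighted response rate in class c, and the adjusted respondent weights reproduce each class's total incoming weight.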

Alternative Weighting Procedures to Produce More Efficient Estimates for Key Outcome Variables

We developed two alternative sets of blood test nonresponse-adjusted weights that incorporate survey outcomes into the adjustment. The analyses presented in this article exclude the effect of the poststratification factor. The alternative methods are described below. Five key survey outcomes of the PHIA study were used in both methods:

  • HIV prevalence;
  • Viral load suppression rate among all adults;
  • Percentage of people living with HIV who were aware of their HIV status;
  • Percentage of people living with HIV who knew their HIV status and received sustained antiretroviral therapy; and
  • Percentage of people living with HIV who knew their HIV status, received sustained antiretroviral therapy, and had viral load suppression.

The last three outcomes are indicators used to track the country's HIV treatment targets. The methods described in this section were implemented separately by sex because, as noted above, men and women received different interview questions.

Joint Classification by Response Propensity and Predictive Mean Stratification

The joint classification by response propensity and predictive mean stratification method of adjusting for nonresponse is described by Vartivarian and Little.3 Their illustration applies to a single survey outcome, but they suggest ways to extend the method to multiple outcomes (eg, using principal component analysis to reduce the number of variables). The first step, response propensity stratification, uses logistic regression to model and predict response propensities for all sampled cases; the predicted propensities are then grouped to form a set of propensity strata. The second step, predictive mean stratification, models the survey outcome for respondents using regression analysis. The fitted model is used to predict the survey outcome for both respondents and nonrespondents, and, as with the propensities, the predicted outcomes are grouped into a set of strata. The final step forms nonresponse adjustment cells by cross-classifying the two sets of strata, to take advantage of both the response and outcome models.
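The stratification and cross-classification steps can be sketched as follows. This minimal illustration assumes that predicted propensities and predicted outcomes are already available for every sampled case and cuts each into equal-count groups; the choice of three groups per dimension is hypothetical, and grouping is by unweighted rank for simplicity.

```python
def equal_count_groups(values, k):
    """Assign each value to one of k equal-count groups by rank (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(values)
    return groups


def joint_cells(propensity, predicted_outcome, k_prop=3, k_out=3):
    """Cross-classify propensity strata with predictive-mean strata into (p, y) cells."""
    p_strata = equal_count_groups(propensity, k_prop)
    y_strata = equal_count_groups(predicted_outcome, k_out)
    return list(zip(p_strata, y_strata))


# Hypothetical predictions for six sampled adults.
prop = [0.95, 0.80, 0.60, 0.90, 0.70, 0.85]
pred = [0.10, 0.30, 0.20, 0.05, 0.40, 0.15]
cells = joint_cells(prop, pred)
```

The resulting cells then serve as weighting classes for an inverse-response-rate adjustment; in practice, cells with few or no respondents would be collapsed first, a step not shown here.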

For our analysis, we modified this approach in three ways. First, to save time, we used the response propensities from the weighting cells of the PHIA blood test nonresponse adjustment instead of fitting a new regression model for response propensity. Second, we used principal component analysis (Pearson, 1901)12 to reduce the survey outcome variables to a smaller set of uncorrelated principal components (see, for example, Rao, 196413 and Morrison, 197611). We implemented this analysis with the SAS procedure PROC PRINCOMP. For each analysis group, the number of outcome variables was reduced from five to two, retaining, on average, 90 percent of the total variance.
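The dimension-reduction step can be illustrated without SAS. The following dependency-free sketch forms the weighted covariance matrix of the outcome variables, optionally converts it to a correlation matrix, extracts the leading components by power iteration with deflation, and reports the share of total variance retained. The iteration count, start vector, and data are our assumptions; this is not the authors' code.

```python
import math


def weighted_cov(rows, weights):
    """Weighted covariance matrix (divisor = sum of weights) of the columns of rows."""
    total = sum(weights)
    p = len(rows[0])
    means = [sum(w * r[j] for r, w in zip(rows, weights)) / total for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r, w in zip(rows, weights):
        d = [r[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += w * d[a] * d[b] / total
    return cov


def to_correlation(cov):
    sd = [math.sqrt(cov[j][j]) for j in range(len(cov))]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(len(cov))] for a in range(len(cov))]


def leading_eigenpairs(matrix, k, iters=1000):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary, non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            u = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in u))
            if norm == 0.0:
                break
            v = [x / norm for x in u]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        for i in range(p):  # remove the found component before searching for the next
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def pca_share(rows, weights, k, standardize=True):
    """Return (eigenpairs, share of total variance retained by the first k components)."""
    cov = weighted_cov(rows, weights)
    m = to_correlation(cov) if standardize else cov
    pairs = leading_eigenpairs(m, k)
    return pairs, sum(lam for lam, _ in pairs) / sum(m[j][j] for j in range(len(m)))


# Hypothetical data: five outcome columns driven by two underlying factors f and g.
f = [1, 2, 3, 4, 5, 6]
g = [2, 1, 4, 3, 6, 5]
rows = [[a, 2 * a, b, -b, a + b] for a, b in zip(f, g)]
pairs, share = pca_share(rows, [1, 2, 1, 2, 1, 2], k=2)
```

In this toy example, two components retain essentially all of the variance because the five columns are exact combinations of two factors; with real outcome data, the retained share is lower (the article reports about 90 percent on average). Standardizing is optional here; PROC PRINCOMP analyzes the correlation matrix by default.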

We used the SAS procedure PROC GLM to implement the predictive mean stratification. The predictors in the regression model were selected with the SAS procedure PROC GLMSELECT using forward selection, with all auxiliary variables from the household and personal interviews as candidate effects. The interview weights were used in both procedures to account for unequal selection probabilities and interview nonresponse.
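PROC GLMSELECT handles selection internally. The underlying idea of weighted forward selection can be sketched as follows. In this dependency-free illustration, candidates enter one at a time by the largest reduction in weighted residual sum of squares, and selection stops when the gain in weighted R-squared falls below a hypothetical threshold, min_gain. GLMSELECT offers several selection and stopping criteria, which this sketch does not attempt to reproduce.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def weighted_sse(columns, y, w):
    """Fit y on an intercept plus columns by weighted least squares; return weighted SSE."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xtwy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtwx, xtwy)
    return sum(w[i] * (y[i] - sum(beta[a] * X[i][a] for a in range(p))) ** 2 for i in range(n))


def forward_select(candidates, y, w, min_gain=0.01):
    """Greedy forward selection; stop when the weighted R-squared gain is below min_gain."""
    sst = weighted_sse([], y, w)  # intercept-only model
    chosen, current = [], sst
    while sst > 0:
        best = None
        for name, col in candidates.items():
            if name in chosen:
                continue
            try:
                sse = weighted_sse([candidates[c] for c in chosen] + [col], y, w)
            except ValueError:
                continue  # collinear with variables already in the model
            if best is None or sse < best[1]:
                best = (name, sse)
        if best is None or (current - best[1]) / sst < min_gain:
            break
        chosen.append(best[0])
        current = best[1]
    return chosen


# Hypothetical respondent data: y depends on x1; x2 is unrelated noise.
y = [5.1, 6.9, 9.05, 10.95, 13.0, 15.0]
cands = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 0, 1, 0, 1, 0]}
selected = forward_select(cands, y, [1, 1, 2, 2, 1, 1])
```

The weights enter both the fit and the selection criterion, mirroring the article's use of interview weights in the selection and modeling procedures.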

Third, we replaced the cross-classification of the propensity and predicted-mean strata with k-means cluster analysis, in which each cluster center is the mean of the observations assigned to that cluster. The cluster analysis was implemented with the SAS procedure PROC FASTCLUS, and the number of clusters (cells) was based on the number of blood test nonresponse adjustment cells in the regular PHIA weighting process. These clusters were used as weighting classes for computing the blood test nonresponse adjustments for the joint-classification weights.
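The k-means step can be sketched with Lloyd's algorithm. This minimal illustration assumes that each case is represented by its (response propensity, principal-component score) coordinates and that the number of clusters k is fixed in advance to match the number of PHIA adjustment cells. The deterministic initialization (the first k distinct points) is our simplification; PROC FASTCLUS uses its own seed-selection rules.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centers). Points are tuples of floats."""
    centers = []
    for p in points:  # first k distinct points serve as initial centers
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("fewer distinct points than clusters")
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        new = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
               for p in points]
        # recompute each center as the mean of its members; keep the old center if empty
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, new) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centers[c])
        if new == labels and updated == centers:
            break
        labels, centers = new, updated
    return labels, centers


# Hypothetical (propensity, first component score) pairs for six cases.
pts = [(0.90, -1.0), (0.92, -1.1), (0.50, 2.0), (0.52, 2.1), (0.88, -0.9), (0.55, 1.9)]
labels, centers = kmeans(pts, k=2)
```

In practice, the coordinates would usually be standardized first so that neither dimension dominates the Euclidean distance.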

Two-Step Approach with Gradient Boosting

The second alternative weighting method applies the work of Morral, Gore, and Schell (2014)4 and Fay and Riddles (2017),5 which Fay and Riddles call the two-step approach.

In the first step, separate models were fitted for each of the five key survey outcomes, using respondents' household and personal interview variables as predictors, to predict the key outcomes for both respondents and nonrespondents. Both Morral, Gore, and Schell (2014)4 and Fay and Riddles5 used a machine-learning algorithm known as gradient boosting (GB), which fits a prediction model consisting of an ensemble of weak prediction models (classification trees). The predictions are based on a “committee” formed from the weak predictions.14
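To make the “committee of weak models” idea concrete, here is a toy gradient-boosting regressor in plain Python. It uses one-split trees (stumps) and squared-error loss. It is a simplified illustration of the general technique, not xgboost, which adds regularization, second-order gradients, deeper trees, and many tuning parameters. For a binary outcome such as HIV status, a logistic loss would normally be used instead. The number of rounds, the learning rate, and the data are hypothetical.

```python
def fit_stump(X, r):
    """Best one-split tree for residuals r: returns (feature, threshold, left_mean, right_mean)."""
    n, best = len(r), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: all feature values are identical")
    return best[1:]


def boost(X, y, rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        j, t, lm, rm = fit_stump(X, resid)
        stumps.append((j, t, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]

    def predict(row):
        # the "committee": base value plus the shrunken contributions of all stumps
        return base + learning_rate * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)

    return predict


# Hypothetical respondents: one interview variable and a 0/1 outcome.
X_resp = [[0], [1], [2], [3], [4], [5]]
y_resp = [0, 0, 0, 1, 1, 1]
model = boost(X_resp, y_resp)
low, high = model([0]), model([5])
```

As in the first step described above, the model is fitted on respondents only, and the returned predict function can then be applied to the covariate values of respondents and nonrespondents alike.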

Morral, Gore, and Schell (2014)4 applied the algorithm with the xgb package in R, and Fay and Riddles (2017)5 used both xgb and the R package xgboost (Chen et al, 2018).15 We developed our models using xgboost, with cross-validation to avoid overfitting. The GB models for the five outcomes were used to predict the outcomes for respondents and nonrespondents. In the second step, a GB model for response propensity was fitted using the five predicted survey outcomes for respondents and nonrespondents. The predicted response propensities were grouped by percentiles to form weighting classes for the two-step weights, and the interview nonresponse-adjusted weights were then adjusted for blood test nonresponse by the inverse of the weighted response rate within each of these classes.
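The final part of the two-step procedure, grouping cases by percentiles of the predicted propensity and adjusting within the resulting classes, can be sketched as follows. This minimal illustration assumes that the second-step propensities are given. It forms classes that each hold roughly an equal share of the total weight; both the weighted cut points and the number of classes are our assumptions, not necessarily the choices made in the article.

```python
def weighted_quantile_classes(scores, weights, n_classes):
    """Assign classes so that each holds roughly 1/n_classes of the total weight."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total = sum(weights)
    classes = [0] * len(scores)
    cum = 0.0
    for i in order:
        # class determined by the weight share accumulated before this case
        classes[i] = min(int(n_classes * cum / total), n_classes - 1)
        cum += weights[i]
    return classes


def two_step_weights(weights, propensity, responded, n_classes=5):
    """Inverse weighted response-rate adjustment within propensity-percentile classes."""
    classes = weighted_quantile_classes(propensity, weights, n_classes)
    tot = [0.0] * n_classes
    rsp = [0.0] * n_classes
    for w, c, r in zip(weights, classes, responded):
        tot[c] += w
        if r:
            rsp[c] += w
    if any(t > 0 and s == 0 for t, s in zip(tot, rsp)):
        raise ValueError("a class has no respondents; use fewer classes")
    return [w * tot[c] / rsp[c] if r else 0.0 for w, c, r in zip(weights, classes, responded)]


# Hypothetical interview-adjusted weights, predicted propensities, and blood test response.
wts = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
phat = [0.60, 0.65, 0.70, 0.80, 0.90, 0.95]
tested = [True, False, True, True, True, True]
adjusted = two_step_weights(wts, phat, tested, n_classes=2)
```

As with any weighting-class adjustment, the total weight is preserved within each class; only its distribution between respondents and nonrespondents changes.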


In this section, we compare the estimates and related statistics for the blood test nonresponse-adjusted weights created using the PHIA, joint-classification, and two-step methods. First, we examine the differences in estimates and variances computed with the weights from each method. We then compare the design effects of the estimators.

Assessing Differences in Estimates and Variances

Table 2 shows the unadjusted blood test estimates (weighted by the interview nonresponse-adjusted weights, before adjustment for blood test nonresponse) and the blood test nonresponse-adjusted estimates from the three weighting methods (PHIA, joint-classification, and two-step), by sex and country, for selected survey outcomes.

TABLE 2. - Estimates of Key Survey Outcomes by Country, Sex, and Weighting Method, Unadjusted and Adjusted for Blood Test Nonresponse
Sex Method A B C D
HIV prevalence rate of adults 15–49 y old
 Male Unadjusted 19.4 8.1 8.8 11.4
PHIA 18.8 8.1 8.5 10.9
Joint-classification 18.7 7.7* 8.4 10.9
Two-step 18.4* 7.6* 8.4 10.8*
 Female Unadjusted 30.7 13.2 15.0 17.1
PHIA 29.2 12.6 14.4 16.4
Joint-classification 29.2 12.3* 14.3 16.3
Two-step 29.1 12.5 14.3* 16.3
Percentage of HIV-positive adults 15–49 y old who are aware of their HIV status
 Male Unadjusted 68.9 66.3 59.9 68.0
PHIA 68.0 65.8 58.8 66.4
Joint-classification 67.5 63.9* 58.3 66.4
Two-step 67.0* 64.0* 58.0* 66.0*
 Female Unadjusted 82.1 76.4 68.9 76.9
PHIA 81.0 75.7 67.3 75.7
Joint-classification 80.8 74.9* 67.2 75.6
Two-step 80.9 72.5* 67.2 75.2*
Percentage of HIV-positive adults 15–49 y old who received sustained antiretroviral therapy
 Male Unadjusted 60.6 55.9 50.2 56.8
PHIA 59.9 55.5 49.4 55.4
Joint-classification 59.5 53.6* 49.0 55.4
Two-step 58.9* 54.0* 48.6* 55.1
 Female Unadjusted 73.8 69.8 57.7 66.2
PHIA 72.7 69.1 56.3 65.1
Joint-classification 72.3* 68.7* 56.3 65.0
Two-step 72.5 66.4* 56.3 64.9*
Percentage of HIV-positive adults 15–49 y old who received sustained antiretroviral therapy and achieved viral load suppression
 Male Unadjusted 52.6 49.6 43.4 46.6
PHIA 51.9 49.5 42.8 45.4
Joint-classification 51.6 47.7* 42.4 45.4
Two-step 51.2* 47.9* 42.0* 45.2
 Female Unadjusted 64.3 64.1 51.4 57.4
PHIA 63.3 63.4 50.0 56.4
Joint-classification 63.0* 63.0 50.1 56.4
Two-step 63.1 61.0* 50.1 56.3
*Difference from the PHIA estimate is statistically significant at the α = 0.05 level.

Overall, the weighted estimates were lower than the unadjusted estimates, suggesting that all three weighting methods corrected for bias. However, the differences were small because the blood test response rates in these countries were high, so the nonresponse adjustment had little impact. For example, for the HIV prevalence rate, the PHIA method reduced the estimate by an average of 0.35 percentage points for males and 0.85 percentage points for females across countries. For the joint-classification method, the reduction was moderately larger, averaging 0.5 percentage points for males and 0.98 for females. The two-step method produced the largest reduction for males, averaging 0.6 percentage points, whereas for females its reduction (0.95 percentage points) was slightly smaller than that of the joint-classification method.

The differences between the PHIA estimates and the estimates from the alternative weights were much smaller. For the HIV prevalence rate, the average absolute difference between the joint-classification and PHIA estimates was 0.15 percentage points for males and 0.13 for females across countries. The average difference between the two-step and PHIA estimates was larger for males (0.28 percentage points) but smaller for females (0.1 percentage points). Statistical tests of the differences between the PHIA estimates and those from the other methods, which take into account the high correlation between the estimates (the estimates are based on the same data and weighting components except for the blood test adjustment), showed no statistically significant differences in most cases. For the other survey outcomes, the differences in percentage points among the unadjusted, PHIA, joint-classification, and two-step estimates were larger, but the same pattern holds. These small differences indicate that all three methods correct for bias in a similar fashion.
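The averages for the HIV prevalence rate can be recomputed directly from Table 2. In the following short check, the values are copied from the table. "Reduction" is the unadjusted estimate minus the adjusted estimate, and "difference" is the absolute difference from the PHIA estimate, each averaged over countries A to D.

```python
# HIV prevalence (%) by country A-D, copied from Table 2.
prev = {
    "Male": {
        "Unadjusted": [19.4, 8.1, 8.8, 11.4],
        "PHIA": [18.8, 8.1, 8.5, 10.9],
        "Joint-classification": [18.7, 7.7, 8.4, 10.9],
        "Two-step": [18.4, 7.6, 8.4, 10.8],
    },
    "Female": {
        "Unadjusted": [30.7, 13.2, 15.0, 17.1],
        "PHIA": [29.2, 12.6, 14.4, 16.4],
        "Joint-classification": [29.2, 12.3, 14.3, 16.3],
        "Two-step": [29.1, 12.5, 14.3, 16.3],
    },
}


def mean(xs):
    return sum(xs) / len(xs)


def avg_reduction(sex, method):
    """Mean of (unadjusted - adjusted) across countries, in percentage points."""
    t = prev[sex]
    return mean([u - a for u, a in zip(t["Unadjusted"], t[method])])


def avg_abs_diff_from_phia(sex, method):
    """Mean absolute difference from the PHIA estimate across countries."""
    t = prev[sex]
    return mean([abs(m - p) for m, p in zip(t[method], t["PHIA"])])


summary = {
    (sex, method): round(avg_reduction(sex, method), 3)
    for sex in prev
    for method in ("PHIA", "Joint-classification", "Two-step")
}
```

From the table values, the mean reductions are 0.35, 0.5, and 0.625 percentage points for males and 0.85, 0.975, and 0.95 for females (PHIA, joint-classification, and two-step, respectively).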

Design Effects for Selected Estimates

In this section, we compare the design effects of the survey outcome estimates for the three sets of nonresponse-adjusted weights. We expected smaller design effects for the alternative weighting methods because these methods target only the nonresponse bias of the key survey outcomes, which should reduce the variability of the weights. Table 3 shows the design effects for three survey outcomes by sex, weighting method, and country. For the HIV prevalence rate, the design effects of the joint-classification estimates were, on average across countries, 0.05 lower than those of the PHIA estimates for males and 0.015 lower for females. The same pattern, though generally with smaller reductions, held for the other survey outcomes under the joint-classification method. These results matched our expectations. However, the improvement in efficiency was not the same across survey outcomes, because a weight that is efficient for one variable is not necessarily efficient for another: efficiency depends on the correlation between the weights and the outcome. Targeting the weights to one survey outcome improves the efficiency of that estimate, but it may reduce the efficiency of the estimate for another outcome if the two outcomes are not correlated.
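One common way to see how weight variability alone inflates variance is Kish's approximate design effect due to unequal weighting, deff_w = n * sum(w_i^2) / (sum(w_i))^2, which equals 1 plus the squared coefficient of variation of the weights. This is a sketch for intuition only. It ignores clustering, stratification, and the weight-outcome correlation emphasized above, so it should not be expected to reproduce Table 3. The weights are hypothetical.

```python
def kish_deff(weights):
    """Kish's approximate design effect due to unequal weighting: n * sum(w^2) / sum(w)^2."""
    n = len(weights)
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return n * s2 / (s1 * s1)


def cv_squared(weights):
    """Squared coefficient of variation of the weights (population-variance form)."""
    n = len(weights)
    m = sum(weights) / n
    var = sum((w - m) ** 2 for w in weights) / n
    return var / (m * m)


w_equal = [150.0, 150.0, 150.0, 150.0]
w_varied = [100.0, 150.0, 200.0, 250.0]
deff_equal, deff_varied = kish_deff(w_equal), kish_deff(w_varied)
```

Kish's factor is never below 1, whereas some entries in Table 3 are. This shows that the full design effects also reflect features of the design and estimator beyond weight variability.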

TABLE 3. - Design Effects for Selected Estimates for Blood Test Nonresponse-Adjusted Weights by Country, Sex, and Weighting Method
Sex Method A B C D
HIV prevalence rate of adults 15–49 y old
 Male PHIA 1.07 1.25 1.29 1.28
Joint-classification 1.02 1.19 1.20 1.28
Two-step 0.99 1.16 1.22 1.32
 Female PHIA 1.16 1.76 1.46 1.19
Joint-classification 1.13 1.75 1.43 1.20
Two-step 1.13 1.78 1.44 1.21
Percentage of HIV-positive adults 15–49 y old who are aware of their HIV status
 Male PHIA 0.91 1.38 1.32 1.37
Joint-classification 0.90 1.34 1.35 1.36
Two-step 0.91 1.37 1.37 1.42
 Female PHIA 1.10 1.09 1.34 1.15
Joint-classification 1.10 1.05 1.34 1.16
Two-step 1.11 1.16 1.36 1.18
Percentage of HIV-positive adults 15–49 y old who received sustained antiretroviral therapy
 Male PHIA 0.94 1.36 1.35 1.18
Joint-classification 0.93 1.34 1.36 1.17
Two-step 0.92 1.38 1.38 1.23
 Female PHIA 1.18 1.25 1.47 1.23
Joint-classification 1.20 1.18 1.40 1.18
Two-step 1.20 1.27 1.45 1.22

For males, the design effects of the two-step method for the HIV prevalence rate showed a gain in efficiency in most countries. However, this pattern did not hold for females or for the other survey outcomes presented in the table. These results suggest that the two-step method produced less efficient estimates than the joint-classification method (although the differences in estimates are not significant). Additional research is needed to understand how the models in the two-step method contribute to this loss of efficiency.


In this article, we explored producing more efficient estimates (ie, estimates with smaller variances) by incorporating key survey outcomes into the nonresponse adjustments. We implemented two alternative weighting methods on data from four countries in the PHIA surveys. The first method was an extension of Vartivarian and Little's3 joint classification by response propensity and predictive mean stratification method. The second method was an application of a machine-learning algorithm (gradient boosting) studied by several researchers.

The results of our analyses showed that all three methods adjust the estimates downward compared with the unadjusted estimates and that there was little difference among estimates produced by the alternative weighting methods and the PHIA estimates. In terms of the design effects of the estimates, the joint-classification method produced more efficient estimates compared with the PHIA method. This observation does not hold for the two-step method. Additional research is needed to understand the role of the models and the algorithm in this approach.

There are some limitations when developing nonresponse-adjusted weights from alternative methods. The joint-classification method is more time-consuming than the PHIA method since, in addition to modeling response propensity, it also requires modeling of survey outcomes and cluster analysis to create weighting adjustment cells. On large-scale multicountry studies such as PHIA, where time and budget are of the essence, this can be an important driving factor.

The xgboost15 package used in the two-step approach has numerous parameters that need to be tuned, so the weighting adjustments may not be robust to the choice of parameter values. However, the main drawback of the two-step method is that the algorithm is seen as a black box: there is no easy way to describe the model or the importance of the selected variables.

The weights produced by the alternative weighting methods were useful as tools for evaluating the public-use weights (ie, the PHIA weights). As an evaluation of the PHIA weights, the results presented in this article suggest that the PHIA weights, which do not take the outcome variables into account, perform well compared with weights derived for key survey outcomes. Note that we do not advocate the use of weights developed for specific outcomes: such weights would produce efficient estimates for variables correlated with the targeted outcomes but inefficient estimates for those that are not. As multipurpose weights, the PHIA weights produce estimates whose efficiency closely resembles that of estimates from weights targeted at key survey outcomes.


The authors thank Dr. Graham Kalton for his comments and support of this work.


1. Brick J, Kalton G. Handling missing data in survey research. Stat Methods Med Res. 1996;5:215–238.
2. Little RJ, Vartivarian S. Does weighting for nonresponse increase the variance of survey means? Surv Methodol. 2005;31:161–168.
3. Vartivarian S, Little RJ. On the formation of weighting adjustment cells for unit nonresponse. The University of Michigan Department of Biostatistics Working Paper Series [serial online]. August 2003; Working Paper 10. Available at: Accessed September 20, 2018.
4. Morral A, Gore K, Schell T, eds. Sexual Assault and Sexual Harassment in the U.S. Military: Volume 1. Design of the 2014 RAND Military Workplace Study. Santa Monica, CA: RAND Corporation; 2014. Available at: Accessed September 20, 2018.
5. Fay RE, Riddles MK. One- versus two-step approaches to survey nonresponse adjustments. Presented at: Joint Statistical Meetings; 2017; Baltimore, MD. Accessed September 20, 2018.
6. UNAIDS. Global AIDS Update. May 31, 2016. Available at: Accessed September 20, 2018.
7. Avert. Funding for HIV and AIDS. May 25, 2016. Available at: Accessed September 20, 2018.
8. Lin T, Weil N, Flores Cervantes I, et al. Developing nonresponse weighting adjustments for population-based HIV impact assessment surveys in three African countries. Presented at: Joint Statistical Meetings; 2017; Baltimore, MD. Available at: Accessed September 20, 2018.
9. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58:267–288.
10. Magidson J. SI-CHAID 4.0 User's Guide. Statistical Innovations Inc.; 2005. Available at: Accessed September 20, 2018.
11. Morrison D. Multivariate Statistical Methods. 2nd ed. New York, NY: McGraw-Hill; 1976.
12. Pearson K. On lines and planes of closest fit to systems of points in space. Philos Mag (Ser 6). 1901;2:559–572.
13. Rao CR. The use and interpretation of principal component analysis in applied research. Sankhya A. 1964;26:329–358.
14. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning. 2nd ed. New York, NY: Springer; 2009.
15. Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, Li Y. xgboost: Extreme Gradient Boosting [computer program]. Version 0.71.2. CRAN; 2018.

PHIA; nonresponse adjustment; principal component analysis; cluster analysis; gradient boosting

    Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.