Population-level estimates of disease prevalence and control are needed to assess prevention and treatment strategies. However, available data often suffer from differential missingness. For example, population-level HIV viral suppression is the proportion of all HIV-positive persons with suppressed viral replication. Individuals whose HIV status was measured likely differ from those whose status is unknown, and among HIV-positive individuals, those whose viral suppression was measured likely differ from those without such a measurement.
We discuss three sets of assumptions to identify population-level suppression in the intervention arm of the SEARCH Study (NCT01864603), a community randomized trial in rural Kenya and Uganda (2013–2017). Using data on nearly 100,000 participants, we compare estimates from (1) an unadjusted approach assuming data are missing-completely-at-random (MCAR); (2) stratification on age group, sex, and community; and (3) targeted maximum likelihood estimation to adjust for a larger set of baseline and time-updated variables.
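To illustrate how the first two approaches can diverge, the following is a minimal sketch on hypothetical toy counts, not SEARCH data. The function names, the two strata, and all numbers are assumptions made for illustration. It handles only missing viral suppression measurements among persons already known to be HIV-positive, and it does not implement targeted maximum likelihood estimation.

```python
from collections import defaultdict


def make_stratum(name, n_total, n_measured, n_suppressed):
    """Build hypothetical records (stratum, measured, suppressed) for one stratum."""
    recs = [(name, True, 1)] * n_suppressed
    recs += [(name, True, 0)] * (n_measured - n_suppressed)
    recs += [(name, False, None)] * (n_total - n_measured)
    return recs


def unadjusted(records):
    """Proportion suppressed among those measured; unbiased only under MCAR."""
    observed = [s for _, measured, s in records if measured]
    return sum(observed) / len(observed)


def stratified(records):
    """Average stratum-specific proportions, weighted by each stratum's share
    of all records (measured or not). Assumes every stratum has at least one
    measured record."""
    totals = defaultdict(int)
    measured_n = defaultdict(int)
    suppressed_n = defaultdict(int)
    for stratum, measured, s in records:
        totals[stratum] += 1
        if measured:
            measured_n[stratum] += 1
            suppressed_n[stratum] += s
    n = len(records)
    return sum(
        (totals[g] / n) * (suppressed_n[g] / measured_n[g]) for g in totals
    )


# Toy example: the stratum with lower suppression is also measured less often.
records = (
    make_stratum("older", n_total=100, n_measured=90, n_suppressed=72)
    + make_stratum("younger", n_total=100, n_measured=50, n_suppressed=20)
)

print(f"unadjusted: {unadjusted(records):.3f}")
print(f"stratified: {stratified(records):.3f}")
```

In this toy setting the unadjusted estimate (92/140, about 66%) exceeds the stratified estimate (60%) because the less-suppressed stratum is underrepresented among those measured. Stratification corrects this only if missingness is random within strata; adjusting for a richer set of baseline and time-updated variables relaxes that assumption further.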
Despite high measurement coverage, estimates of population-level viral suppression varied by identification assumption. Unadjusted estimates were most optimistic: 50% (95% confidence interval [CI] = 46%, 54%) of HIV-positive persons suppressed at baseline, 80% (95% CI = 78%, 82%) at year 1, 85% (95% CI = 83%, 86%) at year 2, and 85% (95% CI = 83%, 87%) at year 3. Stratifying on baseline predictors yielded slightly lower estimates, and full adjustment reduced estimates meaningfully: 42% (95% CI = 37%, 46%) of HIV-positive persons suppressed at baseline, 71% (95% CI = 69%, 73%) at year 1, 76% (95% CI = 74%, 78%) at year 2, and 79% (95% CI = 77%, 81%) at year 3.
Estimation of population-level disease burden and control requires appropriate adjustment for missing data. Even in large studies with limited missingness, estimates relying on the MCAR assumption or baseline stratification should be interpreted cautiously.