Disparities research in dementia is limited by a lack of large, diverse, and representative samples with systematic dementia ascertainment. Algorithmic diagnosis of dementia offers a cost-effective alternative. Prior work in the nationally representative Health and Retirement Study (HRS) has demonstrated that existing algorithms are ill-suited for racial/ethnic disparities research because their sensitivity and specificity differ by race/ethnicity.
We implemented traditional and machine learning methods to identify an improved algorithm that (a) had ≤5 percentage point difference in sensitivity and specificity across racial/ethnic groups, (b) achieved ≥80% overall accuracy across racial/ethnic groups, and (c) achieved ≥75% sensitivity and ≥90% specificity overall. Final recommendations were based on robustness, accuracy of estimated race/ethnicity-specific prevalence and prevalence ratios compared to those using in-person diagnoses, and ease of use.
We identified six algorithms that met our pre-specified criteria. Our three recommended algorithms achieved ≤3 percentage point difference in sensitivity and ≤5 percentage point difference in specificity across racial/ethnic groups, as well as 77%-83% sensitivity, 92%-94% specificity, and 90%-92% accuracy overall in analyses designed to emulate out-of-sample performance. Pairwise prevalence ratios between non-Hispanic whites, non-Hispanic blacks, and Hispanics estimated by applying these algorithms were within 1% to 10% of prevalence ratios estimated from in-person diagnoses.
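The prevalence ratio comparison can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure. The function names are hypothetical, "within 1% to 10%" is read here as the relative difference between the algorithm-based and in-person prevalence ratios, and sampling weights, which nationally representative estimates would typically require, are ignored.

```python
def prevalence(diagnoses):
    """Unweighted proportion of people classified as having dementia."""
    diagnoses = list(diagnoses)
    return sum(diagnoses) / len(diagnoses)


def prevalence_ratio(group_dx, reference_dx):
    """Prevalence in one group divided by prevalence in a reference group."""
    return prevalence(group_dx) / prevalence(reference_dx)


def relative_difference(estimate, benchmark):
    """Absolute difference between two ratios, as a fraction of the benchmark."""
    return abs(estimate - benchmark) / benchmark
```

For each pair of groups, one would compute `prevalence_ratio` twice, once from algorithmic diagnoses and once from in-person diagnoses, and pass the two results to `relative_difference`.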
We believe these algorithms will be of immense value to dementia researchers interested in racial/ethnic disparities. Our process can be replicated to allow minimally biasing algorithmic classification of dementia for other purposes.
1Department of Epidemiology, Milken Institute School of Public Health, George Washington University
2Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University
Source of Funding: The results reported herein correspond to specific aims of grant R03 AG055485 to MCP from NIH. This work was also supported by grant K01 MH113850 to AC from NIH.
The Health and Retirement Study is sponsored by the National Institute on Aging (grant number U01AG009740) and is conducted by the University of Michigan.
Conflicts of interest: None declared
Acknowledgments: The authors are grateful to Erin Bennett and Xiang Li for excellent administrative and editorial support.
Data availability and reproducibility: The data used in this study are available on the Health and Retirement Study website (http://hrsonline.isr.umich.edu/). SAS code for reproducing our datasets and assigning algorithmic diagnoses will be available by the time of publication at https://github.com/powerepilab/AD_Algorithm_Development.
Editor’s Note: A related article is found on p. XXX.
Corresponding author: Kan Z. Gianattasio, 950 New Hampshire Ave NW, 5th Floor, Washington DC 20052, T: 202.994.2572, E: firstname.lastname@example.org