Limit of detection (LOD) issues are ubiquitous in exposure assessment. Although there is an extensive literature on modeling exposure data under such imperfect measurement processes, including likelihood-based methods and multiple imputation, the standard practice continues to be naïve single imputation by a constant (e.g., LOD/√2). In this article, we consider the situation where, due to the practical logistics of data accrual, sampling, and resource constraints, exposure data are analyzed in multiple batches where the LOD and the proportion of censored observations differ across batches. Compounding this problem is the potential for nonrandom assignment of samples to each batch, often driven by enrollment patterns and biosample storage. This issue is particularly important for binary outcome data where batches may have different levels of outcome enrichment. We first consider variants of existing methods to address varying LODs across multiple batches. We then propose a likelihood-based multiple imputation strategy to impute observations that are below the LOD while simultaneously accounting for differential batch assignment. Our simulation study shows that our proposed method has superior estimation properties (i.e., bias, coverage, statistical efficiency) compared to standard alternatives, provided that distributional assumptions are satisfied. Additionally, in most batch assignment configurations, complete-case analysis can be made unbiased by including batch indicator terms in the analysis model, although this strategy is less efficient relative to the proposed method. We illustrate our method by analyzing data from a cohort study in Puerto Rico that is investigating the relation between endocrine disruptor exposures and preterm birth.
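To make the idea of censored-likelihood imputation with batch-specific LODs concrete, the following is a minimal Python sketch. It is a simplified illustration and not the implementation described in this article: it assumes a single lognormal exposure with no covariates and no outcome in the imputation model, and it plugs in the maximum likelihood estimates rather than drawing the parameters for each imputation, so it understates imputation uncertainty. The function names (fit_censored_lognormal, impute_below_lod) are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def fit_censored_lognormal(log_x, log_lod, detected):
    """Estimate (mu, sigma) of the log-exposure by maximizing a left-censored likelihood.

    log_x    : log-exposures (entries for non-detects are ignored, e.g. NaN)
    log_lod  : log LOD for each sample, so LODs may differ by batch
    detected : boolean array, True where the value is at or above its LOD
    """
    def neg_loglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)  # optimize on the log scale to keep sigma > 0
        # Detected values contribute the normal density of the log-exposure.
        ll_obs = stats.norm.logpdf(log_x[detected], mu, sigma).sum()
        # Non-detects contribute P(log-exposure < log LOD of their own batch).
        ll_cen = stats.norm.logcdf(log_lod[~detected], mu, sigma).sum()
        return -(ll_obs + ll_cen)

    obs = log_x[detected]
    start = np.array([obs.mean(), np.log(obs.std() + 1e-6)])
    res = optimize.minimize(neg_loglik, start, method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))


def impute_below_lod(log_x, log_lod, detected, mu, sigma, n_imputations, rng):
    """Fill non-detects with draws from a normal truncated above at each sample's log LOD."""
    upper = (log_lod[~detected] - mu) / sigma  # standardized truncation points
    p_upper = stats.norm.cdf(upper)            # probability mass below each LOD
    completed = []
    for _ in range(n_imputations):
        # Inverse-CDF sampling: uniform on [0, p_upper) mapped back through the normal quantile.
        u = rng.uniform(0.0, p_upper)
        filled = log_x.copy()
        filled[~detected] = mu + sigma * stats.norm.ppf(u)
        completed.append(filled)
    return completed
```

In use, each completed dataset would be analyzed with the outcome model and the resulting estimates combined with the usual multiple imputation combining rules. Because log_lod is supplied per sample, two batches with different detection limits are handled by the same likelihood without any batch-specific substitution constant.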
From the (a) Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI
(b) Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI
(c) Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC
(d) Department of Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI
(e) Department of Civil and Environmental Engineering, Northeastern University College of Engineering, Boston, MA
(f) Department of Epidemiology and Biostatistics, University of Georgia College of Public Health, Athens, GA.
Submitted September 28, 2018; accepted May 27, 2019.
Supported by the Superfund Research Program of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH); grant number P42ES017198. Additional support was provided from NIH grant number P30ES017885 and the NIH Environmental influences on Child Health Outcomes (ECHO) program grant number UG3OD023251. B.M. was funded by the NSF Division of Mathematical Sciences grant number 1712933. K.F. was funded by the Intramural Research Program of NIEHS.
The authors report no conflicts of interest.
Availability of Data and Computing Code: A general implementation of censored likelihood multiple imputation, example code, and an artificial dataset can be found at: https://github.com/bossjona/Single-Pollutant-Multiple-LODs. The dataset used in the data example is not publicly available due to the sensitive nature of demographic information and biological measurements, but is available from the corresponding author on reasonable request.
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).
Correspondence: Sehee Kim, Department of Biostatistics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109. E-mail: firstname.lastname@example.org.