Background: Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists.
Methods: We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis.
Results: Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations.
Conclusion: Lasso is a relevant method in the analysis of databases with large number of exposures and can be recommended as an alternative to conventional strategies.