Addressing the Identification Problem in Age-period-cohort Analysis: A Tutorial on the Use of Partial Least Squares and Principal Components Analysis

Tu, Yu-Kangᵃ; Krämer, Nicoleᵇ; Lee, Wen-Chungᶜ

doi: 10.1097/EDE.0b013e31824d57a9

In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. Because these 3 variables are perfectly collinear by definition (birth cohort = period − age), the regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem and how it may be tackled using partial least squares and principal components regression. Both methods produce regression coefficients that satisfy the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, these methods yield more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how partial least squares regression can analyze trends in body height, with age, period, and cohort entered as 3 continuous variables. We then use a second dataset, of hepatocellular carcinoma mortality rates for Taiwanese men, to illustrate how partial least squares regression can analyze tables of aggregated data. With this second dataset, we also show the relation between partial least squares regression and the intrinsic estimator, a recently proposed method for age-period-cohort analysis. Finally, we show that including all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix (
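The identification problem and the coefficient constraint described in the abstract can be checked numerically. The sketch below is written in Python with only the standard library, not in R, and it is not the authors' eAppendix code. It assumes hypothetical data (not the Taiwanese datasets), and the helper names `dot`, `center`, `det3`, and `fitted` are illustrative. It uses the simplest case of partial least squares, a single component, for which the coefficient vector is β = s(sᵀs)/‖Xs‖² with s = Xᵀy; the article's own analyses may use more components.

```python
# Minimal sketch, assuming hypothetical data: the age-period-cohort
# identification problem and a one-component PLS fit. Not the authors' code.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center(col):
    """Subtract the column mean."""
    m = sum(col) / len(col)
    return [x - m for x in col]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion on the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical individual records (illustrative only).
age = [20, 25, 30, 35, 40, 45, 50, 55]
cohort = [1950, 1948, 1962, 1955, 1970, 1958, 1945, 1966]  # birth year
period = [a + c for a, c in zip(age, cohort)]             # survey year
height = [168.2, 169.0, 171.5, 170.3, 173.8, 171.1, 167.9, 172.6]

cols = [age, period, cohort]  # column order: age, period, cohort

# 1) X'X is singular because period = age + cohort; with integer data the
#    determinant is exactly 0 (no rounding is involved).
gram = [[dot(u, v) for v in cols] for u in cols]
gram_det = det3(gram)


# 2) Non-uniqueness: adding any multiple of v = (1, -1, 1) to the
#    coefficients leaves every fitted value unchanged,
#    since age - period + cohort = 0 for every record.
def fitted(beta):
    return [dot(beta, row) for row in zip(*cols)]


fit1 = fitted([2, 1, -1])
fit2 = fitted([2 + 3, 1 - 3, -1 + 3])

# 3) One-component PLS on centered data: with w = s = X'y,
#    beta = w (w'X'y) / (w'X'X w) = s (s's) / ||X s||^2.
xc = [center(c) for c in cols]
yc = center(height)
s = [dot(c, yc) for c in xc]
xs = [dot(s, row) for row in zip(*xc)]  # the vector X s
beta_pls = [si * dot(s, s) / dot(xs, xs) for si in s]

print("det(X'X) =", gram_det)
print("same fitted values:", fit1 == fit2)
print("PLS coefficients (age, period, cohort):", beta_pls)
print("b_age - b_period + b_cohort =", beta_pls[0] - beta_pls[1] + beta_pls[2])
```

The last line prints a value that is zero up to floating-point rounding. Because s = Xᵀy lies in the row space of X, it is orthogonal to the null direction (1, −1, 1), so the fitted coefficients satisfy b_period = b_age + b_cohort, mirroring period = age + cohort. Multi-component PLS and principal components coefficients, and the minimum-norm least squares solution, also lie in that row space, which is one way to see the connection to the intrinsic estimator that the article examines.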

Supplemental Digital Content is available in the text.

From the ᵃDivision of Biostatistics, Leeds Institute of Genetics, Health & Therapeutics, and Leeds Dental Institute, University of Leeds, Leeds, United Kingdom; ᵇMathematical Statistics, Technische Universität München, Garching, Germany; and ᶜInstitute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

Submitted 4 August 2011; accepted 8 December 2011; posted 7 March 2012.

Y.K.T. was funded by the United Kingdom government's Higher Education Funding Council for England and held a UK Research Council Fellowship. N.K. was partially supported by the FP7-ICT Programme of the European Community, under the PASCAL2 Network of Excellence, ICT-216886. This project was partly funded by a grant from the UK Medical Research Council (G1000726: Methods for modeling repeated measures in a lifecourse framework) and by an international joint project grant from the Royal Society, London, United Kingdom, and the National Science Council, Taiwan. The authors reported no other financial interests related to this research.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article ( This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

Correspondence: Yu-Kang Tu, Division of Biostatistics, Leeds Institute of Genetics, Health & Therapeutics, University of Leeds, Room 8.01, Level 8, Worsley Building, Clarendon Way, Leeds, LS2 9JT, United Kingdom. E-mail:

© 2012 Lippincott Williams & Wilkins, Inc.