In his commentary, Timothy Cole1 praised the superiority of partial least squares regression over ordinary least squares regression in dealing with highly or perfectly collinear covariates, such as the repeated measurements of body size throughout the lifecourse. As shown in our paper,2 partial least squares regression with only the first few components yields parsimonious models, producing smoother curves in the lifecourse plots. There are 2 important advantages of partial least squares over traditional ordinary least squares regression for lifecourse research: first, the covariate matrix need not be full rank in partial least squares analysis, which provides many new statistical models that are otherwise unthinkable, as demonstrated by our paper; second, like principal-components analysis or ridge regression, partial least squares provides a trade-off between bias and precision. It is well known that confidence intervals of ordinary least squares regression coefficients increase with increasing collinearity among covariates,3 ie, in the long run, the ordinary least squares estimates remain unbiased, but for a specific random sample, the ordinary least squares estimates can substantially deviate from the “truth.” In contrast, regression coefficients from partial least squares analysis with the first few components are only approximations to the “truth,” but for a specific random sample, these deviations may be consistently smaller than those of ordinary least squares estimates.3 The crucial question is therefore what is the optimal number of partial least squares components.
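The widening of ordinary least squares confidence intervals under collinearity can be illustrated with the variance inflation factor. The following is a minimal sketch, not taken from our paper: it assumes 2 standardized covariates with correlation r, and the function names `variance_inflation` and `two_covariate_corr` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def variance_inflation(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(corr))


def two_covariate_corr(r):
    """Correlation matrix of 2 covariates with correlation r (illustrative assumption)."""
    return np.array([[1.0, r], [r, 1.0]])


# The variance of each OLS coefficient is multiplied by the VIF, so the
# confidence interval width is multiplied by sqrt(VIF).
for r in (0.0, 0.5, 0.9, 0.99):
    vif = variance_inflation(two_covariate_corr(r))[0]
    print(f"r = {r:4.2f}  VIF = {vif:8.2f}  CI width multiplier = {np.sqrt(vif):5.2f}")
```

For 2 covariates the factor reduces to 1/(1 - r^2), so the interval width grows without bound as r approaches 1, while the estimates remain unbiased in the long run.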
For the model reported in Figure 3A and B of our paper2 and eTable 1 in the online appendix, we reported the effect of birth weight on blood pressure as zero, based on the results from the 1-component model. Although this conclusion is in line with other models reported in our paper, Cole has questioned its robustness, as the 2-component model suggested negative effects of birth weight on blood pressure. Indeed, according to the 2-component model, not only did birth weight have negative effects, but so too did weight at 1 year. Cole's comment raises 2 interesting questions: (1) should the selection of the “ideal” partial least squares model be based entirely upon model indices, such as predictive residual sum of squares or Q2,4,5; and (2) with many different model specifications in the lifecourse research, how should potentially conflicting results be interpreted?
From a statistical perspective, predictive residual sum of squares and Q2 are useful tools to determine a parsimonious partial least squares model, but like all statistical indices, they have to be used in conjunction with our knowledge of the specific empirical research area, and models should also be selected according to other criteria, such as interpretability.6,7 The differences in the effects of birth weight between the 1- and 2-component models may, as argued by Cole, pose a challenge to the interpretation of partial least squares analysis, as the addition of 1 component can cause substantial changes in the interpretation of the results. However, we should not forget that partial least squares, principal-components analysis, and ordinary least squares are all within the family of continuum regression,8 and the greater the number of components selected for partial least squares models, the more similar their behaviors and estimates are to those of ordinary least squares models. The reversed sign for the regression coefficients of birth weight and weight at 1 year is a clear warning of collinearity among the 6 weight measurements, and in the full (partial least squares and ordinary least squares) model, 2 or 3 of the weight measurements had negative effects on blood pressure. This phenomenon is equivalent to the reversal paradox we have discussed in previous publications,9–11 ie, small positive effects of birth weight and weight at 1 or 2 years may be reversed when weights measured at later ages are also adjusted for within the models. In fact, even weights at age 8 and 15 years had negative or small positive effects in the 6-component model, which is equivalent to the ordinary least squares model.
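The equivalence between the full-component partial least squares model and ordinary least squares can be checked numerically. The sketch below is an illustration under stated assumptions, not the analysis from our paper: the data are simulated (6 correlated "weights" built as cumulative sums of random increments, and an outcome that depends only on the last weight), the single-outcome NIPALS-type algorithm is a standard textbook formulation, and the function names `pls1_coefficients` and `ols_coefficients` are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Single-outcome PLS regression coefficients (NIPALS-type) on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Xk = Xc.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                  # weight vector: covariance with the outcome
        w /= np.linalg.norm(w)
        t = Xk @ w                     # component scores
        tt = t @ t
        p = Xk.T @ t / tt              # covariate loadings
        q.append(yc @ t / tt)          # outcome loading
        Xk = Xk - np.outer(t, p)       # deflate the covariates
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def ols_coefficients(X, y):
    """OLS slopes on centered data (intercept absorbed by centering)."""
    return np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]


# Simulated data (assumption): 6 collinear size measurements per subject.
rng = np.random.default_rng(0)
n, n_ages = 200, 6
weights = np.cumsum(rng.normal(size=(n, n_ages)), axis=1)
outcome = 0.5 * weights[:, -1] + rng.normal(size=n)

b_ols = ols_coefficients(weights, outcome)
for k in range(1, n_ages + 1):
    print(k, np.round(pls1_coefficients(weights, outcome, k), 3))
print("OLS", np.round(b_ols, 3))
```

In the 1-component model each coefficient is proportional to the covariance between that covariate and the outcome, so it carries the sign of the marginal association. As components are added, the estimates move toward the ordinary least squares solution, and the 6-component coefficients coincide with it up to numerical error.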
So should we trust the results from the 1- or 2-component models? In the 2-component models, birth weight and weight at 1 year had negative effects, with wider confidence intervals than those in the 1-component model, indicating the problem of collinearity. Leaving aside the fact that predictive residual sum of squares and Q2 indicate that the 1-component model is more parsimonious, the change in the signs of regression coefficients, combined with wide confidence intervals, suggests that the results from models with 2 or more components suffer from collinearity and need to be interpreted with caution. This also indicates that while partial least squares is a solution to the problem of collinearity among covariates, there is a limit to its ability to deal with this problem, as eventually, as the number of components increases, results from partial least squares regression become equivalent to those from ordinary least squares regression. In our opinion, the challenge for partial least squares lies perhaps not in the difference in estimates between the 1- and 2-component models in our Figure 3,2 but in the appropriate interpretation of the rich information available from the various growth models that cannot be fitted by ordinary least squares, but which pose no problem for partial least squares.
Given the complexity of statistical and mechanistic issues in lifecourse research, we believe a plethora of different approaches is needed. Hopefully, more research will confirm that partial least squares is valuable as one of them.
1. Cole TJ. Can partial least squares regression separate the effects of body size and growth on later blood pressure? [commentary] Epidemiology
2. Tu YK, Woolston A, Baxter PD, Gilthorpe MS. Assessing the impact of body size in childhood and adolescence on blood pressure: an application of partial least squares regression. Epidemiology
3. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer; 2009.
4. Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst
5. Wakeling IN, Morris JJ. A test of significance for partial least squares regression. J Chemom
6. Tu YK, Gilthorpe MS, D'Aiuto F, Woolston A, Clerehugh V. Partial least squares path modelling for relations between baseline factors and treatment outcomes in periodontal regeneration. J Clin Periodontol
7. Gilthorpe MS, Frydenberg M, Cheng Y, Baelum V. Modelling count data with excessive zeros: the need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data. Stat Med
8. Stone M, Brooks RJ. Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. J R Stat Soc B
9. Tu YK, West R, Ellison GT, Gilthorpe MS. Why evidence for the fetal origins of adult disease might be a statistical artifact: the “reversal paradox” for the relation between birth weight and blood pressure in later life. Am J Epidemiol
10. Weinberg CR. Invited commentary: Barker meets Simpson. Am J Epidemiol
11. Tu YK, Manda SO, Ellison GT, Gilthorpe MS. Revisiting the interaction between birth weight and current body size in the foetal origins of adult disease. Eur J Epidemiol