Studies in nursing research very often collect repeated measurements on an individual, and statistical analysis is then performed on these data to make inferences. Historically, the statistical methods used to analyze repeated-measures data either did not make use of all of the data collected or ignored the inherent relationship between measurements taken on the same individual. For example, in the analysis of longitudinal data, it was common either to consider only score changes from baseline, discarding the baseline measurement itself, or to analyze only the posttreatment score.
Times have changed. With the advent of user-friendly statistical software and accessible high-speed computing power, advanced statistical methods for properly analyzing repeated-measures data are now readily available. These methods are now being used and published in the research arena across a host of disciplines and disease areas. The purpose of this presentation is to provide a nontechnical introduction to modern advanced statistical methods for analyzing repeated-measures data.
Applying a statistical method usually requires the user to make assumptions about the data. For example, classic statistical inference methods assume that the data consist of sample values drawn from a population of interest. Further, an assumption usually is made about the way the sample values are selected. The validity of results of classic statistical methods relies on the assumption that a particular sampling technique, namely, simple random sampling (SRS), was implemented successfully; under SRS, each individual in a population has an equal chance of being included in the study sample. In other words, SRS ensures that the sample values are representative of the population of interest. The language of statistics refers to this assumption as independence; in statistical terms, this means the correlation between observations is assumed to be zero. Classic statistical techniques, such as analysis of variance (ANOVA), linear regression, and logistic regression, assume independence between measurements. However, this assumption is violated when multiple measurements are observed for an individual in a study. For example, intervention data with a pretreatment and posttreatment measurement for each individual lead to a dependency between observations on the same individual. Other examples of dependent data include multiple data points collected over time for each individual, measurements taken on patients in the same unit (e.g., floor, wing, hospital), and measurements taken from a common cluster (e.g., family, community, school). These examples violate the independence assumption needed for classic statistical methods because more than one measurement from the same person or unit usually indicates within-subject correlation.
In recent years, many publications in the nursing and more general biomedical literature have presented data analyses that use advanced statistical methods for multiple measurements on an individual, namely, mixed and marginal models (Fitzmaurice & Ravichandran, 2008). The aim of this article is to give the nonstatistician a nontechnical understanding of these methods. Unfortunately, the language used in the literature to describe repeated-measures data is inconsistent and varies by discipline, journal, and topic (Diez Roux, 2002). For example, the term repeated-measures data may refer to longitudinal data (information collected over time), pretreatment and posttreatment data, multiple measurements per subject, clustered data by geographical unit (e.g., city, county, state), or multilevel data (e.g., patient, organization, country). For simplicity and clarity, any type of data in which within-subject correlation is plausible will be referred to here as repeated-measures data.
This article is organized as follows. First comes a brief overview of naïve inference, which is the use of traditional statistical methods to analyze repeated-measures data while ignoring within-subject correlation. The next section describes a modeling framework for analyzing repeated-measures data, focusing on an approach using mixed models. Clarification is offered for the many labels given in the literature for the use of mixed models with repeated-measures data. Terminology for describing mixed models varies in the biomedical and public health literature; it is common for two identical statistical models to be cited differently. Although this article is focused on an overview of mixed models, it also offers a brief introduction to population-averaged or marginal statistical models, another popular modeling framework for analyzing repeated-measures data.
Repeated-measures data are a familiar data structure for most researchers (Sheu, 2002). However, it is only in the last two decades that analytical strategies have emerged that account correctly and efficiently for within-subject correlation (Cho, 2003). It had been common practice to perform a standard statistical analysis on repeated-measures data, assuming all observations to be independent. This practice, referred to as naïve pooling (Burton, Gurrin, & Sly, 1998), is problematic: multiple measurements on the same individual imply a dependency in the data, so an assumption of independence may be unreasonable and lead to misleading inferences. Conclusions drawn from statistical results using naïve pooling can differ considerably from those drawn using statistical methods that account for within-subject correlation. One common method that does not make use of all of the data is averaging an individual’s repeated measurements. Often, repeated measurements on an individual may be taken to assess instrument accuracy, account for natural fluctuations, and average out effects such as time of day and environmental, social, or other conditions that are not of interest but that may obscure understanding of a primary outcome. The variations that occur in repeated measurements for an individual may be due to chance or some underlying mechanism; averaging an individual’s measurements prevents a statistical analysis from determining which of these it may be.
The goal of collecting longitudinal data is typically to understand how a measurement of interest changes over time. Baseline measurements are taken at the beginning of a study. One or more interventions or treatments are administered to the study subjects. Posttreatment measurements are then taken immediately after treatment and sometimes at longer term follow-up as well. In experimental studies, subjects are randomly assigned to a treatment or control group to allow for causal inference. A common approach with longitudinal data has been to compute the differences between baseline and follow-up values (Kleinhenz et al., 1999). This approach can be problematic. For example, it is common practice to perform statistical tests to compare baseline values for the study variables (demographics, outcomes) to confirm that random assignment has yielded similar groups. If the result of a statistical test shows a significant difference between the groups at baseline, this difference needs to be controlled for in a statistical analysis looking at time effects of the treatment (Vickers & Altman, 2001).
Calculating the percentage change from baseline has the same issues as the change score (Anderson et al., 2000). Inferences made with percentage change fail to account for baseline measurements or differences and may be misleading, especially in situations where the baseline characteristics for two or more groups are different. Although randomization ideally provides a data structure that avoids baseline differences, small sample sizes or practical considerations often lead to baseline differences, which need to be accounted for in the analysis (Vickers, 2001; Vickers & Altman, 2001).
Another common approach that has been used is to present an analysis of only the last observed posttreatment score. An argument often used to rationalize this practice is that if baseline values for two or more groups are not statistically different, examining only the posttreatment score is sufficient. This is inadvisable because the baseline values likely will still vary, even if not significantly, and effects of this variation will be lost if only using the posttreatment score.
Statistical Methods With Naïve Pooling
Statistical methods and models that assume independence are inappropriate in the presence of correlated observations because ignoring the within-subject correlation can mask a clear understanding of the phenomenon under study. Classic methods of statistical inference, such as linear regression, ANOVA, analysis of covariance (ANCOVA), logistic regression, and log-linear models, each assume that the data consist of a simple random sample taken from a population of interest and that the observations are independent (i.e., not correlated). Two observations are said to be independent if knowing the result of the first observation tells nothing about the second observation. Repeated-measures data are not independent. The summary statistic measures mentioned previously often are used as dependent variables with one of these statistical methods. This is problematic, as these inferential procedures in their usual form do not allow for dependency between observations. The violation of the independence assumption leads to inaccurate and inflated standard errors, unnecessarily wide confidence intervals, and misleading results (Diggle, Heagerty, Liang, & Zeger, 2002). A statistical analysis that does not account for within-subject correlation fails to represent the uncertainty correctly, and this deficiency is reflected in the results and subsequent inferences.
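The effect of ignoring within-subject correlation on a standard error can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies; the sample size, means, and variances are hypothetical values chosen only for illustration. It generates pretreatment and posttreatment scores that share a stable subject-specific level and then compares the standard error of the mean change computed two ways: treating the two sets of scores as independent samples, and using the within-subject differences.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

n_subjects = 200
true_change = 2.0  # hypothetical average post-minus-pre change

pre, post = [], []
for _ in range(n_subjects):
    # Stable subject-specific level shared by both measurements;
    # this shared component is what induces within-subject correlation.
    subject_level = random.gauss(50.0, 10.0)
    pre.append(subject_level + random.gauss(0.0, 3.0))
    post.append(subject_level + true_change + random.gauss(0.0, 3.0))

# Naive pooling: treat pre and post scores as two independent samples.
naive_se = math.sqrt(
    statistics.variance(pre) / n_subjects
    + statistics.variance(post) / n_subjects
)

# Paired analysis: work with within-subject differences instead.
diffs = [b - a for a, b in zip(pre, post)]
mean_change = statistics.mean(diffs)
paired_se = statistics.stdev(diffs) / math.sqrt(n_subjects)

print(f"estimated change: {mean_change:.2f}")
print(f"naive SE:  {naive_se:.2f}")
print(f"paired SE: {paired_se:.2f}")
```

In this setup, a within-subject comparison with positively correlated measurements, the naive standard error is the larger of the two because the between-subject variability is not removed. The size of the discrepancy depends on the assumed variances.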
Statistical Models for Repeated-Measures Data
Two classic statistical methods have been used to analyze repeated-measures data: the paired t test and ANOVA with repeated measures. Each assumes a continuous dependent variable and categorical independent variables. The paired t test is used to test for a mean difference in paired measurements. This method is limited because it does not allow for consideration of covariates and is restricted to two repeated measurements. The other method is ANOVA with repeated measures. This method does account for within-subject correlation and accommodates a finite number of repeated measurements, but it has several limitations. First, it requires a balanced design, which means there are a finite number of categories for which repeated measurements are defined, and repeated measurements must be observed at the same time points or under the same conditions for each subject (e.g., follow-up measurements at 3, 6, 9, and 12 months; daily blood sugar measurements taken for 7 days). If an individual is missing one or more measurements, the individual is dropped completely from the analysis. Individuals missing one or more repeated measures still provide partial information about the study measures, and those with partially missing data may be different from those who have complete data, leading to a bias in the statistical analysis. This can lead to misleading conclusions and, eventually, to incorrect decisions, which may have a negative real-world impact (Diggle et al., 2002; Twisk & de Vente, 2002).
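The cost of dropping incomplete subjects can be made concrete with a small count. The following sketch uses made-up scores for five hypothetical subjects measured at 3, 6, 9, and 12 months; it shows only how much observed information a complete-case rule discards and does not fit any model.

```python
# Hypothetical follow-up scores at 3, 6, 9, and 12 months.
# None marks a missed visit. All values are invented for illustration.
scores = {
    "s01": [7.1, 6.8, 6.5, 6.1],
    "s02": [8.0, None, 7.2, 7.0],
    "s03": [6.4, 6.3, None, None],
    "s04": [7.7, 7.5, 7.1, 6.9],
    "s05": [9.0, 8.6, 8.1, None],
}

# Complete-case rule: keep only subjects with no missing measurement.
complete_cases = {
    sid: vals for sid, vals in scores.items() if None not in vals
}

# Count the observed (non-missing) values before and after the rule.
observed_total = sum(
    v is not None for vals in scores.values() for v in vals
)
observed_kept = sum(len(vals) for vals in complete_cases.values())

print(f"subjects kept: {len(complete_cases)} of {len(scores)}")
print(f"observations kept: {observed_kept} of {observed_total}")
```

Here three of the five subjects are removed, and with them half of the values that were actually observed.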
Another limitation of ANOVA with repeated measures is the lack of easily interpretable results. The ANOVA output is usually in the form of statistical measures such as sum of squares, degrees of freedom, mean squares, and p values. Although post hoc test results can be used, the conventional ANOVA results lack parameter estimates like those that result from linear regression. The statistics calculated are not easily interpretable and lack units of the original measurements. In addition, if a statistically significant difference is discovered, the statistics calculated do not provide the researcher with the magnitude of an effect or change. Regardless of statistical significance and display of a p value, it is essential for a researcher to report a measure of effect size with any statistical analysis (Hayat, 2010). McCulloch (2005) suggested that ANOVA with repeated measures is an outdated statistical technique and should no longer be used in the analysis of repeated-measures data.
Other classic statistical modeling techniques used for repeated-measures data include multivariate ANOVA and multivariate ANCOVA. These methods are an improvement over the traditional repeated-measures ANOVA and ANCOVA methods because they allow for multiple dependent variable measurements. However, these methods are also outdated, due to inefficient treatment of missing data. If one or more measures on a subject are missing, all data for that subject are dropped from the analysis.
There are two well-developed statistical modeling approaches presented in the literature for analyzing repeated-measures data that overcome the limitations mentioned (Burton et al., 1998; Wu, 1995), namely, mixed models and marginal models. Advances in statistical theory and software development have made the use of these two approaches accessible and readily available (Burton et al., 1998).
Mixed models include many types of specifications and labels and are referred to also as generalized linear mixed models, general linear mixed models, linear mixed-effects models, or subject-specific models. The labeling of such statistical models is varied and inconsistent in the biomedical literature (Diez Roux, 2002). Descriptions of several labels associated with mixed models are provided following an overview of mixed models and their relationship to general linear models and generalized linear models.
The name mixed models refers to the mixing of fixed and random effects. An effect is termed fixed if all possible levels of the variable, or at least all levels about which inferences will be made, are represented in the data collected (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006). Examples of fixed effects include treatment group in a clinical trial with two arms, gender, race, and employment status. Random effects refer to variables for which the levels in a study represent a sample from a population of values. One way to decide whether to treat an effect as fixed or random is to ask whether the levels in a study represent a larger population with a probability distribution (Littell et al., 2006). Examples of random effects are hospital in a multicenter clinical trial, litter in an animal laboratory study, and school in a study of an after-school violence-prevention program (Cho, 2003). In each case, the levels of the effect represent a sample from a larger population of hospitals, litters, or schools.
Fixed effects are used in a statistical model to make inferences about a population mean of interest based on sample data. The estimate of the fixed effect is usually in the form of a parameter estimate, such as a regression coefficient in a linear regression model. Fixed effects contribute to understanding of the mean of the dependent variable. In contrast, random effects are used in a statistical model to account for the correct variance sources and structure. Interest in a study is usually in the true, or population, mean values of an outcome of interest. Random effects contribute to understanding of the variance of the dependent variable.
Statistical inferences about the population mean are made usually with confidence intervals or hypothesis tests. For the inferences to be valid and correct, the variance structure of the data needs to be accounted for properly. For example, repeated measurements taken on an individual lead to two sources of variability: within-subject variance is the variance of the multiple measurements for an individual, and between-subject variance is the variance between subjects. Random effects are used to specify this structure.
Two general labels are given in the literature to describe the different types of dependent variables that can be handled with linear statistical models. General linear models assume a continuous and normally distributed dependent variable. Generalized linear models refer to statistical models that are essentially linear models, although the dependent variables are instead expressed as functions of different types of responses. These models accommodate nonnormal response variables (McCullagh & Nelder, 1989). For example, logistic regression, Poisson regression, and Gamma regression are all types of generalized linear models.
General Linear Mixed Model
The specific type of mixed model used depends on the distributional assumptions made on the dependent variable and the type of data correlation structure. General linear models refer to classic linear statistical models that assume a normally distributed, single continuous response variable measured on each study subject; the most commonly used types of general linear models are linear regression and ANOVA. The general linear model includes only fixed effects and is used to make inferences about the population mean of a response variable of interest. The variability of the dependent variable is assumed to be explained partially by the fixed effects and explained partially by random variation from unexplained sources. If there are repeated-measures data, the assumption of random variation is violated, because we then know that some of the variance in the dependent variable is due to the repeated measurements on an individual. The random variation assumed in a statistical model may be due to sampling variability (the natural variation that occurs in taking a sample from a larger population of interest), measurement error, and other unmeasured possible phenomena, often including environmental or genetic factors.
The general linear mixed model is an extension of the general linear model to account for within-subject correlation. It is essentially a general linear model, with the addition of random effects to account for the two sources of variability, within- and between-subject variance. It adds terms to the model that account for repeated measurements on an individual.
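One common way to write the simplest model of this kind, a random-intercept model, is sketched below. The notation is introduced here for illustration and does not come from the cited sources: i indexes subjects, j indexes the repeated measurements on a subject, x is a single covariate, and b is the subject-specific random effect.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2),
\qquad b_i \text{ and } \varepsilon_{ij} \text{ independent}
```

Under these assumptions, the coefficients \beta_0 and \beta_1 are the fixed effects, \sigma_b^2 is the between-subject variance, and \sigma_e^2 is the within-subject variance. Two measurements on the same subject share the same b_i, which is what makes them correlated.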
Generalized Linear Mixed Model
Attempts to measure and quantify phenomena often lead to data types that are not continuous and normally distributed. For example, an outcome in a cardiovascular risk study may be hypertensive status, an indicator of diabetes, or a patient’s number of medications. Generalized linear models allow for modeling of many types of response variables. The structure of the model is similar to the previously discussed general linear model. However, the response variable is described as a function of the measured outcome. In the case of a dichotomous outcome, such as presence or absence of a condition, logistic regression statistical modeling is used to model a function of the probability of presence of the condition. Logistic regression is one type of generalized linear model. In the event of count data, such as the number of medications, the information may be reflected accurately with an assumption of a Poisson distribution and the use of Poisson regression. Poisson regression is another type of generalized linear model. The reader is referred to the landmark textbook on this topic by McCullagh and Nelder (1989). The generalized linear mixed model builds on this by adding random effects to the model. This allows for partitioning the variance of the outcome measure to account for within-subject correlation and other data structures, such as multilevel or hierarchical data that result from such a study design.
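As an illustration, a generalized linear mixed model with a single random intercept might be written as follows for a dichotomous outcome and for a count outcome. The notation is an assumption made for this sketch: i indexes subjects, j indexes repeated measurements, x is a single covariate, b is the subject-specific random effect, and \mu is the conditional mean of the count.

```latex
% Dichotomous outcome: logistic link
\log \frac{\Pr(y_{ij} = 1 \mid b_i)}{1 - \Pr(y_{ij} = 1 \mid b_i)}
  = \beta_0 + \beta_1 x_{ij} + b_i,
\qquad b_i \sim N(0, \sigma_b^2)

% Count outcome: log link with a Poisson distribution
y_{ij} \mid b_i \sim \text{Poisson}(\mu_{ij}),
\qquad \log \mu_{ij} = \beta_0 + \beta_1 x_{ij} + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
```

In both cases the right-hand side has the same linear form as in a general linear mixed model; only the function of the response on the left-hand side changes.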
Many analysis techniques for repeated-measures data can be written in the general linear mixed-model or generalized linear mixed-model framework. Some of the labels given to common repeated-measures data and analysis techniques are displayed in Table 1. All of the labels are known to be types of data or analysis techniques that fit in the context of a mixed-modeling framework. The labels are not mutually exclusive; for example, a single model could be described as a multilevel, hierarchical, and random-effects model.
Multilevel data are a common data type collected in a clinical setting. For example, information from patients in a hospital setting may include many levels, such as hospital wing, a floor within the wing, and a patient within the floor. It is plausible to expect information collected by the same clinician, or patients from the same geographical region, to share a commonality. In other words, it may be desirable to account for possible within-clinician or within-geographical-region correlation. This is similar to the within-subject correlation previously described in this article. Multilevel statistical models, also called hierarchical models, provide a framework that accounts for the multilevel structure of data collected in a study. Multilevel or hierarchical models are one type of mixed models. A detailed account of multilevel and hierarchical models is provided by Rabe-Hesketh and Skrondal (2005).
Longitudinal data are data collected over time. The simplest, and the most commonly occurring, example in the biomedical research literature is pretreatment and posttreatment data. It is common to measure participants on a variable of interest before and after an intervention or treatment is administered. This allows for assessment of change. In cases where multiple measurements on the same individual are taken during a study period, these measurements are expected to be correlated. A general linear mixed model is a suitable method for analyzing this type of data. It is common for time to appear in a model as a fixed effect and sometimes as a random effect. In the case of an experimental study with two treatment groups, the interaction effect between group and time (Group × Time) describes the comparative change in the response of interest for the two groups.
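The meaning of the Group × Time interaction can be illustrated with a small calculation. The sketch below uses invented pretreatment and posttreatment scores for two hypothetical groups of four subjects. It relies on the fact that, in a balanced design with two groups, two occasions, and no missing data, the interaction coefficient under 0/1 coding of group and time equals the difference between the groups in mean change. It is a hand calculation under those assumptions, not a fitted mixed model.

```python
import statistics

# Hypothetical (pre, post) scores per subject; values are invented.
treatment = [(12.0, 8.0), (14.0, 9.5), (11.0, 8.5), (13.0, 9.0)]
control = [(12.5, 11.5), (13.5, 13.0), (11.5, 10.5), (12.5, 12.0)]


def mean_change(pairs):
    """Average within-subject change (post minus pre)."""
    return statistics.mean([post - pre for pre, post in pairs])


change_treatment = mean_change(treatment)
change_control = mean_change(control)

# Difference in mean change between groups. With balanced, complete
# two-occasion data and 0/1 coding, this matches the Group x Time
# interaction coefficient.
interaction = change_treatment - change_control

print(f"mean change, treatment: {change_treatment:.2f}")
print(f"mean change, control:   {change_control:.2f}")
print(f"Group x Time estimate:  {interaction:.2f}")
```

With more than two occasions, unbalanced data, or missing measurements, this shortcut no longer applies and the interaction would be estimated by fitting the model itself.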
Also termed a variance components model, a random effects model refers to a special type of mixed model where the focus is on analyzing the variance of the dependent variable rather than only accounting for it and focusing on the mean. The variance of the response is partitioned into the within- and between-subjects components. With hierarchical or multilevel data, the variance may be structured to have several levels and terms; for example, in a multicenter (hospital) example, it is possible to partition the variance and estimate the within-patient variance, within-hospital-unit variance, and within-hospital variance. The proportion of total variance due to the between-subject variance is quantified using the intraclass correlation coefficient (Rabe-Hesketh & Skrondal, 2005).
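Writing the between-subject variance as σ_b² and the within-subject variance as σ_e² (symbols introduced here for illustration), the intraclass correlation coefficient is σ_b² / (σ_b² + σ_e²). The sketch below estimates the two components for a small set of invented, balanced data using the classic one-way ANOVA (method of moments) estimators. This is one simple estimator among several; mixed-model software typically uses likelihood-based estimation instead.

```python
import statistics

# Hypothetical balanced data: 4 subjects, 3 measurements each.
measurements = {
    "s01": [10.0, 11.0, 12.0],
    "s02": [20.0, 21.0, 22.0],
    "s03": [15.0, 14.0, 16.0],
    "s04": [25.0, 27.0, 26.0],
}

n = len(measurements)  # number of subjects
k = 3                  # measurements per subject (balanced design)

subject_means = {
    sid: statistics.mean(vals) for sid, vals in measurements.items()
}
grand_mean = statistics.mean(list(subject_means.values()))

# Within-subject mean square: spread of values around each subject's mean.
ms_within = sum(
    (y - subject_means[sid]) ** 2
    for sid, vals in measurements.items()
    for y in vals
) / (n * (k - 1))

# Between-subject mean square: spread of subject means around the grand mean.
ms_between = k * sum(
    (m - grand_mean) ** 2 for m in subject_means.values()
) / (n - 1)

# Method-of-moments estimates of the variance components.
var_within = ms_within
var_between = max((ms_between - ms_within) / k, 0.0)  # truncate at zero

icc = var_between / (var_between + var_within)
print(f"within-subject variance:  {var_within:.2f}")
print(f"between-subject variance: {var_between:.2f}")
print(f"intraclass correlation:   {icc:.3f}")
```

In these invented data, subjects differ far more from one another than repeated measurements differ within a subject, so nearly all of the total variance is between-subject variance and the coefficient is close to 1.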
Growth curve models are used to describe how measurements taken on the same individual change as a function of time. These types of models were developed and published in the biology literature and have been applied more recently in the nursing, sociology, and psychology fields (Cherlin, Chase-Lansdale, & McRae, 1998; Lauritsen, 1998). Because a growth curve describes multiple measurements taken on the same individual over a period of time, a random effect should be included in the statistical model to account for the within-subject correlation inherent with this type of data.
Clustered data is another term often used to describe data collected on the same unit, group, or family. A clustered analysis essentially means accounting for the within-group correlation inherent in data collected from the same unit, group, or family. The random effect component of a mixed model describes a separate within-subject correlation for repeated measurements on individuals. The language can be adjusted to describe within-unit, within-group, or within-family correlation, and these sources of common data can then be referred to as clusters. In other words, data coming from a unit may be referred to as data clustered within a unit. The mixed-model framework described is often implemented with a description of a clustered random effect or analysis.
Magnetic resonance imaging scans yield graphical images of body structures. For example, a single pictorial slice of a scan of the brain may yield a 30,000 × 30,000 pixel image; different parts of the brain may be described as clusters or units. The analysis of this type of data is often referred to as spatial analysis. In principle, the concept is the same as with clustered data or, more generally, repeated measures on individual data. In another application, geographical region may be studied spatially in relation to an environmental exposure. This translates to a random effect in the mixed-model framework to account for within-region correlation. The relationship between regions and measures of spatial proximity may also be incorporated into the mixed-model specification.
Measurements taken from the same person are more likely to be similar than measurements taken from different persons. In a sense, a subject serves as his or her own control with repeated-measures data because many variables are held constant. For example, in the context of a study with longitudinal data taken over time, many factors related to genetic, environmental, and social conditions remain the same within an individual’s circumstances but differ between individuals in a study. Typically, these factors are not measured but contribute to the unobserved and unmeasured variability that may influence the dependent variable of interest. The factors contributing to the variance of a random effect are sources also termed unobserved heterogeneity. Random effects describe unobserved phenomena, account for unobserved heterogeneity, and are sometimes referred to in the literature as latent variable models (Rabe-Hesketh & Skrondal, 2005).
Marginal models are referred to also as population average models or covariance pattern models. Another term often used in connection with marginal models is generalized estimating equations, which are an approach to fitting marginal models (Zeger, Liang, & Albert, 1988). The term marginal is used to describe the modeling of the dependent variable across the population, which essentially means that the values are averaged over the subject-specific (random) effects. Although marginal models account for the correlation between measurements taken from the same individual, cluster, or unit, they do not investigate the sources of variation and correlation in the data in detail (Diez Roux, 2002). To learn more about marginal models, see Diggle et al. (2002).
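What averaging over the subject-specific effects means can be sketched for the simplest continuous case. The notation is an assumption made for this illustration: a random-intercept model with one covariate x, subject index i, measurement indices j and k, between-subject variance \sigma_b^2, and within-subject variance \sigma_e^2.

```latex
% Subject-specific (mixed) model
y_{ij} = \beta_0 + \beta_1 x_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)

% Implied marginal (population-averaged) mean and covariance
E(y_{ij}) = \beta_0 + \beta_1 x_{ij},
\qquad \operatorname{Var}(y_{ij}) = \sigma_b^2 + \sigma_e^2,
\qquad \operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k)
```

A marginal model specifies the mean and a pattern for the covariance directly, without introducing b_i. In this linear case the regression coefficients have the same interpretation in both formulations. With a nonlinear link such as the logit, the subject-specific and population-averaged coefficients generally differ.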
The vast literature in the statistics field on mixed and marginal models suggests the use of marginal models when analyzing repeated-measures binary response data. In contrast, when analyzing a continuous normally distributed response variable, a mixed model is the method of choice (Gardiner, Luo, & Roman, 2009; Twisk, Smidt, & de Vente, 2005). Mixed and marginal models yield essentially equivalent results in the case of continuous normally distributed response data. Given the added benefits of obtaining subject-specific effects with a mixed model, this method is recommended with this type of dependent variable.
A framework has been presented for analyzing the many types of correlated data collected in a research study with the use of statistical mixed models. Modern statistical software and computers accommodate the use of mixed and marginal models, which make use of all data and allow for studying covariates and predictors of interest while accounting for the correlated data structure. Baseline values and differences, whether statistically significant or not, are accounted for. Standard error estimates are correct, and confidence intervals and inferences reflect the correlated data structure. Parameter estimates from these models are interpretable in a manner similar to regression coefficients from a linear regression. In addition, missing data are a reality in research; mixed and marginal models allow for using all available data on each subject.
Analyzing data with a statistical model warrants careful thought and consideration. Other aspects of a statistical analysis using statistical models should be considered carefully, such as model assumptions, estimation, prediction, diagnostics, and inference. Assumptions and model fit statistics should be reported when presenting results from a mixed model or marginal model analysis. An important assumption underlying each of the model specifications described previously is linearity. For settings where linear models are not appropriate, extensions to nonlinear settings have been developed (see Chapter 5 of Fitzmaurice, Davidian, Verbeke, & Molenberghs, 2009, for an introduction).
Historical approaches to analyzing repeated-measures data either neglect to use all of the data or ignore within-subject correlation. Ignoring the within-subject correlation on repeated measurements on an individual will yield analysis results with an incorrect standard error estimate. As a result, confidence intervals and inferences made with such results will be incorrect. An outdated method for analyzing this type of data, ANOVA with repeated measurements, has many limitations that are addressed with mixed models. Statistical software packages such as SAS, Stata, R, and SPSS have made the use and implementation of mixed models accessible and readily available. A plethora of books and articles have been written on the theory and application of mixed and marginal models, and authors in recent years have described applications of these advanced statistical techniques. The generalized linear model framework accommodates modeling of many types of dependent variables. It is natural to collect more than one measurement on an individual in a study. With the flexibility of the statistical models described here, these methods will lead to more realistic modeling of real-world phenomena and serve to strengthen understanding of data collected in research studies.
The authors wish to thank an anonymous reviewer for helpful comments and suggestions.