Structural equation modeling (SEM) is a general data analytic method for the assessment of models that specify relationships among variables. SEM involves investigating two primary models: the measurement model that delineates the relationships between observed measures and unobserved underlying factors and the structural model that defines the relationships between underlying factors. In this paper, we argue that psychosomatic medicine researchers frequently hypothesize complex relationships within and between measures and constructs (specified as factors in SEM) and, thus, should more frequently apply SEM. To encourage its use, we provide an intuitive presentation of the basic concepts of SEM: model specification, model estimation, and assessment of fit between the specified model and the data. We introduce these concepts within the framework of confirmatory factor analysis (CFA), which restricts analyses to those used to evaluate measurement models. We focus our presentation on CFA for a number of reasons: a) CFA is a method that psychosomatic medicine researchers are likely to apply; b) discussion of basic concepts based on a broader set of SEM methods is likely to make our presentation overly complex, particularly given page space limitations; and c) CFA is not limiting in the sense that it allows us to discuss the basic SEM concepts without loss of generality. Where possible, we consider how to expand applications beyond CFA to a broader range of SEM applications.
SEM in Psychosomatic and Medical Research
Although SEM is used frequently in some fields, such as psychology, education, sociology, and genetics, research using SEM appears comparatively infrequently in psychosomatic and medical journals. There is at least a small irony to the scarcity of SEM in psychosomatic and medical research in that the technique actually has its direct roots in biology. In the 1920s, geneticist Sewall Wright (1) first developed path analysis, a special case of SEM, in an attempt to better understand the complex relationships among variables that might determine the birth weight of guinea pig offspring.
Lately SEM has begun to appear more often in psychosomatic research. For example, Rosen et al. (2) applied SEM to estimate the association of global subjective health with psychological distress, social support, and physical function. In their analyses, these four constructs were specified as factors, which were imperfectly measured by their observed measures. SEM has even made an occasional foray into high-profile medical journals. For example, in a paper published in the New England Journal of Medicine in 2008, Calis et al. (3) used a path model to estimate a set of complex associations among malaria, human immunodeficiency virus, and various nutritional deficiencies. In a recent commentary on posttraumatic stress syndrome that appeared in JAMA, Bell and Orcutt (4) explicitly pointed out the potential utility of SEM in their area of study: “Structural equation modeling is particularly well suited for examining complex associations between multiple constructs; such constructs are often represented as latent constructs and are assumed to be free of measurement error.” Despite the applicability of SEM to assess research hypotheses posed by researchers in at least some areas of medicine, very few papers have graced the pages of JAMA using SEM to address any topic.
Even fewer papers in medicine and psychosomatic medicine focus exclusively on the CFA component of SEM. This is largely because the majority of research questions posed by researchers in these fields involve evaluating the relationship between multiple predictors and one or more outcome variables. CFA, by itself, does not address this type of question, but rather it assesses the quality and nature of the variables under study from a measurement perspective. CFA, however, can be used effectively as a preliminary step to evaluate the measurement properties of predictor and outcome measures—before conducting a larger structural model that includes not only the relationships defined by the measurement model but also the relationship between predictors and outcome variables.
A Pedagogical Example
We created a research scenario to illustrate various aspects of CFA. The scenario bears similarity to the one studied by Rosen et al. (2), although ours is much simpler so that the focus is on understanding CFA rather than the complexities associated with a real life study. We draw on the literature suggesting that a variety of psychosocial constructs, including trait hostility, anger, anxiety, and depressive symptoms, seem to be risk factors for the development of coronary artery disease (CAD) (5). Despite the large number of papers published on this topic, fundamental questions remain. For example, do depression, hostility, anger, and anxiety each uniquely pose a risk for heart disease, or is it simply a more general “negative affect” factor that poses the risk?1 To help us answer this question, we first need to assess whether the relationships among measures of depression, hostility, anger, and anxiety allow us to interpret these measures as manifestations of a general, negative affect factor. Such an analysis might support the general factor conjecture but also might indicate that some measures are better than others in assessing it. Alternatively, we might discover that there is more than one underlying dimension or even that the measures are too distinct to be jointly related to underlying factors. CFA is ideally suited to address these types of questions.
In the ensuing pages, we attempt to convey an understanding of CFA through our example, which focuses on the dimensions underlying scales assessing depression, hostility, anger, and anxiety. We use this example because we believe that studying the measurement properties of these scales in the context of substantive theory is a critical step in understanding their usefulness in the prediction of CAD and should allow researchers to better design studies that include health outcome measures.
Purpose of CFA
CFA, as well as the more familiar exploratory factor analysis (EFA), defines factors that account for covariability or shared variance among measured variables and ignores the variance that is unique to each of the measures. Broadly speaking, either can be a useful technique for a) understanding the structure underlying a set of measures; b) reducing redundancy among a set of measured variables by representing them with a smaller number of factors; and c) exploiting redundancy and, in so doing, improving the reliability and validity of measures. However, the purposes of EFA and CFA, and accordingly the methods associated with them, are different. The goal of EFA is to discover a set of as-yet-unknown factors based on the data, although a priori hypotheses based on the literature may help guide some decisions in the EFA process. In other words, EFA may be conceptualized as primarily an inductive, data-driven method to obtain factors. In contrast, in CFA, we start with an explicit hypothesis about the number of factors underlying measures and the parameters of the model, such as the effects of the factors on the measures (i.e., weights or loadings). In practice, researchers impose constraints on a factor model based on a priori hypotheses about measures. In imposing constraints, we are forcing the model to be consistent with our substantive theory or beliefs. For example, if a measure was not designed to assess factor A (but rather to assess factor B), we force the weight between this measure and factor A to be equal to zero. We will discuss how and why we do this shortly. If these constraints are inconsistent with the data and, more specifically, with the pattern of relationships among measured variables, the model with its imposed constraints is rejected. If the constraints are consistent with the data, the estimated parameters of a model are interpreted in the context of the substantive area.
Given that the focus of CFA is on hypothesized models, let’s first describe how these models are specified before considering how the parameters of models are estimated and how the fit of models to data is assessed.
Model Specification
With CFA, we hypothesize a model that specifies the relationship between measured variables and presumed underlying factors.2 The model includes parameters (e.g., factor loadings) we want to estimate based on the data (i.e., freely estimated parameters) and parameters we constrain to particular values based on our understanding of our data and the literature (i.e., constrained or fixed parameters). The constraints on model parameters are what produce lack of fit to the data, which in turn informs us about how well our hypothesized model is supported.
In this section, we will consider three prototypical CFA models, each with a different substantive interpretation. We will present each prototypical model and discuss it in the context of our negative affect example. The example for the first two prototypes involves postulating a factor structure underlying four measures: scales of hostility, anger, anxiety, and depressive symptoms. Scores for each of the scales are computed by summing the items on a self-report instrument. For the third prototypical model, we extend this example by having three rather than one measure of depressive symptoms: affective, somatic, and cognitive.
Single-Factor Model
The single-factor model is the simplest CFA model. Nevertheless, we devote considerable attention to it in order to introduce the basic concept of CFA and conventional SEM terminology. A one-factor model specifies a single dimension underlying a set of measures and, thus, provides a parsimonious explanation for the responses on these measures. As with any structural equation model, a single-factor model can be presented pictorially as a path diagram or in equation form. Figure 1 is a graphical representation of a model with a single factor (F1) underlying four measures: hostility (X1); anger (X2); anxiety (X3); and depressive symptoms (X4). By convention, the factor is depicted as a circle, which represents a latent variable, whereas the observed measures are squares, which represent observable or indicator variables. A single-headed arrow between two variables indicates the direction of the effect of the one variable on the other. Within the context of our example, we are postulating that a factor called negative affect (F1) underlies or determines the observed scores on the hostility, anger, anxiety, and depressive symptom measures (as well as error). Statistically, we believe these four measures are correlated because they have a common underlying factor, negative affect. In other words, the model reflects the belief that changes in the unobserved latent variable, negative affect, are presumed to result in changes in the four variables that we have measured.
Continuing with Figure 1, a variable with arrows pointing only away from it is called exogenous. A variable with one or more arrows pointing to it, even if one or more arrows are pointing away from it, is called endogenous. An equation is associated with each endogenous variable. Accordingly, the model in Figure 1 involves four endogenous variables and therefore four equations:
X1 = λ11F1 + E1
X2 = λ21F1 + E2
X3 = λ31F1 + E3
X4 = λ41F1 + E4.
The lambdas (λ) in these equations are factor weights or loadings, which can be interpreted essentially like regression coefficients. For example, for every 1-unit increase in the negative affect factor, F1, the expected change in hostility, X1, will be λ11.
Observed measures are not likely to be pure indicators of a factor but almost certainly contain unique components, frequently referred to as errors. A unique component for a measure includes reliable information that is specific to that measure—that is, unrelated to the factor—and measurement error. Because errors are not directly observable, they are also latent variables and are represented in our path diagram as circles. For the hostility measure in our example, the unique component might include the specific component of agitation as well as measurement error due to inattentiveness of respondents and ambiguity of the items on this measure.
Finally, our path diagram also includes double-headed curved arrows. If an arrow begins and returns to the same exogenous variable, it represents the variance of that variable. A double-headed arrow drawn between any two exogenous variables represents a covariance between them. In Figure 1, we could have drawn a double-headed arrow between any two errors, but we chose not to include error covariances in our model to avoid unnecessary complexity.
The model parameters, which we seek to estimate or constrain based on our understanding of a study, are associated with the single-headed and double-headed arrows in our diagrams and, by convention, are shown as Greek letters. In addition to the lambdas, the parameters for the model in Figure 1 are the variance of the factor ($\sigma^2_{F1}$) and the variances of the errors ($\sigma^2_{E1}$ through $\sigma^2_{E4}$). As shown at the bottom of the figure, the model parameters can also be presented in three matrices: the phi matrix (Φ) containing the variances and covariances among factors, the lambda matrix (Λ) that includes all factor weights, and the theta matrix (Θ) that includes the variances and covariances among the errors.
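As a minimal sketch of how these three matrices work together, the Python code below writes out Λ, Φ, and Θ for a single-factor model and simulates scores from the four equations above. The loadings and error variances are hypothetical values chosen only for illustration; they are not estimates from any study. With a large simulated sample, the covariances among the four measures should come close to ΛΦΛ′ + Θ, which shows that the measures covary only because they share the factor.

```python
import numpy as np

# Hypothetical parameter values for a single-factor model (illustration only).
Lambda = np.array([[0.7], [0.6], [0.8], [0.5]])   # 4 x 1 matrix of factor loadings
Phi = np.array([[1.0]])                            # 1 x 1 matrix: factor variance
Theta = np.diag([0.51, 0.64, 0.36, 0.75])          # error variances, no error covariances

# Simulate X = Lambda * F + E for a large sample of respondents.
rng = np.random.default_rng(0)
n = 200_000
F = rng.normal(0.0, np.sqrt(Phi[0, 0]), size=(n, 1))        # latent factor scores
E = rng.normal(0.0, np.sqrt(np.diag(Theta)), size=(n, 4))   # unique components
X = F @ Lambda.T + E                                         # observed measures

sample_cov = np.cov(X, rowvar=False)            # covariances among simulated measures
model_cov = Lambda @ Phi @ Lambda.T + Theta     # covariances implied by the parameters

print(np.round(sample_cov, 2))
print(np.round(model_cov, 2))
```

The two printed matrices differ only by sampling error, which shrinks as n grows.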
When we specify a model, we must stipulate whether parameters are “free” or “constrained.” Free parameters are estimated based on data. We are familiar with free parameters in that we routinely interpret them when conducting various types of statistical analyses, such as predictor weights in regression analysis or factor loadings in exploratory factor analysis. On the other hand, a constrained or fixed parameter is not estimated but rather is restricted by researchers to be equal to a specific value or to the value of another parameter. A common constraint is to fix the value of a parameter to zero and, in so doing, to indicate that it is unnecessary. For example, for our single factor model (Fig. 1), we could fix the loading for anxiety to be equal to 0 if we wanted to assess the hypothesis that the negative affect factor underlying the other three measures does not influence anxiety.
As shown in Figure 1, we have imposed a number of constraints on our single-factor model, although almost all the constraints are represented by what is not shown rather than what is shown in the figure. For example, our model includes no covariances among errors (i.e., all zeros in the off-diagonal positions of the theta matrix) in that there are no double-headed arrows between errors. Substantively, these constraints on the error covariances reflect the belief that only a single factor is necessary to explain the covariability among the measures. In addition, the variance associated with a second factor and its factor loadings have been implicitly fixed to zero, implying that an additional factor is unnecessary. If one or more of these constraints are incorrect, the model should fit the data poorly and should be rejected.
A second type of constraint, the metric constraint, must be imposed to define the metric or units of latent variables. The metric constraint is often a bit mysterious to SEM novices. We will take an intuitive approach to understanding this type of constraint. The metric of a factor is arbitrary because it is a latent variable with no inherent metric or scale; that is, it could assume any one of a number of alternative metrics. For example, it is arbitrary whether a factor representing length is measured in inches, feet, or meters. We typically assign a metric for a factor by fixing either its variance to 1, as we did in the one-factor model, or one of its weights to 1. Fixing the variance of a factor to 1 essentially defines the units of the factor to be in Z score (conventional standardized) units. If researchers choose to define the metric by fixing the weight of a measured indicator variable to 1, they should choose the weight associated with the measured variable that is believed to have the strongest relationship with the underlying factor to avoid empirical identification issues. In our negative affect example, we might choose to fix the weight of the best developed depressive symptom measure to 1. The metric of the factor is then the same as the metric of the measure with the loading of 1 (in our example, the depressive symptom scale). It is important to know that factor metric constraints in CFA models have no effect on the fit of the model to the data, but all other model constraints can result in a decrease in model fit to the extent that they are not supported by the data.
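The claim that the choice of metric does not change fit can be checked numerically. The sketch below uses hypothetical loadings and error variances and a helper function, `implied_cov`, introduced here only for illustration. It compares the covariance matrix implied when the factor variance is fixed to 1 with the one implied when the loading of the fourth measure is fixed to 1 and the remaining parameters are rescaled accordingly.

```python
import numpy as np

def implied_cov(loadings, factor_var, error_vars):
    """Covariance matrix implied by a single-factor model."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(error_vars)

# Hypothetical values in the metric where the factor variance is fixed to 1.
loadings = np.array([0.7, 0.6, 0.8, 0.5])
error_vars = np.array([0.51, 0.64, 0.36, 0.75])

# Metric option 1: factor variance fixed to 1.
cov_variance_fixed = implied_cov(loadings, 1.0, error_vars)

# Metric option 2: loading of the fourth measure fixed to 1.
# The other loadings are divided by that loading, and the factor
# variance absorbs the rescaling (it becomes the squared loading).
reference = loadings[3]
cov_loading_fixed = implied_cov(loadings / reference, reference ** 2, error_vars)

# Both parameterizations imply the same covariances among the measures.
print(np.allclose(cov_variance_fixed, cov_loading_fixed))
```

Because the implied covariance matrices are identical, any fit measure based on comparing the implied matrix with the data is identical under the two metrics as well.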
Once the model is specified by indicating what parameters are free to be estimated and which are fixed, the free parameters are estimated based on the data. If our one-factor model provides adequate fit (which we will define later), we could conclude that the data are consistent with the hypothesis that a single latent variable underlies the four observed measures, although we cannot rule out other models also producing good fit. We would also examine the standardized weights (which are analogous to standardized weights in regression analysis) to assess whether the measures are strong indicators of the factor and whether some measures are better indicators than others. A good fitting model would offer some support for using a single negative affect latent variable as a predictor (or as an outcome or mediator) in a more extensive structural model. If the model fails to fit, we should not interpret the estimated parameters because their values are likely to be inaccurate due to model misspecification. Instead, we would assess alternative models, such as the correlated factors model, to understand whether they might fit the observed data better.
Correlated Factors Model
Our second model is a correlated factors model, which specifies that two or more factors underlie a set of measured variables and that these factors are correlated. For simplicity, we will consider a two-factor model, but our discussion is relevant to models with more than two factors.
In Figure 2, we present a model for our four measures but now with two correlated factors. As with our path diagram for a single-factor model, we have circles for latent variables (i.e., factors and errors), squares for measured variables, single-headed arrows for effects of one variable on another, double-headed curved arrows for variances of exogenous variables, and a double-headed curved arrow for the covariance between the two factors. Within the context of our negative affect example, we might speculate that the hostility and anger measures are related to one another due to the shared characteristic of outward-directed agitation and distinct, to some degree, from the anxiety and depressive symptom measures, which share the characteristic of self-directed negativity. In other words, the model should include a factor (F1) affecting the hostility and anger measures (X1 and X2), and another factor (F2) affecting the anxiety and depressive symptom measures (X3 and X4).
Model parameters are associated with all single-headed and double-headed arrows and are presented in matrix form at the bottom of Figure 2. Constraints can be imposed on the model parameters. As previously presented, we can define the metric for factors by constraining their variances to 1 or one of their weights to 1. In this instance, we arbitrarily chose to set the factor variances to 1 (i.e., $\sigma^2_{F1} = 1$ and $\sigma^2_{F2} = 1$).
All constraints besides those to determine the metric of factors can produce lack of fit and are evaluated in assessing the quality of a model. For example, the effects of factors on measures, as shown by arrows between factors and measures in the path diagram, can be represented as equations:
X1 = λ11F1 + 0 F2 + E1
X2 = λ21F1 + 0 F2 + E2
X3 = 0 F1 + λ32F2 + E3
X4 = 0 F1 + λ42F2 + E4
As shown, the equations indicate that a number of factor loadings are constrained to zero such that each measured variable is associated with one and only one factor. The specified structure is consistent with the idea of simple structure, an objective frequently felt to be desirable with EFA. In addition, a measure is less likely to be misinterpreted if it is a function of only one factor. Given the advantages of this structure, researchers frequently begin with specifying models that constrain factor loadings for a measure to be associated with one and only one factor. In other words, each measure has one weight that is freely estimated, and all other weights (potential cross-loadings) between that measure and other factors are constrained to 0.
Other parameters in our model that may be freely estimated or constrained are the covariance between the factors and the variances and covariances among errors.3 a) With CFA, we typically allow the factors to be correlated by freely estimating the covariances between factors. If we constrained all factor covariances to be equal to zero (which requires three or more measures per factor)3 and imposed constraints so that any one measure is a function of only one factor, we would be hypothesizing a model that requires correlations among measures associated with different factors to be equal to zero. This model is likely to conflict with reality and be rejected empirically. In addition, this model would be inconsistent with many psychological theories that suggest underlying correlated dimensions. The decision to allow for correlated factors is in stark contrast with practice in EFA, where researchers routinely choose varimax rotation resulting in orthogonal factors. However, in EFA, we can still obtain good fit to data because all factor loadings are freely estimated (i.e., all measured variables are a function of all factors), permitting correlations among all measured variables. b) We usually think of our measured variables as being unreliable to some degree and, thus, must freely estimate the error variances. In most CFA models, we begin by constraining all covariances between errors to be 0. By imposing these constraints, we are implying that the correlations among measures are purely a function of the specified factors.
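A small numerical sketch can make the consequence of these constraints concrete. The code below uses hypothetical loadings, error variances, and a factor correlation of 0.5 (none taken from real data). It builds the covariance matrix implied by the two-factor model with simple structure, once with correlated factors and once with the factor covariance constrained to zero.

```python
import numpy as np

# Hypothetical loadings: X1, X2 load only on F1; X3, X4 load only on F2.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.9]])
Theta = np.diag([0.36, 0.51, 0.64, 0.19])   # error variances, no error covariances

def implied_cov(Lambda, Phi, Theta):
    """Covariance matrix implied by a CFA model."""
    return Lambda @ Phi @ Lambda.T + Theta

Phi_correlated = np.array([[1.0, 0.5],
                           [0.5, 1.0]])     # factor variances 1, covariance 0.5
Phi_orthogonal = np.eye(2)                  # factor covariance constrained to 0

# Block of covariances between the F1 measures (rows) and the F2 measures (columns).
cross_correlated = implied_cov(Lambda, Phi_correlated, Theta)[:2, 2:]
cross_orthogonal = implied_cov(Lambda, Phi_orthogonal, Theta)[:2, 2:]

print(cross_correlated)   # nonzero: each entry is loading * 0.5 * loading
print(cross_orthogonal)   # all zeros
```

With orthogonal factors and no cross-loadings, the model forces every covariance between a hostility or anger measure and an anxiety or depressive symptom measure to be zero, which is why such a model would rarely survive contact with real data.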
If this model fits our data, we would have a structure that is consistent with the data, but we could not rule out that other models might fit the data as well or even better. If this model fits, the standardized loadings are high, and the correlation between factors is not too high, we could argue that anger and hostility may represent an underlying construct that is relatively distinct from the construct underlying the anxiety and depression measures. We might interpret the anger and hostility factor as outward-directed agitation, whereas the depression and anxiety factor might be interpreted as self-directed negativity. Besides addressing measurement issues, the finding of two factors would indicate that we would want to include measures associated with both of these factors in studies predicting CAD. On the other hand, if neither the one- nor two-factor models fit, we should probably include measures assessing all four constructs in predicting CAD.
In practice, we would not only assess the fit of the two-factor model but also its relative fit to a single-factor model. This comparison would evaluate whether the increase in model complexity associated with a two-factor model is warranted. This additional step is necessary in that a good-fitting model does not necessarily imply a correct model. In our example, to the extent that the factors are highly correlated, we would expect the fit of the one-factor model to be similar to the fit of the two-factor model. SEM procedures, including CFA, are at their scientific best when there are several theoretically plausible models available to compare. We will discuss fit later in this article. For now, we turn to one more type of model structure, just to further illustrate the kinds of models that can be represented.
Bifactor Model
A bifactor model may include a general factor associated with all measures and one or more group factors associated with a limited number of measures (7,8). In Figure 3, we present a bifactor model for six measures with one general factor and one group factor. The six measures include hostility, anger, and anxiety scales (X1, X2, and X3, respectively) and three scales of depression that distinguish among affective, cognitive, and somatic symptoms (X4, X5, and X6, respectively). Due to space limitations, we will only briefly describe the specification of this model.
As typically applied, we are unlikely to obtain a bifactor model with EFA in that an objective of this method (with rotation) is to obtain simple structure, which is generally intolerant to a general factor. In contrast, in CFA, we choose which parameters to estimate freely and which to constrain to 0. Thus, we can simultaneously allow for a general factor as well as group factors. Bifactor models have been suggested as appropriate for item measures associated with psychological scales (9). Although measures are likely to assess a general trait or factor, they are also likely to include more specific aspects of that trait, that is, group factors. In contrast with the previous model, the group factors for a bifactor model are typically specified to be uncorrelated (i.e., the factor covariances are constrained to 0). In our example, this model suggests that the three depressive symptom measures are to some extent distinct from the other three measures, but that a broader general factor, which might be called negative affect, also underlies all six measures.4
Although the results of the CFA models above will not have an immediate impact on the question of how these variables might be related to cardiac risk, they do inform our conceptual understanding and interpretation of the four measured variables. For example, if we were developing a prediction model with cardiac disease as an outcome, and the two-factor model turned out to be more consistent with the data in our CFA, we might choose to use those two factors as separate potential risk factors rather than just an overall negative affect variable. At this point in our exposition, we do not know how to determine whether a model is “good” or not, nor do we even know how the various unknown parameters in the models are actually estimated. We now turn to these more technical aspects of CFA.
Estimation of Free Parameters
Next, we consider how free parameters are estimated. We discuss estimation using the model presented in Figure 1 with four measures and a single factor. Hats (^) are placed on top of model parameters in recognition that we are estimating parameters based on sample data rather than considering them at the population level.
SEM software typically allows a variety of input data formats, including raw case-level data, sample variances and covariances among measures, or correlations and standard deviations among measures. Regardless of how you enter your data, the most popular method for estimation of CFA models (i.e., maximum likelihood estimation) involves fitting the variances and covariances among measures. In other words, based on your inputted data, the software creates a covariance matrix (denoted as S) that contains the variances and covariances of the measures, and the analysis is conducted on this covariance matrix. From this perspective, CFA treats this covariance matrix as the data.
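To illustrate why these input formats are interchangeable, the sketch below rebuilds a covariance matrix S from a correlation matrix R and a vector of standard deviations, using S = DRD with D a diagonal matrix of standard deviations. The correlations and standard deviations are made-up values for the four measures, not results from any study.

```python
import numpy as np

# Hypothetical correlations among hostility, anger, anxiety, and depressive symptoms.
R = np.array([[1.00, 0.50, 0.30, 0.35],
              [0.50, 1.00, 0.25, 0.30],
              [0.30, 0.25, 1.00, 0.55],
              [0.35, 0.30, 0.55, 1.00]])
sds = np.array([4.0, 5.0, 3.0, 6.0])   # hypothetical standard deviations

# Each covariance is the correlation times the two standard deviations.
D = np.diag(sds)
S = D @ R @ D
print(S)

# With raw case-level data (rows = respondents, columns = measures),
# np.cov(data, rowvar=False) would produce the same kind of matrix directly.
```

The diagonal of S holds the variances (the squared standard deviations), and the off-diagonal entries hold the covariances.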
The SEM software calculates values for the freely estimated parameters of a CFA model so that these estimated parameters are as consistent as possible with the data (i.e., the sample covariance matrix of the measures). More specifically, parameters are estimated so that a reproduced covariance matrix based on the estimated parameters (denoted $\hat{\Sigma}_{\text{Model}}$, written simply as Model in the text, and called the model-implied covariance matrix) is as similar as possible to the sample covariance matrix, S. Using our example with the four observed measures, the equation linking Model to the model estimated parameters, written in terms of the lambda, phi, and theta matrices, is
$$\hat{\Sigma}_{\text{Model}} = \hat{\Lambda}\,\hat{\Phi}\,\hat{\Lambda}' + \hat{\Theta} \qquad (1)$$
The details of the equation are unimportant; what is crucial is that the values of the estimated model parameters dictate the quantities in the reproduced covariance matrix among the measured variables. The objective of the estimation procedure is to make the variances and covariances based on the estimated model parameters (i.e., the values in Model) as close as possible to the variances and covariances among measures in our sample data (i.e., the values in S).
Stepping back from the technical details, we are assuming that some process exists that has generated the set of variances and covariances that we have observed among our four measures. In SEM, in general, we use substantive knowledge of the field to make specific hypotheses about what this process might be and then translate these hypotheses into a coherent model. To the extent that we have specified a model that approximates reality, the values in the covariance matrix implied by our model (and its parameters) ought to be similar, within sampling error, to the sample covariance matrix among the measures. In practice, constraints imposed on the model are likely to produce imperfect reproduction of S; that is, S ≠ Model. Next, we consider in greater detail how model parameters are estimated and then how the implied matrix is used to evaluate the fit of the model.
In contrast to regression analysis and many other statistical methods, equations are not available for directly computing the freely estimated parameters. The estimates are computed instead by an iterative process, initially making arbitrary guesses about the values of the model parameters and then repeatedly modifying these values in an attempt to make the values in S and Model as similar as possible. The process stops when a prespecified criterion is met that suggests that the differences between S and Model cannot be made smaller.
A very simple example with constructed data might be helpful at this point.5 Let’s say that the variances for the hostility, anger, anxiety, and depressive symptom measures are 1 (diagonal element of S, the sample covariance matrix among measures), and all covariances among measures are 0.36 (off-diagonal elements in S).
This set of values is highly improbable in the real world, but for convenience we have created a covariance matrix S as a correlation matrix. In specifying the model for these data, let’s say we fix the variance of our underlying factor to 1 to define its metric and estimate the loadings between the four measures and the factor, and the error variances for these measures. The software package (EQS, for our example) begins with very rough estimates of 1 for all factor loadings and all error variances. For these estimated parameters, the reproduced covariance matrix among the measured variables based on Equation 1 is:
$$S = \begin{bmatrix} 1 & 0.36 & 0.36 & 0.36 \\ 0.36 & 1 & 0.36 & 0.36 \\ 0.36 & 0.36 & 1 & 0.36 \\ 0.36 & 0.36 & 0.36 & 1 \end{bmatrix} \neq \hat{\Sigma}_{\text{Model}} = \begin{bmatrix} 2 & 1 & 1 & 1 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$
The reproduced variances and covariances (on the right with 2s along the diagonal and 1s in the off-diagonal positions) are not very similar to the 1s and 0.36s in S. The software then takes another guess, revising its estimates so that the factor loadings are all 0.68 and the error variances are 0.64.
Now the values in Model are more similar to the values in S, but still not exactly the same. In the next iteration, all factor loadings are estimated to be 0.605, whereas the error variances are estimated to be 0.640. With two additional iterations, the final estimates are 0.60 for all factor loadings and 0.64 for all error variances.
For this artificial example, the model parameters reproduce perfectly the sample covariance matrix among the measures. In other words, the fit of the model to the data is perfect, a highly unlikely result in practice.
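The arithmetic in this example can be retraced with a few lines of Python. The sketch below is an illustration, not output from EQS. It takes the loadings and error variances reported at each step, computes the implied covariance matrix for a single factor with variance fixed to 1, and reports the largest absolute difference from S. The function name `implied_cov` is an assumption introduced here.

```python
import numpy as np

p = 4
# Constructed sample covariance matrix: variances of 1, covariances of 0.36.
S = np.full((p, p), 0.36)
np.fill_diagonal(S, 1.0)

def implied_cov(loading, error_var, p=4):
    """Implied covariance when all loadings and all error variances are equal
    and the factor variance is fixed to 1."""
    lam = np.full((p, 1), loading)
    return lam @ lam.T + error_var * np.eye(p)

# (loading, error variance) pairs reported in the text for successive steps.
steps = [(1.0, 1.0), (0.68, 0.64), (0.605, 0.640), (0.60, 0.64)]

for loading, error_var in steps:
    model = implied_cov(loading, error_var)
    max_gap = np.max(np.abs(S - model))
    print(f"loading={loading:.3f}  error variance={error_var:.3f}  "
          f"max |S - Model| = {max_gap:.4f}")
```

The gap shrinks at every step and disappears at the final estimates, because 0.60 squared is 0.36 (the covariances) and 0.36 + 0.64 is 1 (the variances).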
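How does the algorithm know when S and Model are similar? Mathematically, it is necessary to specify a function to define the similarity. The most popular estimation approach is maximum likelihood; and with this approach, the iterative estimation procedure is designed to minimize the following function of S and the model-implied covariance matrix (written here as $\hat{\Sigma}_{\text{Model}}$):
$$F_{ML} = \ln\left|\hat{\Sigma}_{\text{Model}}\right| + \operatorname{tr}\left(S\,\hat{\Sigma}_{\text{Model}}^{-1}\right) - \ln\left|S\right| - p$$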
where p is the number of measured variables. It is not crucial to understand the details of the equation. What is important to know is that each iteration (set of parameter guesses) produces a value for FML, and that FML is a mathematical reflection of the difference between S and Σ̂Model for a given set of estimated parameters. When FML is at its smallest value, S and Σ̂Model are as similar as they can be, given the data and the hypothesized model. The values of the parameter estimates at this point in the iterative process are the maximum likelihood estimates for the CFA model.
For our example, FML becomes smaller through the first four iterations, as shown in Table 1. At step 5, there is no change in FML, and there are no changes in the values of the parameter estimates, so the process stops and the estimates at the last step are the maximum likelihood estimates.
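As a rough illustration of how the fit function behaves, the sketch below evaluates the standard maximum likelihood discrepancy function, ln|Σ̂Model| − ln|S| + tr(S Σ̂Model⁻¹) − p, at the parameter values quoted in the text. It is written in plain Python with a small Gauss-Jordan helper, so all function names are our own. It is not meant to reproduce the exact entries of Table 1, only the qualitative pattern of a shrinking FML.

```python
import math


def inverse_and_logdet(matrix):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (inverse, log of |determinant|); assumes a nonsingular matrix.
    """
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        logdet += math.log(abs(pivot))
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], logdet


def f_ml(S, sigma):
    """ML discrepancy between sample matrix S and model-implied sigma."""
    p = len(S)
    sigma_inv, logdet_sigma = inverse_and_logdet(sigma)
    _, logdet_s = inverse_and_logdet(S)
    trace = sum(S[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    return logdet_sigma - logdet_s + trace - p


def one_factor_sigma(loading, error, p=4):
    """Implied matrix for equal loadings/errors and factor variance 1."""
    return [[loading * loading + (error if i == j else 0.0) for j in range(p)]
            for i in range(p)]


S = [[1.0 if i == j else 0.36 for j in range(4)] for i in range(4)]
steps = [(1.0, 1.0), (0.68, 0.64), (0.605, 0.640), (0.60, 0.64)]
values = [f_ml(S, one_factor_sigma(l, e)) for l, e in steps]

for (l, e), value in zip(steps, values):
    print(f"loading={l:.3f} error={e:.3f} F_ML={value:.6f}")
```

Each successive set of estimates yields a smaller value, and the function is zero (to rounding error) once the reproduced matrix equals S.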
Although researchers most frequently minimize FML to obtain estimates in SEM, it is sometimes preferable to choose other functions to minimize, especially when data diverge too far from multivariate normality or have missing values on some measures. For example, a different function, the full information maximum likelihood (FIML) function, is preferable if some data on measures are missing. On the other hand, a weighted least squares function is generally preferred for estimating model parameters when analyzing ordinal, item-level data (such as Likert-type items).⁶
To summarize our steps so far, we specify a model with free and constrained parameters and then estimate the parameters of the model in an iterative fashion so that the reproduced covariance matrix (Σ̂Model) based on the model is as similar as possible to the observed covariance matrix (S). The constraints imposed on the model are likely to produce differences between Σ̂Model and S. We now turn to methods for assessing the fit of a model or, alternatively stated, the lack of fit due to the constraints imposed on a model’s parameters.
Assessment of Global Fit
We must assess the quality of a model by examining the output from SEM software to determine if the model and its estimated parameters are interpretable. We first scan the output for warning messages and rerun analyses when the model was improperly specified. Second, we assess local fit. Examples include evaluation of individual estimated parameters to ensure they are within mathematical bounds (e.g., no negative variances or correlations of >1.0), are within interpretational bounds (i.e., no parameter estimates with values that defy interpretation), and are significantly different from zero based on hypothesis tests. Third, we examine global fit to determine if the constrained parameters of a model allow for good reproduction of the sample covariance matrix among measures. We will concentrate our attention on global judgments of fit.
The fit function is used to determine the estimates of the model parameters. Given that this function is deemed useful for computing parameter estimates, it is not surprising that the same fit function is also central in assessing global fit.
Testing the Hypothesis of Perfect Fit in the Population
We can assess the hypothesis that the researcher’s model is correct in the population. More specifically, we can ask whether the reproduced covariance matrix based on the model (ΣModel) is equal to the population covariance matrix among the measures (Σ). As shown in Equation 3, the null hypothesis, H0, states that the model-implied and population covariance matrices are equal, whereas the alternative hypothesis, HA, indicates that these two matrices are different:

$$
H_0: \Sigma = \Sigma_{\text{Model}} \qquad H_A: \Sigma \neq \Sigma_{\text{Model}} \tag{3}
$$
Two points are worth noting about how this question is posed in SEM. First, in most non-SEM applications of hypothesis testing, rejection of the null hypothesis implies support for the researcher’s hypothesis. In contrast, in SEM, rejection of the null hypothesis indicates that the researcher’s hypothesized model does not hold in the population; that is, the model-implied and population matrices are different. Second, no model is likely to fit perfectly in the population; thus, we know a priori that the null hypothesis concerning the researcher’s model is false.
The test of the null hypothesis is straightforward. The test statistic, T, is a simple function of sample size (n) and the fit function:

$$
T = (n - 1)\,F_{ML}
$$

(or T = nFML, as computed in some SEM software packages). In large samples and assuming the p measured variables are normally distributed in the population, T is distributed approximately as a χ2. The degrees of freedom for χ2 are equal to the number of unique variances and covariances in the covariance matrix among measured variables (i.e., [p(p + 1)]/2, where p is the number of measured variables) minus the number of freely estimated model parameters (q), that is,

$$
df = \frac{p(p + 1)}{2} - q \tag{5}
$$
In most applications with some degree of model complexity, a sample size of ≥200 is recommended for T to be distributed approximately as a χ2. However, a greater sample size may be required to have sufficient power to reject hypotheses of interest, including hypotheses about a particular parameter or set of parameters.
Unfortunately, this test of global fit suffers from the same problems as any conventional hypothesis test. If the null hypothesis is not rejected, it may be due to insufficient sample size, that is, a lack of power. In addition, nonrejection does not imply that the researcher’s model is correct; it is incorrect to “accept the null hypothesis.” It is likely that a number of alternative models produce similar T values. If the hypothesis is rejected, we can only conclude what we knew initially: The model is imperfect. Because T is proportional to n − 1, a large sample inflates T for any nonzero discrepancy, so even small and possibly unimportant differences between the model-implied and observed covariance matrices will yield significance. It is our observation that tests of models are routinely significant (meaning that we conclude our model does not fit) when sample size exceeds 200.
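The dependence of the test on sample size can be seen in a small Python sketch. It assumes the relation T = (n − 1)FML and the df formula given above, applied to the one-factor example (p = 4 measures, q = 8 free parameters: four loadings and four error variances). The fit-function value of 0.015 is a hypothetical number chosen for illustration, and 5.991 is the tabled χ2 critical value for df = 2 at α = .05.

```python
def degrees_of_freedom(p, q):
    """Unique variances/covariances minus freely estimated parameters."""
    return p * (p + 1) // 2 - q


def chi_square_statistic(f_ml, n):
    """Test statistic T = (n - 1) * F_ML."""
    return (n - 1) * f_ml


F_ML_VALUE = 0.015     # hypothetical minimized fit-function value
CRITICAL_DF2 = 5.991   # chi-square critical value for df = 2, alpha = .05

df = degrees_of_freedom(p=4, q=8)
sample_sizes = (100, 200, 500, 1000)

rejected = []
for n in sample_sizes:
    t = chi_square_statistic(F_ML_VALUE, n)
    rejected.append(t > CRITICAL_DF2)
    print(f"n={n:5d}  T={t:7.3f}  df={df}  reject H0: {rejected[-1]}")
```

The same small discrepancy between the model-implied and sample matrices goes from nonsignificant to significant purely because n increases.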
Fit Indices: Assessing Degree of Fit
Because the χ2 fit test is affected by sample size, a wide variety of other measures of fit have been proposed. Two indices that are used frequently are Bentler’s comparative fit index (CFI) and the root mean square error of approximation (RMSEA).
The CFI compares the fit of the researcher’s model with the fit of a null model. The null model is highly constrained and unrealistic. More specifically, the model parameters are constrained such that all covariances among measured variables are equal to zero (implying all correlations are equal to zero). Accordingly, we expect a researcher’s model to fit much better than a null model.
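In the population, CFI is defined as:

$$
\mathrm{CFI}_{pop} = 1 - \frac{\lambda_{\text{researcher's model}}}{\lambda_{\text{null model}}}
$$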
λ is a noncentrality parameter that is an index of lack of fit of a model to a population covariance matrix. λ is zero if a model is correct and becomes larger to the degree that the model is misspecified. We would expect the null model to be a badly misspecified model in most applications of SEM; therefore, λnull model would be large. In comparison, λresearcher’s model should be much smaller. Accordingly, we expect to obtain high CFI values to the extent that the researcher’s model is superior to a null model.
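In the formula for the CFIpop, we can substitute T − df for λ to obtain a sample estimate of CFI:

$$
\mathrm{CFI} = 1 - \frac{T_{\text{researcher's model}} - df_{\text{researcher's model}}}{T_{\text{null model}} - df_{\text{null model}}}
$$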
According to Hu and Bentler (12), a value of ≥0.95 indicates good fit. This cutoff value is consistent with the belief that a researcher’s model should fit much better than the unrealistic null model. We emphasize here that cutoffs for fit indices are problematic; a preferable approach is to use these indices to compare fits for various alternative models.
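A minimal Python sketch of the sample CFI follows. It substitutes T − df for λ as described above and, as an added assumption not stated in the text, truncates negative values of T − df at zero (a common software convention) so that the index stays between 0 and 1. The T values are hypothetical; the null-model df of 6 assumes four measures with only their four variances estimated.

```python
def cfi(t_model, df_model, t_null, df_null):
    """Comparative fit index from chi-square statistics and their df."""
    # Estimated noncentrality of the researcher's model, truncated at 0.
    numerator = max(t_model - df_model, 0.0)
    # The denominator uses the larger noncentrality so that CFI <= 1.
    denominator = max(t_model - df_model, t_null - df_null, 0.0)
    if denominator == 0.0:
        return 1.0
    return 1.0 - numerator / denominator


# Hypothetical values for a four-measure model.
example_cfi = cfi(t_model=12.0, df_model=2, t_null=250.0, df_null=6)
print(f"CFI = {example_cfi:.3f}")
```

Because the index is a ratio of misfit relative to the null model, it is most informative when comparing several candidate models fitted to the same data.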
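The RMSEA is a fit index that assesses lack of fit of a model but not in comparison with any other model. Instead, it evaluates the absolute fit of a model (i.e., Tresearcher’s model/(n − 1)), taking into account how complex the model is relative to the amount of data (i.e., dfresearcher’s model). The sample estimate of the RMSEA is:

$$
\mathrm{RMSEA} = \sqrt{\frac{T_{\text{researcher's model}}/(n - 1)}{df_{\text{researcher's model}}} - \frac{1}{n - 1}} = \sqrt{\frac{T_{\text{researcher's model}} - df_{\text{researcher's model}}}{df_{\text{researcher's model}}\,(n - 1)}}
$$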
To the extent that the model fits [i.e., small Tresearcher’s model/(n − 1)] and the model involves estimating few model parameters (large dfresearcher’s model), RMSEA should approach zero. RMSEAs of <0.06 indicate good fit, according to Hu and Bentler (12), but again this cutoff should be treated as a rough-and-ready rule of thumb.
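The following Python sketch computes the sample RMSEA in its usual form, the square root of (T − df)/[df(n − 1)]. Setting the index to zero when T is smaller than df is an added assumption that follows common software practice. The T, df, and n values are hypothetical and chosen only to show how a larger df lowers the index for the same T.

```python
import math


def rmsea(t_model, df_model, n):
    """Root mean square error of approximation (sample estimate)."""
    # Truncate at zero so that T < df yields RMSEA = 0 rather than an error.
    excess = max(t_model - df_model, 0.0)
    return math.sqrt(excess / (df_model * (n - 1)))


# Same T and n, but the second model estimates fewer parameters (larger df).
few_df = rmsea(t_model=12.0, df_model=2, n=201)
many_df = rmsea(t_model=12.0, df_model=8, n=201)
print(f"df=2: RMSEA = {few_df:.3f}")
print(f"df=8: RMSEA = {many_df:.3f}")
```

The comparison reflects the parsimony adjustment described above: for equal misfit, the model with more degrees of freedom receives the smaller RMSEA.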
Underidentification and Other Problems in Estimation
We turn now to one technical issue. A requirement for estimating the parameters of a model is that the model must be identified. Broadly speaking, underidentification is simply a problem of algebra, a matter of trying to estimate too many parameters given the data available. Conversely, a model can be identified only if the information in your sample equals or exceeds the needs defined by the estimation of your model parameters. The information about the sample is captured by the unique variances and covariances in the covariance matrix of the observed measures. This information is used to estimate the free model parameters, which are the factor loadings, the variances and covariances among the factors, and the variances and covariances among the errors. One formal rule of thumb to help assess identification is called the t rule, which states that the number of freely estimated parameters (q) must be less than or equal to the number of unique variances and covariances among the measured variables, which is equal to p(p + 1)/2. Another way to express the t rule is that the df for the χ2 test cannot be negative (Eq. 5). For our two-factor example (Fig. 2), the number of unique variances and covariances is p(p + 1)/2 = 4(4 + 1)/2 = 10. The number of free parameters is 9 (4 factor weights + 1 covariance between factors + 4 error variances). Therefore, this model passes the t rule for identification because df = 10 − 9 = 1, which is not negative.
The bad news is that even if your model passes the t rule, the model may still be underidentified (i.e., not identified). This occurs if the number of free parameters for a portion of the model exceeds the available sample information. For our two-factor model (Fig. 2), we might have chosen to constrain the covariance between the two factors to be equal to zero. This model passes the t rule in that there are ten unique variances and covariances and only eight freely estimated parameters (4 factor weights + 4 error variances), but it is not identified. The variances and covariances for each pair of measures associated with a factor are available to estimate only the model parameters for these measures. Because each pair of measures is linked to one and only one factor, it is as if two CFAs are being conducted: one for the pair of measures associated with one factor and another for the pair of measures associated with the second factor. The consequence is that the model cannot be estimated because the number of freely estimated parameters for any pair of measures (two loadings and two error variances) exceeds the amount of sample information (two variances and one covariance between these measures).
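The counting argument for the two versions of the two-factor model can be written out as a short Python sketch. It applies the t rule first to the whole model and then, for the uncorrelated-factor version, to a single pair of measures treated as its own small CFA. Function and variable names are illustrative.

```python
def unique_moments(p):
    """Number of unique variances and covariances among p measures."""
    return p * (p + 1) // 2


def passes_t_rule(p, q):
    """Necessary (not sufficient) condition: q must not exceed the moments."""
    return q <= unique_moments(p)


p = 4
q_correlated = 4 + 1 + 4    # loadings + factor covariance + error variances
q_uncorrelated = 4 + 4      # factor covariance constrained to zero

df_correlated = unique_moments(p) - q_correlated
df_uncorrelated = unique_moments(p) - q_uncorrelated
print("correlated factors:  ", passes_t_rule(p, q_correlated), df_correlated)
print("uncorrelated factors:", passes_t_rule(p, q_uncorrelated), df_uncorrelated)

# With uncorrelated factors, each pair of measures stands alone:
# 2 variances + 1 covariance must support 2 loadings + 2 error variances.
block_information = unique_moments(2)
block_parameters = 2 + 2
block_passes = passes_t_rule(2, block_parameters)
print("single two-measure block:", block_passes,
      f"({block_parameters} parameters vs {block_information} moments)")
```

The whole-model count passes in both cases, but the block-level count fails for the uncorrelated-factor model, which is why the t rule alone cannot establish identification.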
Additional identification rules are available. The three-indicator rule may be applied for the example just described. If each measured variable is associated with only one estimated factor loading (others constrained to 0), the covariances among factors are constrained to 0, and the covariances among the errors are constrained to 0, then a model is identified if each factor affects at least three measures (as opposed to two measures, as described for our two-factor model with uncorrelated factors). In most CFA applications, factors are allowed to be correlated, and then a model is identified if each factor affects at least two measures. In other words, for the two-indicator rule, the same conditions must hold as with the three-indicator rule except the covariances among factors are freely estimated.
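The two- and three-indicator rules can be expressed as a simple check, sketched below in Python. The function assumes the conditions stated above already hold (each measure loads on exactly one factor and the errors are uncorrelated), and it treats a single-factor model as requiring three indicators, which is our reading of the rules rather than something the text states explicitly. A result of False means only that these rules do not establish identification, not that the model is necessarily underidentified.

```python
def meets_indicator_rule(indicators_per_factor, factors_correlated):
    """Check the two-/three-indicator sufficient conditions.

    indicators_per_factor: number of measures loading on each factor.
    factors_correlated: True if factor covariances are freely estimated.
    Assumes simple structure (one loading per measure) and uncorrelated
    errors; these conditions are not verified here.
    """
    several_factors = len(indicators_per_factor) > 1
    # Two indicators suffice only when there are several correlated factors.
    needed = 2 if (factors_correlated and several_factors) else 3
    return all(count >= needed for count in indicators_per_factor)


print(meets_indicator_rule([2, 2], factors_correlated=True))    # two-indicator rule
print(meets_indicator_rule([2, 2], factors_correlated=False))   # too few indicators
print(meets_indicator_rule([3, 3], factors_correlated=False))   # three-indicator rule
```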
There is both bad and good news about the use of the two- and three-indicator rules. The bad news is that they are not applicable for many CFA models. For example, they are not helpful in determining if a bifactor model with both group and general factors is identified. The good news is that available software is very likely to give warning messages if the model is underidentified. More bad news is that it is not always obvious what the warning messages mean and what, if anything, should be done to remedy the problem.
The messages might suggest other estimation problems, such as empirical underidentification or bad start values. With empirical underidentification, the model is identified mathematically, but the parameters of the model nevertheless cannot be estimated because of the data. For example, a CFA model with two factors might meet all the requirements of the two-indicator rule, but it may still be impossible to estimate if the freely estimated covariance between factors is 0 (or close to 0). In this case, because the factors are uncorrelated, three measures are required per factor. Alternatively, for the same example, if the estimated factor loading for a measure is 0, it cannot be counted as one of the indicators for a factor.
The other estimation problem is bad start values. The estimation process in CFA is iterative and requires start values that are created by the SEM software. With more complex models, the start values created by the program may be bad in that they do not produce adequate estimates. In this instance, the researcher may ask the program to conduct more iterations to get a good solution or may be forced to supply their own start values for parameter estimation. Researchers might use estimates from EFA or other CFA models to supply start values.
In conducting CFA, researchers are relieved when they receive no warning messages about parameter estimates. When they do receive messages, they should consult their SEM software manual to differentiate between messages that suggest minimal problems and those that require careful exploration of the model. Most importantly, it is essential not to deny the presence of warning messages, but rather to acknowledge their presence and, when in doubt, work through them with someone you trust (i.e., your local SEM expert).
In many applications, researchers who apply EFA could use CFA. To the extent that researchers have some knowledge about the measures that they are analyzing, they should be conducting CFA. There are real benefits to stating rigorously one’s beliefs about measures, ideally by specifying alternative models, assessing those beliefs with indices that allow for their disconfirmation, and at the end being able to specify which alternative model produces the best fit. It may require more thoughtfulness upfront than EFA, but the outcome is likely to be more informative if the methods of CFA are applied skillfully. We offer some suggested readings in an appendix to allow you to develop a better understanding of CFA and SEM in general.
As we noted at the beginning of this piece, we have considered only CFA in the article, one of many analytic procedures that can be conducted with SEM. In our example of hostility, anger, anxiety, and depressive symptoms, CFA was used to help understand the fundamental question of how those measures relate to one another, a worthy pursuit in and of itself but one that is often ignored in the psychosomatic literature. Typically, we would go a step further and use the factors we derive from CFA as predictors of cardiac disease risk, using the full structural model. In the full structural model, the paths between factors are assessed in addition to the relationships between factors and measures. A key advantage of this model is that the factors are free of measurement error, if correctly specified; thus, the paths between factors are essentially corrected for measurement error.
We include a list of suggested readings to allow researchers to learn about analyzing data using a full structural equation model, as well as to develop a more in-depth understanding of CFA and the various methods associated with it.