Analytical methods and issues for symptom cluster research in oncology

Kim, Hee-Ju (a); Abraham, Ivo (b,c,d,e); Malone, Patrick S. (f)

Current Opinion in Supportive and Palliative Care: March 2013 - Volume 7 - Issue 1 - p 45–53
doi: 10.1097/SPC.0b013e32835bf28b
SYMPTOM CLUSTERS IN CANCER AND PALLIATIVE CARE: Edited by Andrea M. Barsevick and Aynur Aktas

Purpose of review

Within a broader perspective on the next challenges in oncologic symptom cluster research, the objectives of this review are to examine the statistical methods that have been used to quantify and/or model the dynamic nature of symptom clustering, the methodological issues associated with those methods, and the statistical modeling techniques for the underlying mechanisms of symptom clustering.

Recent findings

Correlation, factor analysis, principal component analysis, and cluster analysis are analytical methods to identify symptom clusters and/or to examine the influence of symptom clusters on patient outcomes. More recent techniques include latent variable methods, such as latent profile analysis, to examine the phenotypes of symptom cluster experience and growth modeling to examine the longitudinal nature of symptom cluster experience. Future endeavors include an investigation of the underlying mechanisms of symptom clustering using longitudinal data analysis. The methodological issues include the domain of the symptoms, measurement errors, stability of the solution within the data, measurement timing, and sample size.

Summary

Each method has unique strengths and weaknesses, and the method choice should be driven by the aims and research questions of a given study.

(a) Catholic University of Korea, Seoul, Korea

(b) Department of Pharmacy Practice and Science, College of Pharmacy

(c) Department of Family and Community Medicine, College of Medicine

(d) Center for Health Outcomes and PharmacoEconomic Research

(e) Center on Aging, University of Arizona, Tucson, Arizona

(f) Department of Psychology, University of South Carolina, South Carolina, USA

Correspondence to Hee-Ju Kim, PhD, RN, OCN, 505 Banpo Dong, Seo-Cho-Gu, Catholic University of Korea, College of Nursing, Seoul, South Korea. Tel: +82 02 2258 7400; fax: +82 02 2258 7772; e-mail:


Symptom clusters are defined as stable groups of interrelated symptoms that occur together [1,2]. For the past decade, the main research questions in this area have been: what symptoms form a symptom cluster? And, do patients exhibit a symptom cluster? Consequently, studies have tried to identify or model symptom clusters using diverse statistical methods. The next challenge in the field of supportive and palliative cancer care is to validate the usefulness of identified or modeled symptom clusters and to elucidate the underlying dynamics of why some symptoms cluster. In turn, this requires researchers to evaluate the unique strengths, weaknesses, suitability, and proper application of an analytical method when designing a study and evaluating the utility of an identified symptom cluster.

Progressively, our next questions should be: why are symptoms clustering? And, what are the underlying mechanisms of symptom clustering? These questions require more complex and advanced modeling techniques.

Building on our prior review of statistical methods for symptom clustering in cancer [3], this article reviews the statistical methods that have been used to quantify and/or model the dynamic nature of symptom clustering, the methodological issues associated with those methods, and the statistical methods that can be used to understand the underlying mechanisms of symptom clustering. The focus is on symptom clusters in the oncology setting, but the concepts and methods can apply to other disease areas in which symptom clustering is relevant.

Box 1


For this review, the PubMed database was searched to identify statistical methods used since 2001 for oncologic symptom clusters. One exemplar for each method is presented in Table 1. Exemplars were selected based on recency and/or relevance to this review. Other exemplars are cited in the text.

Table 1-a

Table 1-b

Correlations: simple correlations and complex modeling with correlations

Historically, the simple correlation method was the first method used to model a symptom cluster and its consequences (see Table 1). Researchers selected a possible cluster on clinical or theoretical grounds and provided evidence for the cluster using correlations among symptoms [1]. Correlation examines the covariation of two symptoms, with the correlation coefficient quantifying the direction (same or opposite) and strength of the covariation. Studies have reported low-to-moderate correlations between symptoms without justifying why relationships of that strength qualify the symptoms as part of a cluster [1,12].

Partial correlation can be used to quantify the residual relationships between two symptoms after controlling for the other variable(s) in the dataset. By expanding the directional relationships of partial correlations, Beck et al.[4] tested a mediation model for a symptom cluster. They found that pain had an indirect effect on fatigue via sleep disturbance in addition to its direct effect. Correlation methods can provide the mathematical evidence of a selected cluster, but they have limitations when investigating cluster solutions in which symptoms have complex interrelationships [3].
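As an illustration of the two quantities described above, the following minimal sketch computes a simple correlation and a first-order partial correlation. The severity ratings, variable names, and function names are hypothetical and are not taken from any cited study; the partial correlation uses the standard formula for a single control variable.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after controlling for one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical 0-10 severity ratings for six patients (illustrative values only).
pain = [2, 4, 5, 7, 8, 9]
sleep = [1, 3, 6, 6, 9, 8]
fatigue = [3, 4, 6, 8, 8, 10]

r_pain_fatigue = pearson(pain, fatigue)        # simple correlation
r_partial = partial_corr(pain, fatigue, sleep) # residual association given sleep
print(round(r_pain_fatigue, 3), round(r_partial, 3))
```

Comparing the simple and the partial coefficient shows how much of the pain-fatigue association remains once sleep disturbance is held constant, which is the kind of information a mediation model formalizes.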

Common factor analysis

In terms of empirical identification of symptom clusters, factor analysis has been used frequently in oncology [5▪,13,14]. Factor analysis is a class of statistical methods explaining observed associations among many variables by using unobserved variables, called factors. It assumes that the linear combinations of some unknown source variables create the observed variables [15]. Common factors are the common underlying sources for multiple observed variables and thus, induce correlations among variables [16]. By identifying common factors, the structures or dimensions that underlie a set of observed variables can be identified. Two or more symptoms that share one common factor are considered to form a symptom cluster.
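In conventional notation (the symbols are assumptions introduced here, not taken from the cited sources), the common factor model for p standardized symptoms and m common factors can be sketched as follows, where the f terms are common factors, the lambda terms are loadings, and u_j is the part unique to symptom j:

```latex
x_j = \lambda_{j1} f_1 + \lambda_{j2} f_2 + \dots + \lambda_{jm} f_m + u_j,
\qquad j = 1, \dots, p

% With uncorrelated, unit-variance factors and mutually uncorrelated unique parts:
\operatorname{Corr}(x_j, x_k) = \sum_{l=1}^{m} \lambda_{jl}\,\lambda_{kl},
\qquad j \neq k
```

Under these assumptions, two symptoms correlate only through the factors on which both load, which is the sense in which common factors induce correlations among observed variables.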

Factor analysis is a helpful method when complex relationships among many symptoms may exist. Its limitations include the subjectivity in determining the final solution, the need for a large sample, and the need for specialized approaches for categorical data. In a recent study, theoretically modeled symptom clusters were confirmed by confirmatory factor analysis [17▪].

Principal component analysis

Principal component analysis (PCA) is often described as a type of factor analysis; however, mathematically it has nothing to do with unobserved variables. PCA searches for one or more components that best reproduce the variances in a set of data [18]. It assumes that all variances of a set of variables can be summarized into components. A component is simply a combination of correlated variables, whereas a common factor induces correlations among observed variables [16,19]. A study by Jimenez et al. [6▪] is an exemplar of oncologic symptom clustering using PCA; it identified four symptom clusters in a large sample of patients with various types of advanced cancer.

PCA and factor analysis are superficially similar in that both approaches result in a smaller number of dimensions reflecting information from a larger pool of items (e.g., symptom indicators). The decision between PCA and factor analysis should be guided by the purpose of the analysis, as well as theoretical considerations about the relations among the symptoms [18]. PCA is a data-reduction technique that involves no assumptions about the relations among the variables. It is of use when the goal of the analysis is to reduce a large number of variables to a smaller, more manageable number, perhaps in the context of a modest sample size. PCA is particularly useful when the symptom indicators are minimally correlated. Factor analysis, in contrast, is an explicit model of the relations among the indicators. It can be especially informative when there is a conceptualized common variable that might underlie several observed symptoms. Factor analysis is likely to yield poor results when indicators are not well correlated, as that suggests that no common factor exists. Practically, PCA and factor analysis can yield different symptom clusters in oncology due to the small number of symptoms and lower correlation levels among these symptoms [15,20]. In fact, PCA and factor analysis created different oncologic symptom clusters in the same sample [21▪▪].
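To make the idea of a component as a weighted combination of observed variables concrete, the sketch below extracts the first principal component from a correlation matrix. It is a minimal illustration under stated assumptions: the ratings are hypothetical, and power iteration is substituted here for the eigen-decomposition routine that statistical software would normally use.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length variable columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        z.append([(v - m) / sd for v in col])
    p = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(R, iters=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical 0-10 severity ratings for six patients (illustrative values only).
pain = [2, 4, 5, 7, 8, 9]
sleep = [1, 3, 6, 6, 9, 8]
fatigue = [3, 4, 6, 8, 8, 10]

R = correlation_matrix([pain, sleep, fatigue])
eigenvalue, weights = first_component(R)
explained = eigenvalue / len(R)                    # share of total variance
loadings = [math.sqrt(eigenvalue) * w for w in weights]
print(round(explained, 3), [round(l, 3) for l in loadings])
```

Note that nothing in this computation refers to an unobserved variable: the component is defined entirely by the observed correlations, whereas a factor model would additionally separate common from unique variance.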

Cluster analysis of symptoms and patients

Cluster analysis is a class of graphical and statistical classification methods used to group units into homogeneous subgroups based on relative similarities among the units on a set of attributes [22]. A subgroup is referred to as a cluster, which is a statistical term and should be differentiated from the conceptual term of symptom cluster. Clustering strives for mutual exclusiveness in its classification: a unit belongs to only one cluster. Although the similarity of units can be determined through various methods, standardized distance measures are now the standard and relay information on both the rank order and level of differences between units regarding the attributes of interest. Cluster analysis can categorize symptoms that occur in a similar pattern across patients and thus, can be used to identify symptom clusters. For example, Kirkova et al.[7] conducted cluster analysis with a large sample of advanced cancer patients and identified three symptom clusters stable over two samples (see Table 1). The main advantage of cluster analysis is that it enables the examination of a large number of symptoms with limited sample sizes. However, determining the final solution is subjective and the handling of missing data can be tedious.

Cluster analysis can also categorize patients based on their similarities regarding a set of attributes (e.g., symptoms). Note that empirically identified symptom clusters cannot be guaranteed to exist in patients. Cluster analysis of patients can be a method to identify patients who experience a particular cluster, that is, a phenotype of symptom experience. Several studies have examined whether cancer patients experience a theoretically chosen or empirically identified symptom cluster, and have investigated subgroups of patients with unique symptom cluster experience [8▪,23].
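The grouping of patients by similarity can be sketched as follows. This is a minimal illustration, assuming Euclidean distance on standardized scores and average linkage; the patient data and function names are hypothetical, and other distance measures or linkage rules could equally be chosen. Transposing the data so that rows are symptoms would group symptoms instead of patients.

```python
import math
from statistics import mean, stdev

def standardize(rows):
    """Z-score each column (attribute) so that distances share a common scale."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k):
    """Average-linkage agglomerative clustering; returns k lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(euclidean(points[a], points[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical ratings (pain, fatigue, sleep disturbance) for six patients.
patients = [[1, 2, 1], [2, 1, 2], [1, 1, 2],
            [8, 9, 7], [9, 8, 8], [7, 9, 9]]

groups = agglomerate(standardize(patients), k=2)
print(groups)
```

The number of clusters k is supplied by the analyst, which is one place where the subjectivity in determining the final solution enters.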

Latent class analysis/latent profile analysis

Latent class analysis (LCA) is a new technique for grouping patients in oncology. It is similar to cluster analysis in purpose but its use is limited to patient classification. LCA is a latent variable method and assumes there is a latent nominal variable categorizing patients into subgroups. The observed indicator variables display the nature of the subgroups [24]. A latent variable summarizes the correlation between indicator variables, just as a common factor does in factor analysis. LCA uses an iterative process through the expectation maximization algorithm to find the best solution. It provides stronger model fit statistics than cluster analysis. The replication of a solution can be tested and thus, stability of the findings in the subsamples can be established. The method was originally developed for binary indicators but has been extended to quantitative variables; this extension is referred to as latent profile analysis (LPA), which has been commonly used in social and behavioral science. LPA was used recently to examine the phenotypes of symptom cluster experience and its association with cytokine genes [9▪▪]. LCA and LPA require careful model construction and selection.
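The iterative expectation maximization process can be sketched for the simplest case of two latent classes and binary indicators. This is an illustrative toy, not the procedure of any cited software: the data, starting values, fixed number of iterations, and function name are all assumptions, and real analyses would use multiple random starts, convergence criteria, and fit statistics.

```python
import math

def lca_em(data, n_iter=100):
    """EM for a two-class latent class model with binary indicators.
    Returns class prevalences, P(symptom present | class), the posterior
    memberships from the final E-step, and the log-likelihood trace."""
    n, p = len(data), len(data[0])
    pi = [0.5, 0.5]                      # class prevalences (arbitrary start)
    rho = [[0.3] * p, [0.7] * p]         # endorsement probabilities (arbitrary start)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each patient
        post, loglik = [], 0.0
        for row in data:
            joint = []
            for c in range(2):
                lik = pi[c]
                for j, y in enumerate(row):
                    lik *= rho[c][j] if y == 1 else 1.0 - rho[c][j]
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            post.append([q / total for q in joint])
        trace.append(loglik)
        # M-step: update parameters using the posterior weights
        for c in range(2):
            weight = sum(r[c] for r in post)
            pi[c] = weight / n
            for j in range(p):
                rho[c][j] = sum(r[c] * row[j] for r, row in zip(post, data)) / weight
    return pi, rho, post, trace

# Hypothetical presence (1) or absence (0) of four symptoms in twelve patients.
symptoms = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1],
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0],
]

prevalence, endorse, posterior, trace = lca_em(symptoms)
print([round(v, 2) for v in prevalence])
print([[round(v, 2) for v in row] for row in endorse])
```

Each patient receives a probability of membership in each class rather than a hard assignment, which is one practical difference from the cluster analysis of patients described earlier.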

Structural equation modeling

Structural equation modeling enables researchers to identify subsets of measured variables that are assumed to collectively represent a higher order latent construct, such as a symptom cluster. It evaluates the extent to which a subset of measured variables represents a higher-order latent construct and examines the relationships of the latent variable to other variables of interest (either independent or outcome variables) [25]. By doing so, a complex symptom cluster can be identified and simultaneous evaluations of the relationships between multiple variables, mediators, and outcome variables can be performed.

Structural equation modeling requires the specification of a theoretical model (e.g., a higher-order latent construct), the variables used to operationalize these constructs, and the nature and direction of relationships among the variables. This technique was used to examine the temporal changes in symptoms before death [10].


Several issues need to be considered in order to establish a symptom cluster study's validity. First, the assumptions governing any statistical method should be considered when choosing a method. For example, factor analysis is preferred to PCA when identifying symptom clusters induced by common factors and conceptual interpretability is paramount [18]. Conversely, PCA is better if the goal is to identify clusters on the basis of the variance minimization principle and thus optimize the (mathematical) homogeneity of derived clusters. Studies indeed often ignore whether the statistical assumptions or conceptual framework for a particular symptom cluster is congruent with the selected analytical method. This shortcoming of not matching assumptions, frameworks, and analytics characterized many early symptom cluster studies. For instance, Jimenez et al.[6▪] chose PCA without explaining what kind of symptom cluster they were exploring and how PCA could better identify such clusters over factor analysis. Future studies should emanate more clearly from conceptual frameworks and match the conceptual dynamics with the analytical approaches.

Second, the domain of symptoms (i.e., the number and type of symptoms that are included in an analysis) should be considered. Earlier studies often used secondary data that, in the primary database, included a comprehensive symptom inventory. However, researchers should be cautious when selecting the symptom domain. At the core, this is a sampling issue: not of patients, but of the symptoms covered by inventories and other data collection methods. When the domain of symptoms is under-represented or over-represented, symptom clusters can be underidentified or overidentified [3]. The measurement time point is also important, and studies should be timed with respect to the illness trajectory and the nature of the target population, as these can change the domains of symptoms. When the same domain is used, different statistical methods such as cluster analysis and PCA can yield the same symptom cluster (see reference [26] for an exemplar). However, the use of different domains can yield different clusters even with the same method.

Third, the use of a single item from a comprehensive symptom inventory for a particular symptom can increase measurement error for some symptom constructs. Conversely, inclusion of multiple items to measure the same symptom construct may bias findings due to relatively higher correlations between the items of the same construct. Thus, measures should be carefully selected.

Fourth, many multivariate statistical methods involve subjective decision-making about various steps in the analysis and in determining the final solution. Therefore, theoretically and practically sound criteria to reach final solutions should be established and, most importantly, researchers should specify and justify each and every one of them.

Fifth, the stability of a solution needs to be confirmed. The structural stability of the solution is the generalizability of the findings from the overall sample into subsamples. Often, the sample is heterogeneous in terms of disease stage, type of cancer, treatment, or any unmeasured characteristics. Analysis of heterogeneous data can lead to false findings. One study [13] examined the structural stability in a given sample. The stability over time is related to whether findings at one time point can be replicated at another time point. Two exemplar studies examined the stability of a symptom cluster over time [5▪,13]. The stability over time issue has been limited to qualitative evaluation. Longitudinal data analysis may enable researchers to better quantify the nature of symptom experience over time and permit quantitative evaluation of the stability over time.

Lastly, sample size has been recognized as an issue and recent studies are using larger samples. A small sample size threatens the external validity of study findings. Sample size calculations should be performed in consideration of the design and analytics used.


From the identification of symptom clusters, we are now moving toward a new area, the mechanisms underlying symptom clustering. All aforementioned methods apply to cross-sectional data. However, cross-sectional studies are limited in their ability to generate evidence about directional or causal relationships between symptoms and their possible underlying mechanisms. Longitudinal study designs and relevant statistical methods allow for the construction and testing of complex symptom cluster models with possible underlying mechanisms. Several statistical approaches for longitudinal study designs are introduced below, but their use in symptom cluster studies is very limited at this point.

Regression approach for longitudinal data

The regression framework can be used to analyze longitudinal data by incorporating an indicator of ‘time’ in the statistical model. This must be done with care, as the dependence among observations can result in violations of the assumptions of conventional linear regression techniques, often biasing estimates of standard errors and therefore significance tests. Several technical solutions are available to handle these issues. One approach is to compute standard errors using methods that relax these assumptions, such as ‘robust’ standard errors or generalized least squares estimation. Another approach is the inclusion of a person-specific error term either as a random variable or a fixed constant in the model (i.e., random effect models or fixed effect models). In these models, time itself and lagged biological and/or psychological predictors of symptom cluster experiences can be included. By doing so, the causal influences of these predictors can be clarified, providing stronger evidence on the mechanism of symptom cluster experience. No symptom cluster studies have used this regression approach.
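The effect of treating the person-specific term as a fixed constant can be illustrated with a minimal sketch. The data, the variable roles (a lagged predictor and a symptom score), and the function names are hypothetical; the within-person estimator shown is one standard way of fitting a fixed effect model with a single predictor.

```python
from statistics import mean

def pooled_slope(records):
    """Ordinary least squares slope of y on x, ignoring who was measured."""
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def within_slope(records):
    """Fixed-effects (within-person) slope: person means are subtracted, so
    stable between-person differences drop out of the estimate."""
    by_person = {}
    for pid, x, y in records:
        by_person.setdefault(pid, []).append((x, y))
    num = den = 0.0
    for obs in by_person.values():
        mx = mean(x for x, _ in obs)
        my = mean(y for _, y in obs)
        num += sum((x - mx) * (y - my) for x, y in obs)
        den += sum((x - mx) ** 2 for x, _ in obs)
    return num / den

# Hypothetical records: (patient, lagged predictor, symptom score).
# Patient B has both a higher predictor level and a higher baseline score.
records = [("A", 1, 1.5), ("A", 2, 2.0), ("A", 3, 2.5),
           ("B", 4, 8.0), ("B", 5, 8.5), ("B", 6, 9.0)]

print(round(pooled_slope(records), 3), within_slope(records))
```

In these constructed data the within-person slope is 0.5 for both patients, while the pooled slope is inflated because it mixes within-person change with the stable difference between the two patients.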

Growth modeling, growth curve analysis, and growth mixture analysis

The growth modeling technique, also called growth curve analysis, can model changes in a ‘measured’ continuous or categorical dependent variable over time using continuous ‘latent’ growth factors [27,28]. Growth modeling uses the structural equation modeling framework in which growth factors are treated as quantitative latent variables with random coefficients. Thus, coefficients for growth factors (i.e., the means and variances of slopes and intercepts in individual growth) are estimated as regression coefficients in regression analysis.
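For a continuous outcome and linear change, a growth model of this kind is commonly written as below. The notation is an assumption introduced for illustration: y_it is the symptom score of patient i at time t, the eta terms are the patient-specific intercept and slope (the latent growth factors), the alpha terms are their means, and the zeta terms are patient-level deviations whose variances are estimated.

```latex
y_{it} = \eta_{0i} + \eta_{1i}\, t + \varepsilon_{it}, \qquad
\eta_{0i} = \alpha_0 + \zeta_{0i}, \qquad
\eta_{1i} = \alpha_1 + \zeta_{1i}
```

The means describe the average trajectory, and the variances of the deviations describe how much individual patients differ in starting level and rate of change.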

Growth mixture modeling is a variation of growth modeling where the data are expected to come from unobserved sub-populations and group memberships are unknown [29]. Growth curve analysis in oncology has been used to identify a group of symptoms changing over time in a similar pattern [11▪▪] and to examine the patterns of fatigue over time [30]. Growth modeling requires extensive prior knowledge to model the patterns and complicated procedures to find the final solution. Growth modeling can also include biological/psychological predictors and examine how these predictors are associated with symptom cluster experience over time. By doing so, growth modeling can provide stronger evidence for the causal relationships between symptoms and their predictors.

Latent transition analysis

Latent transition analysis (LTA) models change over time in a latent class variable, just as growth curves model changes over time in a continuous variable. LTA focuses on the ‘transition patterns’ of subject memberships in latent classes over two consecutive time points – e.g., membership in a given latent class at Time 1 of a longitudinal study is associated with certain probabilities of membership in the possible latent classes at Time 2 [31]. The changing patterns are quantified in a matrix of transition probabilities between two time points. Covariates, which can include a manipulated treatment, can be included to determine their influence on the transition matrix using multinomial logistic regression links. By doing so, it can provide evidence for the directional influence of covariates (e.g., possible underlying mechanisms) on outcome variables (e.g., symptom cluster experience).

In social and behavioral science, the developmental stages have been modeled with LTA [32]. Although LTA has never been used in oncology, it certainly has implications in this field. A particular strength of LTA is that the measurement of the latent class can differ on different occasions – for example, in an etiologic study, some symptom indicators may not be applicable at the earliest measurement occasions. Of note, transitions can only be quantified over two adjacent time points and thus, examining transitions over multiple time points can be tedious, though higher-order LTA techniques (e.g., classes of typical transition patterns) can mitigate this drawback.
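The matrix of transition probabilities between two adjacent time points can be illustrated with a simplified sketch. An important assumption is made here for clarity: class memberships are treated as known labels, whereas in an actual LTA they are latent and estimated together with the transition probabilities. The labels, data, and function name are hypothetical.

```python
from collections import Counter

def transition_matrix(time1, time2, classes):
    """Row-normalized counts: P(class at Time 2 | class at Time 1)."""
    counts = Counter(zip(time1, time2))
    matrix = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {b: counts[(a, b)] / row_total for b in classes}
    return matrix

# Hypothetical symptom-burden class of eight patients at two occasions.
t1 = ["low", "low", "low", "low", "high", "high", "high", "high"]
t2 = ["low", "low", "low", "high", "high", "high", "low", "high"]

matrix = transition_matrix(t1, t2, classes=["low", "high"])
for origin, row in matrix.items():
    print(origin, row)
```

Each row sums to one and describes where patients who started in a given class are found at the next occasion; covariates in an LTA act on these row probabilities.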

In designing longitudinal studies, several issues should be considered. First, when changes in a single dependent variable are examined, a strategy for summarizing multiple symptoms within a cluster into a single variable should be determined. When scaling differs across symptoms, the weight of each symptom will also differ when these symptoms are combined. Placing multiple symptom measures on the same scale should be carefully considered, as a simple standardization procedure can impose a variance restriction over time. Second, the selection of measurement timing, frequency, and time window should evolve from a theoretical framework that describes the changing patterns of a particular symptom cluster. Appropriate statistical techniques should be chosen based on this framework.
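The variance restriction mentioned above can be shown with a small sketch. The scores are hypothetical, and the baseline-referenced scaling at the end is offered only as one possible alternative, not as a recommendation from the reviewed literature.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize scores using the mean and SD of the same occasion."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical fatigue scores for the same five patients at two occasions.
fatigue_t1 = [2, 3, 4, 5, 6]
fatigue_t2 = [1, 3, 5, 7, 9]      # between-patient spread grows over time

z_t1, z_t2 = zscore(fatigue_t1), zscore(fatigue_t2)
print(pstdev(fatigue_t1), pstdev(fatigue_t2))  # raw spreads differ
print(pstdev(z_t1), pstdev(z_t2))              # both forced to 1 by standardization

# One possible alternative (an assumption): scale every occasion by the
# baseline mean and SD so that changes in level and spread are preserved.
base_m, base_s = mean(fatigue_t1), pstdev(fatigue_t1)
scaled_t2 = [(v - base_m) / base_s for v in fatigue_t2]
print(mean(scaled_t2), pstdev(scaled_t2))
```

Standardizing within each occasion removes exactly the change in mean and variability that a longitudinal model is meant to describe, which is why the scaling strategy should be fixed before symptoms are combined.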


We reviewed several statistical approaches that have been used to identify and model symptom clusters in oncology: correlation method, factor analysis, PCA, cluster analysis of symptoms and subjects, LCA, and structural equation modeling. We also discussed several longitudinal data analysis techniques to examine the underlying mechanisms of symptom clusters. Some methods use only observed variables and some use latent variables to explain the observed phenomena. All latent variable methods have advantages regarding the accommodation of covariates and the consequences of symptom cluster experience; thus, these methods could be implemented for testing a complex symptom cluster experience theory.

Conflicts of interest

There is no conflict of interest.

This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A1009672). I.A. was supported as Director of the Academic Fellowship Program in Clinical Outcomes and Comparative Effectiveness Research funded by the Bureau of Health Professions, US Department of Health and Human Services, through the Arizona Health Education Centers Program.


Papers of particular interest, published within the annual period of review, have been highlighted as:

  • ▪ of special interest
  • ▪▪ of outstanding interest

Additional references related to this topic can also be found in the Current World Literature section in this issue (pp. 119–120).


1. Dodd MJ, Miaskowski C, Paul SM. Symptom clusters and their effect on the functional status of patients with cancer. Oncol Nurs Forum 2001; 28:465–470.
2. Kim HJ, McGuire DB, Tulman L, Barsevick AM. Symptom clusters: concept analysis and clinical implications for cancer nursing. Cancer Nurs 2005; 28:270–282.
3. Kim HJ, Abraham IL. Statistical approaches to modeling symptom clusters in cancer patients. Cancer Nurs 2008; 31:E1–E10.
4. Beck SL, Dudley WN, Barsevick A. Pain, sleep disturbance, and fatigue in patients with cancer: using a mediation model to test a symptom cluster. Oncol Nurs Forum 2005; 32:542.
5▪. Skerman HM, Yates PM, Battistutta D. Cancer-related symptom clusters for symptom management in outpatients after commencing adjuvant chemotherapy, at 6 months, and 12 months. Support Care Cancer 2012; 20:95–105.

This is a recent study, which examined the stability of symptom clustering over time after chemotherapy.

6▪. Jimenez A, Madero R, Alonso A, et al. Symptom clusters in advanced cancer. J Pain Symptom Manage 2011; 42:24–31.

This is a recent exemplar, which identified symptom clusters in a large number of advanced cancer patients using principal component analysis.

7. Kirkova J, Aktas A, Walsh D, et al. Consistency of symptom clusters in advanced cancer. Am J Hosp Palliat Care 2010; 27:342–346.
8▪. Kim HJ, Barsevick AM, Beck SL, Dudley W. Clinical subgroups of a psychoneurologic symptom cluster in women receiving treatment for breast cancer: a secondary analysis. Oncol Nurs Forum 2012; 39:E20–E30.

This study examined the phenotypes of symptom cluster experience across three time points using cluster analysis.

9▪▪. Illi J, Miaskowski C, Cooper B, et al. Association between pro- and anti-inflammatory cytokine genes and a symptom cluster of pain, fatigue, sleep disturbance, and depression. Cytokine 2012; 58:437–447.

This study examined the phenotypes of symptom cluster experience and its biological associates (i.e., cytokine genes) using LPA.

10. Hayduk L, Olson K, Quan H, et al. Temporal changes in the causal foundations of palliative care symptoms. Qual Life Res 2010; 19:299–306.
11▪▪. Wang XS, Fairclough DL, Liao Z, et al. Longitudinal study of the relationship between chemoradiation therapy for nonsmall-cell lung cancer and patient symptoms. J Clin Oncol 2006; 24:4485–4491.

This is the first study that identified symptom clusters based on similar trajectories over time.

12. Maliski SL, Kwan L, Elashoff D, Litwin MS. Symptom clusters related to treatment for prostate cancer. Oncol Nurs Forum 2008; 35:786–793.
13. Kim HJ, Barsevick AM, Tulman L, McDermott PA. Treatment-related symptom clusters in breast cancer: a secondary analysis. J Pain Symptom Manage 2008; 36:468–479.
14. Wang SY, Tsai CM, Chen BC, et al. Symptom clusters and relationships to symptom interference with daily life in Taiwanese lung cancer patients. J Pain Symptom Manage 2008; 35:258–266.
15. Gorsuch R. Factor analysis. 2nd ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1983.
16. Kim J, Mueller CW. Introduction to factor analysis. Beverly Hills, Calif: Sage Publications; 1978.
17▪. Matthews EE, Schmiege SJ, Cook PF, Sousa KH. Breast cancer and symptom clusters during radiotherapy. Cancer Nurs 2012; 35:E1–E11.

This study is the most recent study which used a confirmatory factor analysis approach.

18. Kim H-J. Common factor analysis versus principal component analysis: choice for symptom cluster research. Asian Nurs Res 2008; 2:17–24.
19. Tabachnick BG, Fidell LS. Using multivariate statistics. 4th ed. Needham Heights, Massachusetts, USA: Allyn & Bacon; 2001.
20. Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor analysis in psychological research. Psychol Methods 1999; 4:272–299.
21▪▪. Chen E, Nguyen J, Khan L, et al. Symptom clusters in patients with advanced cancer: a reanalysis comparing different statistical methods. J Pain Symptom Manage 2012; 44:23–32.

This study demonstrated that different statistical methods (PCA versus factor analysis) can yield different symptom clusters.

22. Everitt B. Cluster analysis. 3rd ed. New York: Halsted Press; 1993.
23. Pud D, Ben Ami S, Cooper BA, et al. The symptom experience of oncology outpatients has a different impact on quality-of-life outcomes. J Pain Symptom Manage 2008; 35:162–170.
24. Lanza ST, Collins LM, Lemmon DR, Schafer JL. PROC LCA: a SAS procedure for latent class analysis. Struct Equ Modeling 2007; 14:671–694.
25. Hoyle RH. Structural equation modeling: concepts, issues, and applications. Thousand Oaks, California, USA: Sage Publications, Inc; 1995.
26. Henoch I, Ploner A, Tishelman C. Increasing stringency in symptom cluster research: a methodological exploration of symptom clusters in patients with inoperable lung cancer. Oncol Nurs Forum 2009; 36:E282–E292.
27. Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics 1999; 55:463–469.
28. Muthén B. Latent variable analysis: growth mixture modeling and related techniques for longitudinal data. In: Kaplan D, editor. Handbook of quantitative methodology for the social sciences. Newbury Park, California, USA: Sage Publications; 2004. pp. 345–368.
29. Hoeksma JB, Kelderman H. On growth curves and mixture models. Infant Child Dev 2006; 15:627–634.
30. Miaskowski C, Paul SM, Cooper BA, et al. Trajectories of fatigue in men with prostate cancer before, during, and after radiation therapy. J Pain Symptom Manage 2008; 35:632–643.
31. Lanza ST, Collins LM. A new SAS procedure for latent transition analysis: transitions in dating and sexual risk behavior. Dev Psychol 2008; 44:446–456.
32. Maldonado-Molina MM, Lanza ST. A framework to examine gateway relations in drug use: an application of latent transition analysis. J Drug Issues 2010; 40:901–924.

Keywords: methods; statistics; symptom clusters

© 2013 Lippincott Williams & Wilkins, Inc.