Advances in Conceptual and Methodological Issues in Symptom Cluster Research: A 20-Year Perspective : Advances in Nursing Science

Original Articles

Advances in Conceptual and Methodological Issues in Symptom Cluster Research

A 20-Year Perspective

Harris, Carolyn S. BSN, RN; Dodd, Marylin PhD, RN; Kober, Kord M. PhD; Dhruva, Anand A. MD; Hammer, Marilyn J. PhD, RN; Conley, Yvette P. PhD; Miaskowski, Christine A. PhD, RN

doi: 10.1097/ANS.0000000000000423


SYMPTOM SCIENCE was transformed by 2 landmark articles that suggested the existence of “symptom clusters” in oncology patients.1,2 Prior to these articles, symptom research focused primarily on an evaluation of the prevalence and severity of single symptoms in patients with chronic conditions.3 Building on the clinical reality that symptoms rarely occur alone, researchers and clinicians were challenged to evaluate for and manage co-occurring symptoms and/or symptom clusters.

Given that these 2 studies published in 2001 are credited with launching the field of “symptom cluster” research,1,2 they warrant careful evaluation 20 years later. In the first study,2 the relationships between pain and fatigue and the co-occurrence of 20 other symptoms were evaluated in a heterogeneous sample of newly diagnosed oncology patients over 1 year. In the second study,1 the effect of a prespecified symptom cluster (ie, pain, fatigue, sleep disturbance) on oncology patients' functional status was evaluated over 3 cycles of chemotherapy. Of note, in that article, the first definition of a symptom cluster was proposed to be “three or more concurrent symptoms” that “are related to each other.... The symptoms within a cluster are not required to share the same etiology.”1(p465)

While these studies provided a stimulus and new directions for symptom science research, several limitations warrant consideration. First, only 2 symptoms (ie, pain, fatigue) were evaluated in one study2 and 3 symptoms (ie, pain, fatigue, sleep insufficiency) in the other study.1 Second, in both studies, the symptom cluster was prespecified, not created “de novo.” Third, both studies evaluated for associations between single symptoms and a distal outcome, not with the “symptom cluster” as a whole.

While symptom cluster research has grown considerably since the publication of these 2 relatively “simplistic” studies,1,2 as noted in the most recent expert panel report,4 this field is relatively new and ongoing conceptual issues warrant consideration. One key question is a rather simple one, namely: “What constitutes symptom cluster research?” As noted by Miaskowski and colleagues5 in 2007, 2 conceptual approaches to evaluate symptom clusters evolved over a period of 5 years, namely, “clustering” symptoms (equates with a variable-centered analytic approach) and “clustering” patients (equates with a person-centered analytic approach) (Figure 1). The use of the word “clustering” for both approaches has led to confusion in the literature on symptom cluster research. For example, it is not uncommon to find publications that have described “symptom clusters” when patients were grouped based on an evaluation of a prespecified symptom cluster that consisted of 2 or more symptoms.6,7 Given this confusion, it is imperative to use the correct terminology as outlined in the following text.

Figure 1.:
Two conceptual approaches to symptom cluster research. (A) The identification of symptom clusters using a variable-centered approach. (B) The identification of subgroups of patients based on their experience with a prespecified symptom cluster (eg, pain, fatigue, sleep disturbance, depression). Adapted from Miaskowski et al.5 Reprinted with permission from the Journal of the National Cancer Institute Monographs. This figure is available in color online (

As noted in Figure 1A, variable-centered approaches (eg, exploratory factor analysis [EFA]) identify symptoms that cluster together empirically through the use of an analytic approach that creates distinct groups of related symptoms (ie, symptom clusters).5 These approaches are based on the hypothesis that symptoms cluster together because they may share a common underlying mechanism(s).8,9

Patient-centered approaches (Figure 1B; eg, latent class analysis [LCA]) identify subgroups of patients with distinct symptom profiles using 1 or more symptoms or a prespecified symptom cluster (eg, pain, fatigue, depression, sleep disturbance10). With these approaches, it is important to note that in the context of symptom cluster research, a symptom cluster must be prespecified. These patient-centered analyses can be used to identify subgroups of patients with distinct symptom profiles (ie, lower vs higher symptom burden) and associated risk factors (eg, demographic, clinical, biomarkers).5

Previous reviews have evaluated the conceptual, methodological, and clinical basis for symptom cluster research.5,11–15 In a concept analysis that included a review of symptom cluster research across psychiatry, medicine, and nursing, Kim and colleagues14 identified 5 key attributes of a symptom cluster (eg, co-occurrence of symptoms within a cluster, stability, shared or common etiology). Based on research findings and clinical evidence, both Kim and colleagues14 and Aktas11 argued for the definition of a symptom cluster to be modified to include a minimum of 2 symptoms. Kim and Abraham13 and Skerman and colleagues15 examined the application of various statistical methods to identify symptom clusters and reviewed the conceptual and methodological challenges of each method. Building on a previous article by Miaskowski and colleagues5 that described the 2 conceptual approaches for symptom cluster research, Barsevick12 examined the application of qualitative approaches to symptom cluster research and expanded on the concept of stability in symptom cluster research.

In the most recent state-of-the-science report,4 an expert panel called both for the identification of symptom clusters using newer analytic techniques and for an investigation of the underlying mechanisms for symptom clusters. In addition, they suggested that additional research is warranted to clarify the “de novo” approach to the identification of symptom clusters versus the grouping of patients with distinct symptom cluster profiles based on a “prespecified” symptom cluster. Given the recent application of newer methods to symptom cluster research (eg, network analysis [NA],16 natural language processing [NLP]17), a review of the conceptual basis for these older and newer methods in the context of symptom cluster research is warranted. Therefore, the purposes of this article are to review the conceptual basis for symptom cluster research; compare and contrast the conceptual basis for using variable-centered versus patient-centered analytic approaches in symptom cluster research; review the strengths and weaknesses of the most common variable-centered and patient-centered analytic approaches for symptom cluster research; and compare the various applications of each approach in symptom cluster research.

Statements of Significance

What is known or assumed to be true about this topic?

Symptom cluster research began in 2001 based on the clinical reality that patients rarely report a single symptom. As research on symptom clusters grew over the past 20 years, 2 conceptual approaches emerged to evaluate symptom clusters; namely: “clustering” symptoms (equates with a variable-centered analytic approach) and “clustering” patients (equates with a person-centered analytic approach). While the use of both variable-centered and patient-centered analytic approaches is needed to move this area of scientific inquiry forward, evidence exists that these methods are not used consistently. Conceptual clarity on the use of these 2 methods in symptom cluster research is needed.

What this article adds:

This article provides conceptual clarity regarding the use of both analytic approaches in symptom cluster research. In addition, this article describes novel methods (ie, NA, Bayesian NA, NLP) that have emerged to facilitate our understanding of symptom clusters.


As the science of symptom cluster research has advanced over the past 20 years, the definition of a symptom cluster has gone through multiple revisions.1,12,14 In the most recent revision by an expert panel,4 several characteristics of both a symptom and a symptom cluster were identified (Table). While some debate continues on the minimum number of symptoms that constitutes a symptom cluster,11,12 a minimum of 2 symptoms in a cluster is generally accepted. However, clarification and/or refinement of the other characteristics are needed. For example, in terms of “stability,” neither a definition of stability nor methods to assess it exist. This issue is particularly important when one considers the temporal dimension of symptom clusters. Does stability refer to whether or not the various types of symptom clusters (eg, psychological, gastrointestinal) remain “stable” or whether or not the symptoms within each cluster (eg, sad, irritable, angry) remain “stable” over time? We propose that the term “stable” be used to describe whether the symptom clusters change over time and/or across symptom dimensions. Alternatively, the term “consistent” should be used to describe whether the specific symptoms within a cluster remain the same over time and/or across symptom dimensions. For both stability and consistency, the assessment methods and numeric criteria need to be determined.18

Table. Areas of Ongoing Development in the Definition of a Symptom Cluster (a)

Symptom (b) | Symptom Cluster: Same Characteristics as a Symptom, Plus: | Exemplars of Areas for Future Research and Development
Subjective perception | Two or more concurrent symptoms | Consensus is needed on the specific characteristics that encompass the definition of a symptom cluster within and across acute and chronic conditions
May vary over time | Stable group of symptoms | The definition of and criteria for stability and consistency need to be established and evaluated. In addition, the conditions or circumstances when symptom clusters may or may not be stable warrant additional research (eg, across symptom dimensions, within and across symptom dimensions over time)
Has antecedents | Independent of other clusters | The interrelationships between and among symptoms and symptom clusters warrant detailed evaluation
Influences outcomes | May have shared underlying mechanism(s) | How do the mechanisms that underlie single symptoms within a cluster differ from mechanisms that underlie the entire cluster?
May be influenced by an intervention | May have shared outcome(s) | Do symptom clusters influence patient outcomes similarly or differently?
Has an underlying mechanism | Temporal dimension | When and how do symptom clusters change over time?

(a) Adapted from Miaskowski et al.4 Reprinted with permission from Oxford University Press.
(b) Symptoms are subjective sensations. Signs are objective indications of some medical characteristics.

Equally important is the question of whether or not symptom clusters need to be independent of other clusters. Given that the recent use of NA demonstrates that symptoms within one cluster are related to symptoms in other clusters,16 this criterion may need to be reconsidered. In addition, research is needed to support the criteria that symptom clusters may share common underlying mechanisms and may have shared outcomes.


De novo identification of symptom clusters

Variable-centered approaches explore the relationships among symptoms using either regression-based techniques19 or measures of similarity13 and create symptom clusters “de novo.” As a first step, participants need to complete 1 or more symptom assessment instruments or a symptom inventory (Figure 1A).5 Then, a variable-centered analytic approach is used to identify the symptom clusters. Historically, 4 statistical approaches were used to identify symptom clusters, namely, cluster analysis, EFA, confirmatory factor analysis (CFA), and principal components analysis (PCA).14

Following the recommendations of Skerman and colleagues,15 EFA is the most common approach used to identify symptom clusters in oncology research, followed by hierarchical cluster analysis (HCA).14,18,20 In contrast, PCA is the most common approach used to identify symptom clusters in other chronic conditions (eg, chronic obstructive pulmonary disease [COPD],21 HIV infection22). However, PCA uses a data-reduction approach to analyze symptoms and does not assume any causal relationship between the symptoms within a cluster.15,23 Given that one hypothesis underlying symptom cluster research is that symptoms cluster together due to a shared, underlying mechanism,8,9 the use of PCA is not consistent with this hypothesis.

A non-exhaustive search of the Cumulative Index of Nursing and Allied Health Literature (CINAHL) and PubMed databases was conducted to explore the use of different variable-centered approaches for studying symptom clusters. Exemplars for each statistical method are described in Supplemental Digital Content Table 1 (available at: As noted later, compared to studies of oncology patients, research on symptom clusters in patients with other chronic conditions is much less common. Therefore, exemplar studies conducted in samples with other chronic conditions are highlighted in Supplemental Digital Content Table 1 (available at: to stimulate growth in symptom cluster research within these patient populations.

Hierarchical cluster analysis

HCA is one type of cluster analysis that has been used in symptom cluster research across a variety of chronic conditions.20,22,24 It is important to note that depending on the research question, HCA can be used to group symptoms or patients.13

Two types of HCA can be used: agglomerative or divisive.25 Starting with all of the symptoms in individual clusters, agglomerative HCA is used to identify and successively group pairs or groups of similar symptoms into mutually exclusive clusters of related symptoms.26 In contrast, divisive HCA starts with all of the symptoms in a single cluster. Then, it systematically partitions the cluster into smaller groups of similar symptoms.25 The hierarchical clustering of symptoms continues in a stepwise fashion until a certain level of groupings that have clinical meaning and interpretability is selected.15 These steps are displayed graphically on a dendrogram. Measures of similarity for interval data include correlation coefficients or squared Euclidean distances,13 while coefficients of association can be used for binary data.15
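The agglomerative procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited study: it assumes average linkage and uses 1 minus the Pearson correlation as the dissimilarity measure, and the function names (`pearson`, `agglomerate`) and severity scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def agglomerate(scores, n_clusters):
    """Average-linkage agglomerative clustering of symptoms.

    scores: dict mapping symptom name -> list of patient scores.
    Dissimilarity between two symptoms is 1 - Pearson correlation
    (an assumption made for this sketch).
    """
    names = list(scores)
    dist = {frozenset((a, b)): 1 - pearson(scores[a], scores[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]  # start: one symptom per cluster

    def linkage(c1, c2):
        # Average dissimilarity over all cross-cluster symptom pairs
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters (i < j, so deleting j is safe)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical severity scores (0-10) for six patients
scores = {
    "sad":       [1, 2, 8, 9, 3, 7],
    "irritable": [2, 1, 9, 8, 2, 8],
    "nausea":    [7, 8, 1, 2, 9, 1],
    "vomiting":  [8, 7, 2, 1, 8, 2],
}
symptom_clusters = agglomerate(scores, n_clusters=2)
```

Note that `n_clusters` is supplied by the analyst. In practice, the stopping level is chosen by inspecting the dendrogram for groupings that are clinically interpretable.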

HCA has several limitations.13,15 First, it is important to note that cluster analytic methods are not based on the underlying assumption of shared causality. Rather, they seek to identify groupings based on statistical measures of similarity.13 Second, because cluster analytic methods strive to identify mutually exclusive groups of similar symptoms, a symptom can belong to only one cluster.15 Given that a single symptom may be related to multiple symptoms that associate into different clusters, this limitation does not allow for an examination of symptoms that cross-load on other clusters. In addition, it impedes our ability to identify common and distinct underlying mechanisms. Third, using HCA, the determination of the final number of clusters is highly subjective. This subjectivity may lead to bias, as well as variability in both the number and types of symptom clusters identified across studies.

Thirty-nine studies were identified that evaluated for symptom clusters “de novo” using HCA. While 74.4% of these studies were conducted in patients with cancer, exemplars of studies that used HCA to identify symptom clusters in patients with other chronic conditions are provided in Supplemental Digital Content Table 1 (available at:

Exploratory factor analysis

The common factor model consists of 2 factor analytic methods: EFA and CFA. Factor analytic methods are used to discover unobserved or latent factors (ie, symptom clusters) that account for the common variance among multiple, observed variables (ie, symptoms).27 The underlying conceptual framework for factor analytic methods is that variables within a latent factor covary because of a common underlying cause. The “strength and direction of the influence”23(p10) of the latent factors on the variables in the common factor model are estimated with factor loadings. Because of the exploratory nature of EFA, no assumptions are made a priori about the nature of the relationships between the observed variables.23
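For readers who prefer notation, the common factor model can be written in its standard textbook form. The symbols below do not appear in the original text and are introduced here only for illustration: x is the vector of (centered) observed symptom scores, Λ is the matrix of factor loadings, f is the vector of latent factors with covariance matrix Φ, and ε is the vector of unique factors with diagonal covariance matrix Ψ.

```latex
\[
  \mathbf{x} = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
  \qquad
  \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
\]
```

Under this model, the covariance among symptoms (Σ) is attributed to the latent factors through the loadings, while Ψ captures the variance unique to each symptom.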

A unique feature of EFA is that symptoms can load on more than 1 factor (ie, symptom cluster).23 Given the possibility that one symptom can influence symptoms on different clusters, the ability for a symptom to load on more than 1 cluster has conceptual utility. For example, in a study that evaluated for symptom clusters in patients with lung cancer,28 difficulty concentrating and feeling nervous cross-loaded on a sickness behavior and a psychological cluster. However, a lack of consensus exists on whether a symptom can load on multiple factors. For example, in a recent review of studies that evaluated for symptom clusters in patients receiving adjuvant chemotherapy,18 only 58.8% of the studies that used EFA allowed for symptoms to cross-load.
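To make the idea of cross-loading concrete, the sketch below flags symptoms whose absolute loading meets a cutoff on more than 1 factor. The loadings are invented for illustration and are not taken from any cited study; the 0.40 cutoff and the function names are likewise assumptions, because cutoffs vary across studies.

```python
def assign_clusters(loadings, threshold=0.40):
    """Map each symptom to every factor on which |loading| >= threshold."""
    return {
        symptom: [factor for factor, value in by_factor.items()
                  if abs(value) >= threshold]
        for symptom, by_factor in loadings.items()
    }

def cross_loading_symptoms(loadings, threshold=0.40):
    """Return the symptoms assigned to more than one factor."""
    assigned = assign_clusters(loadings, threshold)
    return sorted(s for s, factors in assigned.items() if len(factors) > 1)

# Hypothetical factor loadings (invented values, for illustration only)
loadings = {
    "difficulty concentrating": {"sickness behavior": 0.45, "psychological": 0.52},
    "feeling sad":              {"sickness behavior": 0.12, "psychological": 0.78},
    "lack of energy":           {"sickness behavior": 0.71, "psychological": 0.20},
}
cross_loaded = cross_loading_symptoms(loadings)
```

A study that does not allow cross-loading would instead assign each symptom only to the factor on which it has its highest loading.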

Compared with HCA where 39 studies were identified, 89 studies used EFA to identify symptom clusters “de novo.” Of these studies, 66.3% were conducted in patients with cancer. This pattern is consistent with previous reviews that identified EFA as the most common statistical approach for identifying symptom clusters in oncology patients.15,18,20 Exemplars of studies that used EFA to identify symptom clusters in patients with other chronic conditions are provided in Supplemental Digital Content Table 1 (available at:

Confirmatory factor analysis

This approach is used to test hypotheses on the relationships between latent factors and observed variables.27 More specifically, all of the model's assumptions (eg, number of factors, pattern of variable to factor loadings) must be specified a priori. These hypotheses must be rooted in theory and/or empirical evidence.

Given that the conceptual basis for CFA is to confirm hypotheses, it can be used to confirm the number and types of symptom clusters previously identified using another variable-centered approach (eg, EFA).15 For example, in a study that evaluated for symptom clusters in children and adolescents receiving myelosuppressive therapy,29 EFA was used to identify symptom clusters. Then, CFA was used to confirm the structure of the findings. Given the continued need to evaluate and compare different statistical methods to identify symptom clusters “de novo,”4 CFA may be one approach to validate the stability and/or consistency of symptom clusters.

Use of variable-centered approaches to investigate underlying biological mechanisms

Relatively few studies have used a variable-centered approach to evaluate the underlying biological mechanisms of symptom clusters.30,31 In one study,31 EFA was used to identify symptom clusters in oncology patients using the severity dimension. Then, a factor severity score was calculated for each of the 3 symptom clusters that were identified (ie, mood-cognitive, sickness-behavior, and treatment-related). These scores were used in regression analyses to identify associations between each symptom cluster and polymorphisms in cytokine genes.

Another study used EFA to identify 2 symptom clusters in patients with COPD.30 Next, symptom cluster severity scores were calculated for each cluster. Subgroups of patients were identified on the basis of their average symptom cluster severity score. Inflammatory biomarkers were used in logistic regression analyses to identify associations between subgroup membership and levels of C-reactive protein.

A priori identification of symptom clusters and associated symptom cluster profiles

Patient-centered analytic approaches evaluate for relationships among individuals using the principles of structural equation modeling19 or measures of similarity.13 Similar to variable-centered approaches, participants complete 1 or more symptom assessment instruments or a symptom inventory (Figure 1B).5 In the context of symptom cluster research, a symptom cluster must be identified a priori (eg, pain, fatigue, sleep disturbance, and depression). Then, with this prespecified symptom cluster, groups of patients with distinct symptom cluster profiles are identified using patient-centered analytic approaches. Because these methods allow for the identification of subgroups of patients based on their experiences with a prespecified symptom cluster, a variety of phenotypic and molecular risk factors can be identified that distinguish the various patient subgroups.

A search of the CINAHL and PubMed databases identified 31 studies that evaluated the symptom profiles of patients experiencing a prespecified symptom cluster. Exemplars for each statistical method are provided in Supplemental Digital Content Table 2 (available at:

Hierarchical cluster analysis

As mentioned previously, cluster analysis methods such as HCA can be used to “cluster” symptoms or patients. With the latter approach, subgroups of patients are identified based on similar symptom cluster profiles using a prespecified symptom cluster.13 Eight studies were identified that used HCA to evaluate for subgroups of patients based on a clearly defined prespecified symptom cluster. While the majority of these studies were conducted in patients with cancer (75%), exemplar studies that used HCA to identify subgroups of patients with a distinct symptom cluster profile in other chronic conditions are provided in Supplemental Digital Content Table 2 (available at:

Latent variable modeling

Latent variable modeling (LVM) is used to identify subgroups or classes of individuals within a sample or population who have similar attributes or symptom experiences.19 The underlying conceptual framework for LVM is that subgroup membership is based on an unobserved latent variable (ie, prespecified symptom cluster) whose “value indicates what group the individual belongs to.”25(p819) Common types of LVM include LCA for categorical data (eg, symptom occurrence) and latent profile analysis for continuous data (eg, symptom severity). In addition, latent transition analysis can be used to evaluate for changes in subgroup membership over time.19
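As a rough illustration of how LCA assigns patients to subgroups, the sketch below fits a latent class model to binary symptom occurrence data using the expectation-maximization algorithm. It is a simplified teaching example under stated assumptions: the function name `fit_lca`, the starting values, the fixed number of iterations, and the patient data are all hypothetical, and dedicated statistical software would normally be used.

```python
from math import prod

def fit_lca(data, n_classes, n_iter=200):
    """EM for a latent class model with binary (0/1) symptom indicators.

    data: list of patients, each a list of 0/1 symptom occurrences.
    Returns (class weights, per-class item probabilities, posteriors).
    """
    n, n_items = len(data), len(data[0])
    weights = [1 / n_classes] * n_classes
    # Deterministic, spread-out starting values (an arbitrary choice)
    probs = [[(c + 1) / (n_classes + 1)] * n_items for c in range(n_classes)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each patient
        post = []
        for row in data:
            joint = [weights[c] * prod(p if x else 1 - p
                                       for x, p in zip(row, probs[c]))
                     for c in range(n_classes)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update class sizes and item-response probabilities
        for c in range(n_classes):
            size = sum(r[c] for r in post)
            weights[c] = size / n
            probs[c] = [
                # Clamp away from 0 and 1 to keep the likelihood positive
                min(max(sum(r[c] * row[j] for r, row in zip(post, data)) / size,
                        1e-6), 1 - 1e-6)
                for j in range(n_items)
            ]
    return weights, probs, post

# Hypothetical occurrence data: pain, fatigue, sleep disturbance, depression
patients = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1],  # higher burden
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0],  # lower burden
]
weights, probs, post = fit_lca(patients, n_classes=2)
# Assign each patient to the class with the highest posterior probability
assigned = [max(range(2), key=lambda c: r[c]) for r in post]
```

The class labels themselves are arbitrary. What matters is which patients are grouped together and how the estimated symptom probabilities differ between classes.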

The identification of subgroups of patients based on their distinct symptom cluster profiles using LVM has multiple advantages. First, differences in salient characteristics (eg, demographics, stress, resilience) between the subgroups can be identified. Second, LVM can be used to evaluate how patient outcomes (eg, functional status, quality of life) differ by class membership.25

While the use of both HCA and LVM results in the identification of subgroups of patients with distinct symptom cluster profiles, the methods differ in a few key ways. First, with LVM, multiple models are evaluated using fit indices before selecting the final model.25 In contrast, selection of the final solution for HCA is highly subjective. Second, because LVM tends to be computationally more challenging than HCA,25 fewer variables may be included in the LVM analysis.
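The use of fit indices to select a final model can be illustrated with the Bayesian information criterion (BIC), one commonly used index. The text above does not specify which indices are used, so the choice of BIC is an assumption, and the log-likelihood values and sample size below are invented. The parameter count shown applies to a latent class model with binary indicators.

```python
from math import log

def lca_bic(log_likelihood, n_classes, n_items, n_patients):
    """BIC for a latent class model with binary indicators.

    Free parameters: (n_classes - 1) class proportions plus
    n_classes * n_items item-response probabilities.
    """
    n_params = (n_classes - 1) + n_classes * n_items
    return -2 * log_likelihood + n_params * log(n_patients)

# Hypothetical maximized log-likelihoods for 1- to 4-class solutions
candidates = {1: -1510.2, 2: -1402.7, 3: -1371.9, 4: -1368.4}
bic = {k: lca_bic(ll, k, n_items=4, n_patients=500)
       for k, ll in candidates.items()}
best = min(bic, key=bic.get)  # lower BIC indicates a better trade-off
```

In this invented example, the 4-class solution has the highest log-likelihood, but the penalty for its additional parameters means that the 3-class solution has the lowest BIC.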

Twenty-three studies have used a form of LVM to identify subgroups of patients with a distinct symptom cluster profile. While most of these studies were conducted in oncology patients (56.5%), exemplar studies that used LVM to identify subgroups of patients with a distinct symptom cluster profile in other chronic conditions are provided in Supplemental Digital Content Table 2 (available at:

Use of patient-centered analytic approaches to investigate underlying biological mechanisms

Ten studies have used a patient-centered analytic approach to evaluate the underlying biological mechanism(s) for a prespecified symptom cluster (see exemplars in Supplemental Digital Content Table 2, available at: In one study,32 latent profile analysis was used to identify 3 distinct subgroups of patients with breast cancer based on their experience with a pain, fatigue, sleep disturbance, and depression cluster. Multiple associations were found between latent class membership and cytokine gene polymorphisms. Another study used HCA to identify subgroups of patients with advanced cancer based on their experience with the symptom cluster of pain, fatigue, depression, and sleep disturbance.10 Higher serum levels of IL-6 were associated with an increased risk for membership in the moderate-to-high symptom subgroup.


Network analysis

One novel approach that can be used to identify symptom clusters “de novo” is NA. Based on the principles of graph theory,33 NA is used to evaluate the relationships between a set of variables (ie, symptoms). The structure of these relationships is presented in graphs. Within these graphs, symptoms are represented as nodes and the relationship(s) between symptoms are represented as edges (Figure 2A). The presence (ie, a relationship between the symptoms) and strength (eg, correlation, conditional association) of these edges are calculated from the data. While NA is firmly based in mathematical and statistical methods, one of its strengths is that it allows for a qualitative (ie, visual) appraisal of the data.
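A minimal sketch of how nodes and edges can be derived from symptom data follows. It assumes that edges are zero-order Pearson correlations retained when their absolute value meets a cutoff. This is a deliberate simplification: network estimation methods vary, and the cutoff of 0.30, the function names, and the scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_network(scores, min_abs_r=0.30):
    """Return {frozenset({a, b}): r} for symptom pairs with |r| >= min_abs_r.

    Each key is an undirected edge between two nodes (symptoms);
    each value is the edge weight (the correlation).
    """
    edges = {}
    for a, b in combinations(scores, 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= min_abs_r:
            edges[frozenset((a, b))] = r
    return edges

# Hypothetical severity scores for six patients
scores = {
    "worry":            [1, 2, 3, 4, 5, 6],
    "nervous":          [2, 1, 4, 3, 6, 5],
    "nausea":           [3, 1, 2, 3, 1, 2],
    "lack of appetite": [4, 2, 3, 5, 2, 3],
}
edges = build_network(scores)
```

With these invented scores, only 2 edges survive the cutoff: one between worry and nervous and one between nausea and lack of appetite.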

Figure 2.:
(A) An undirected graphical model with 7 nodes. Each node represents a symptom. The presence of an edge between 2 nodes indicates a relationship between them. (B) This figure represents the estimated network of 38 cancer symptoms across the “distress” symptom dimension. In this figure, the node size corresponds to the symptom distress scores and the strength of the relationship between nodes is illustrated by the thickness of the edges. Green edges indicate positive relationships and red edges indicate negative relationships. Symptom clusters were identified using a community detection algorithm and are identified by the color of the symptoms within each cluster. Adapted from Papachristou et al.16 This figure is available in color online (

One challenge with NA is the determination of the importance of nodes or groups of nodes within a network. Various types of centrality indices are used to aid in the interpretation of which nodes (ie, symptoms) may have the largest influence on a network.33–35 These highly influential nodes are sometimes referred to as “core” or “sentinel” nodes16 and have the potential to serve as targets for therapeutic interventions.
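One simple centrality index is node strength, which is the sum of the absolute weights of the edges attached to a node. The sketch below computes it for a small network. The choice of strength is an assumption (several other indices exist), and the edge weights are invented for illustration.

```python
def strength_centrality(edges):
    """Sum of absolute edge weights attached to each node."""
    strength = {}
    for pair, weight in edges.items():
        for node in pair:
            strength[node] = strength.get(node, 0.0) + abs(weight)
    return strength

# Hypothetical edge weights between symptoms (invented values)
edges = {
    frozenset(("nausea", "lack of appetite")): 0.45,
    frozenset(("nausea", "vomiting")): 0.40,
    frozenset(("nausea", "worry")): 0.15,
    frozenset(("worry", "nervous")): 0.50,
    frozenset(("lack of appetite", "weight loss")): 0.30,
}
strength = strength_centrality(edges)
most_central = max(strength, key=strength.get)
```

In this invented network, nausea has the highest strength because it is connected to 3 other symptoms, which would make it a candidate “core” or “sentinel” symptom.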

Following the network's construction, community detection algorithms are used to identify clusters of symptoms (ie, nodes) that are closely connected relative to other symptoms or clusters.36 Various types of community detection algorithms are available, and selection of the appropriate algorithm depends on multiple factors, including the network's size.37

One of the advantages of NA over other analytic approaches is that it allows for visualization of the relationships between symptom clusters and of how symptoms within one cluster relate to symptoms in another cluster. In addition, this approach allows for the identification of core or sentinel symptoms. However, a variety of approaches exist to create the networks, and selection of the appropriate algorithms to estimate and evaluate the networks warrants consideration.

Three studies were identified that used NA to evaluate symptoms and/or symptom clusters in patients with cancer.16,38,39 In one study,16 NA was used to identify symptom clusters using multiple dimensions of the symptom experience (ie, occurrence, severity, distress) in a heterogeneous sample of oncology patients. While 5 symptom clusters were identified across all 3 symptom dimensions (ie, psychological, hormonal, respiratory, nutritional, chemotherapy-related), 2 additional symptom clusters (ie, gastrointestinal, epithelial) were identified using distress (Figure 2B). The authors hypothesized that these results suggest that distress is a unique dimension of the patients' symptom experience. Because nausea and lack of appetite had the highest centrality index scores, the authors suggested that targeting these symptoms may decrease the other symptoms within the network.

In another study,39 a network was constructed using severity scores for 8 symptoms and serum concentrations for 13 cytokines. Two communities were identified: a symptom cluster with 5 symptoms and another cluster with all 13 cytokines. While an evaluation of the associations between symptoms and biomarkers warrants additional research, findings from this study illustrate the challenges with incorporating heterogeneous types of data (ie, symptom severity scores and cytokine levels) into an NA.

A third study used HCA and PCA to identify symptom clusters in a sample of patients receiving chemotherapy.38 Three common symptom clusters were identified over 5 assessments. Then, using only the 12 symptoms that were identified in the initial analyses, NA identified symptom clusters comparable to those found using PCA at only one time point. Fatigue, anxiety, and depression were identified as the most central symptoms in the network.

Bayesian network analysis

Bayesian NA incorporates Bayesian statistics with NA to allow for an evaluation of the strength and direction of the relationships among symptoms.40 While both types of networks contain nodes (ie, symptoms) and edges (ie, relationships between the symptoms), Bayesian NA graphically displays these relationships in a causal model (ie, directed acyclic graph). Conditional dependencies are estimated for each node (ie, symptom). The strength and direction of these relationships are calculated with joint probability distributions.41
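The way a directed acyclic graph encodes a joint probability distribution can be shown with a toy example. The graph below (mood and sleep disturbance each pointing to fatigue) and all of the probabilities are hypothetical and are not estimates from any study. The sketch shows only how the joint distribution factorizes into one conditional distribution per node.

```python
from itertools import product

# Hypothetical DAG: mood -> fatigue <- sleep (binary variables, 1 = present)
p_mood = {1: 0.30, 0: 0.70}
p_sleep = {1: 0.40, 0: 0.60}
# P(fatigue = 1 | mood, sleep); values invented for illustration
p_fatigue_given = {(0, 0): 0.10, (0, 1): 0.50, (1, 0): 0.40, (1, 1): 0.80}

def joint(mood, sleep, fatigue):
    """Joint probability from the DAG factorization:
    P(mood) * P(sleep) * P(fatigue | mood, sleep)."""
    p_f1 = p_fatigue_given[(mood, sleep)]
    return p_mood[mood] * p_sleep[sleep] * (p_f1 if fatigue else 1 - p_f1)

# The eight joint probabilities sum to 1
total = sum(joint(m, s, f) for m, s, f in product((0, 1), repeat=3))
# Marginal probability of fatigue
p_fatigue = sum(joint(m, s, 1) for m, s in product((0, 1), repeat=2))
# Probability of fatigue given that mood disturbance is present
p_fatigue_if_mood = sum(joint(1, s, 1) for s in (0, 1)) / p_mood[1]
```

Comparing `p_fatigue_if_mood` with `p_fatigue` shows how knowing the state of a parent node changes the probability of its child, which is the kind of directed influence that Bayesian NA is used to evaluate.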

Bayesian NA approaches offer many advantages for symptom cluster research. First, in addition to identifying “sentinel” symptoms, Bayesian NA can be used to elucidate the direction and flow of a symptom's influence on other symptoms within a network.41 Second, similar to EFA and LVM, Bayesian NA can identify latent variables.42,43 However, given the complexity of the relationships between symptoms, interpretation of these relationships on an acyclic graph may be challenging. In addition, Bayesian NA methods are computationally expensive,44 particularly with large sample sizes or with large symptom inventories.

While Bayesian NA is used extensively in bioinformatics45 and health sciences46 research, only one study was identified that used Bayesian NA to examine the relationships between symptoms within a prespecified cluster (ie, sleep disturbance, fatigue, depressive symptoms) and their effect on cognitive performance and quality of life in patients with breast cancer receiving chemotherapy.47 Findings from this analysis suggest that the relationships among symptoms changed across time. For example, while mood directly impacted fatigue prior to the start of treatment and at the end of chemotherapy, previous levels of fatigue and sleep disturbance and current quality of life directly impacted the severity of fatigue 1 year after the start of chemotherapy.

Application of NLP to symptom cluster research

An ongoing issue in symptom cluster research is to determine the optimal number of common symptoms that need to be assessed across chronic conditions.18 The determination of a consistent, comprehensive, and clinically meaningful list of symptoms would enable the identification of common symptom clusters across chronic conditions, as well as their common underlying mechanisms. Because of this lack of consensus, inventories with a large number of symptoms are administered to patients to evaluate for symptom clusters, with a potential for increased burden. A variety of new and emerging data science approaches (eg, machine learning, NLP) have the potential to resolve this issue. The application of one of these approaches in symptom cluster research is described in the following text.

NLP is a data extraction method that uses computer-based algorithms to acquire, process, and modify natural language obtained from “Big Data” (eg, electronic health record [EHR]) for computational analyses.48 Systematic extraction of “real-world” symptom data from EHRs and their subsequent evaluation have the potential to not only lessen the burden on patients with chronic conditions but also provide researchers with the “most comprehensive, longitudinal, population-wide dataset”17(p907) available. NLP methodologies have the potential to provide novel information on symptoms and symptom management throughout and beyond treatment of chronic conditions.49

Two recent publications describe the use of NLP in symptom science research. In the first publication,50 the authors used free and open-source NLP software (ie, NimbleMiner) to find and extract data on 5 symptoms (ie, constipation, depressed mood, disturbed sleep, fatigue, palpitations) from the EHR. While this method was piloted using only 5 symptoms, it can be expanded to include a larger symptom “vocabulary.”
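As a rough illustration of what extracting symptoms from free-text notes with a symptom "vocabulary" involves, the sketch below matches a small term list against a note and skips mentions that follow a negation cue in the same sentence. It is a deliberately simple dictionary-and-pattern approach, not the method implemented in NimbleMiner. The 5 symptom names come from the text above, while the synonym lists, the negation cues, and the function name `extract_symptoms` are assumptions made for this example.

```python
import re

# Hypothetical vocabulary: the five symptom names are from the article,
# but the synonym lists are invented for illustration.
SYMPTOM_VOCABULARY = {
    "constipation": ["constipation", "constipated"],
    "depressed mood": ["depressed mood", "feeling down", "depressed"],
    "disturbed sleep": ["disturbed sleep", "insomnia", "trouble sleeping"],
    "fatigue": ["fatigue", "fatigued", "exhausted"],
    "palpitations": ["palpitations", "heart racing"],
}

# Crude negation rule (an assumption): a cue word that appears earlier in
# the same sentence, with no period or semicolon in between.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.;]*$", re.IGNORECASE)


def extract_symptoms(note):
    """Return the set of vocabulary symptoms affirmed in a free-text note."""
    found = set()
    for symptom, terms in SYMPTOM_VOCABULARY.items():
        pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
        for match in re.finditer(pattern, note, re.IGNORECASE):
            preceding = note[:match.start()]
            if NEGATION.search(preceding):
                continue  # mention is negated, e.g. "denies palpitations"
            found.add(symptom)
    return found


if __name__ == "__main__":
    note = "Patient reports fatigue and trouble sleeping. Denies palpitations."
    print(sorted(extract_symptoms(note)))
```

Expanding the vocabulary in this sketch only requires adding entries to `SYMPTOM_VOCABULARY`. Production NLP tools typically handle misspellings, abbreviations, and negation far more robustly than this rule does.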

In the second study, Koleck and colleagues17 used NLP to extract 56 symptoms from the EHR nursing notes of 22 647 patients across 4 common chronic conditions (ie, cancer, COPD, heart failure, type II diabetes). Then, HCA was used to identify subgroups of patients with distinct symptom profiles for each chronic condition. While condition-specific symptom profiles were identified (eg, gastrointestinal symptoms and fatigue for cancer, mental health symptoms for COPD), multiple symptom profiles were identified across 2 or more chronic conditions (eg, cognitive and neurological). Given the strength of their results and the ability of NLP software tools to accurately identify and obtain specific symptom data, these methods, with ongoing development, have the potential to be applied to symptom cluster research and to advance symptom science.
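The second step of such a pipeline, grouping patients by the symptoms extracted from their notes, can be sketched as a small agglomerative hierarchical clustering. The example below is a minimal illustration under stated assumptions: the patients and symptom sets are invented, and the choice of Jaccard distance with average linkage is ours, not necessarily the configuration used in the study described above.

```python
def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union| for two sets of symptoms."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        jaccard_distance(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[pid] for pid in sorted(profiles)]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], profiles)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical patients and extracted symptoms (illustration only).
PROFILES = {
    "p1": {"nausea", "vomiting", "fatigue"},
    "p2": {"nausea", "fatigue"},
    "p3": {"anxiety", "depressed mood"},
    "p4": {"anxiety", "depressed mood", "insomnia"},
}

if __name__ == "__main__":
    print(agglomerate(PROFILES, n_clusters=2))
```

Note that this procedure clusters patients rather than symptoms, so its output corresponds to subgroups of patients with distinct symptom profiles, not to symptom clusters in the variable-centered sense.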


In their report,4 the expert panel called for an examination of symptom clusters across various chronic conditions. These types of comparative studies are needed to determine whether or not “generic” symptom clusters occur across chronic conditions. To accomplish this goal, a comprehensive symptom assessment, as well as consistent methods, needs to be used. Equally important, with the emergence of NA and NLP, studies are needed that compare symptom clusters that are created “de novo” using various analytic approaches.

Based on the literature reviews for each analytic approach, notable gaps in symptom cluster research were identified. In general, the study samples were homogeneous in terms of race or ethnicity, gender identity, socioeconomic status, and educational attainment. Given that each of these characteristics can impact an individual's symptom experience, health outcomes, and quality of life, this lack of diversity and the evaluation of a limited number of social determinants of health limit our understanding of how these factors may influence the relationships with and among symptoms and symptom clusters. Future research that evaluates for symptom clusters in diverse and/or underserved samples, across a variety of acute and chronic conditions, is needed. Exemplars of studies that evaluated for differences in symptom clusters in relationship to age, gender, socioeconomic status, or ethnicity are provided in Supplemental Digital Content Tables 1 and 2.

While the definition of a symptom cluster has evolved over the past 20 years, multiple issues remain that warrant careful consideration to move this area of scientific inquiry forward (Table). Specifically, clear criteria need to be developed to determine the stability and consistency of symptom clusters. The establishment of these criteria will allow researchers to determine within studies whether symptom clusters change over time and/or across dimensions of the symptom experience. In addition, they can be used to evaluate stability and consistency of symptom clusters across studies of patients with similar and different chronic conditions. Additional research is needed to determine whether symptoms in a cluster must be independent or can cross-load on more than 1 cluster. Given that previous studies that used EFA and NA demonstrated that symptoms may load on multiple clusters, or that symptoms within clusters and the clusters themselves are related, this characteristic of a symptom cluster may need to be revised. One way to resolve this issue would be to evaluate common and distinct mechanisms that underlie various symptom clusters that include symptoms that cross-load on more than 1 cluster.


As symptom cluster research continues to evolve, the use of both variable-centered and patient-centered analytic approaches is needed to move the science forward. While each approach has unique strengths and weaknesses, conceptual clarity is needed when a study is designed, and the research question should inform the selection of the appropriate method. The conceptual approaches illustrated in Figure 1 can serve as a guide for future studies. Variable-centered approaches identify symptom clusters and are based on the hypothesis that symptoms cluster together because they may share a common underlying mechanism(s). The terminology “symptom clusters” should be used when symptom clusters are created with this approach (Figure 1A). Patient-centered analyses identify subgroups of patients with distinct symptom cluster profiles and associated risk factors. When researchers are “clustering” patients (Figure 1B), they should clearly specify that they used a prespecified symptom cluster and identified “subgroups of patients with distinct symptom cluster profiles.”


1. Dodd MJ, Miaskowski C, Paul SM. Symptom clusters and their effect on the functional status of patients with cancer. Oncol Nurs Forum. 2001;28(3):465–470.
2. Given CW, Given B, Azzouz F, Kozachik S, Stommel M. Predictors of pain and fatigue in the year following diagnosis among elderly cancer patients. J Pain Symptom Manage. 2001;21(6):456–466. doi:10.1016/s0885-3924(01)00284-6
3. Dodd M, Janson S, Facione N, et al. Advancing the science of symptom management. J Adv Nurs. 2001;33(5):668–676. doi:10.1046/j.1365-2648.2001.01697.x
4. Miaskowski C, Barsevick A, Berger A, et al. Advancing symptom science through symptom cluster research: expert panel proceedings and recommendations. J Natl Cancer Inst. 2017;109(4):djw253. doi:10.1093/jnci/djw253
5. Miaskowski C, Aouizerat BE, Dodd M, Cooper B. Conceptual issues in symptom clusters research and their implications for quality-of-life assessment in patients with cancer. J Natl Cancer Inst Monogr. 2007;(37):39–46. doi:10.1093/jncimonographs/lgm003
6. Hsu HT, Lin KC, Wu LM, et al. Symptom cluster trajectories during chemotherapy in breast cancer outpatients. J Pain Symptom Manage. 2017;53(6):1017–1025. doi:10.1016/j.jpainsymman.2016.12.354
7. Woods NF, Cray LA, Mitchell ES, Farrin F, Herting J. Polymorphisms in estrogen synthesis genes and symptom clusters during the menopausal transition and early postmenopause: observations from the Seattle Midlife Women's Health Study. Biol Res Nurs. 2018;20(2):153–160. doi:10.1177/1099800417753536
8. Cleeland CS, Bennett GJ, Dantzer R, et al. Are the symptoms of cancer and cancer treatment due to a shared biologic mechanism? A cytokine-immunologic model of cancer symptoms. Cancer. 2003;97(11):2919–2925. doi:10.1002/cncr.11382
9. Miaskowski C, Dodd M, Lee K. Symptom clusters: the new frontier in symptom management research. J Natl Cancer Inst Monogr. 2004;(32):17–21. doi:10.1093/jncimonographs/lgh023
10. Ji YB, Bo CL, Xue XJ, et al. Association of inflammatory cytokines with the symptom cluster of pain, fatigue, depression, and sleep disturbance in Chinese patients with cancer. J Pain Symptom Manage. 2017;54(6):843–852. doi:10.1016/j.jpainsymman.2017.05.003
11. Aktas A. Cancer symptom clusters: current concepts and controversies. Curr Opin Support Palliat Care. 2013;7(1):38–44. doi:10.1097/SPC.0b013e32835def5b
12. Barsevick A. Defining the symptom cluster: how far have we come? Semin Oncol Nurs. 2016;32(4):334–350. doi:10.1016/j.soncn.2016.08.001
13. Kim HJ, Abraham IL. Statistical approaches to modeling symptom clusters in cancer patients. Cancer Nurs. 2008;31(5):E1–E10. doi:10.1097/01.NCC.0000305757.58615.c8
14. Kim HJ, McGuire DB, Tulman L, Barsevick AM. Symptom clusters: concept analysis and clinical implications for cancer nursing. Cancer Nurs. 2005;28(4):270–282; quiz 283-284. doi:10.1097/00002820-200507000-0000
15. Skerman HM, Yates PM, Battistutta D. Multivariate methods to identify cancer-related symptom clusters. Res Nurs Health. 2009;32(3):345–360. doi:10.1002/nur.20323
16. Papachristou N, Barnaghi P, Cooper B, et al. Network analysis of the multidimensional symptom experience of oncology. Sci Rep. 2019;9(1):2258. doi:10.1038/s41598-018-36973-1
17. Koleck TA, Topaz M, Tatonetti NP, et al. Characterizing shared and distinct symptom clusters in common chronic conditions through natural language processing of nursing notes. Res Nurs Health. 2021;44:906–919. doi:10.1002/nur.22190
18. Harris CS, Kober KM, Conley YP, Dhruva AA, Hammer MJ, Miaskowski CA. Symptom clusters in patients receiving chemotherapy: a systematic review. BMJ Support Palliat Care. 2022;12(1):10–21. doi:10.1136/bmjspcare-2021-003325
19. Muthen B, Muthen LK. Integrating person-centered and variable-centered analyses: growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res. 2000;24(6):882–891.
20. Sullivan CW, Leutwyler H, Dunn LB, Miaskowski C. A review of the literature on symptom clusters in studies that included oncology patients receiving primary or adjuvant chemotherapy. J Clin Nurs. 2018;27(3/4):516–545. doi:10.1111/jocn.14057
21. Jenkins BA, Athilingam P, Jenkins RA. Symptom clusters in chronic obstructive pulmonary disease: a systematic review. Appl Nurs Res. 2019;45:23–29. doi:10.1016/j.apnr.2018.11.003
22. Zhu Z, Zhao R, Hu Y. Symptom clusters in people living with HIV: a systematic review. J Pain Symptom Manage. 2019;58(1):115–133. doi:10.1016/j.jpainsymman.2019.03.018
23. Fabrigar LR, Wegener DT. Exploratory factor analysis. In: Understanding Statistics. Oxford University Press; 2012:1–18.
24. DeVon HA, Vuckovic K, Ryan CJ, et al. Systematic review of symptom clusters in cardiovascular disease. Eur J Cardiovasc Nurs. 2017;16(1):6–17. doi:10.1177/1474515116642594
25. Woo SE, Jebb AT, Tay L, Parrigon S. Putting the “person” in the center: review and synthesis of person-centered approaches and methods in organizational science. Organ Res Methods. 2018;21(4):814–845. doi:10.1177/1094428117752467
26. Everitt BS, Landau S, Leese M, Stahl D. Cluster Analysis. 5th ed. John Wiley & Sons Ltd; 2011. Wiley Series in Probability and Statistics.
27. Brown TA. Confirmatory Factor Analysis for Applied Research. 2nd ed. Guilford Press; 2015. Methodology in the Social Sciences.
28. Russell J, Wong ML, Mackin L, et al. Stability of symptom clusters in patients with lung cancer receiving chemotherapy. J Pain Symptom Manage. 2019;57(5):909–922. doi:10.1016/j.jpainsymman.2019.02.002
29. Baggott C, Cooper BA, Marina N, Matthay KK, Miaskowski C. Symptom cluster analyses based on symptom occurrence and severity ratings among pediatric oncology patients during myelosuppressive chemotherapy. Cancer Nurs. 2012;35(1):19–28. doi:10.1097/NCC.0b013e31822909fd
30. Yang Z, Cui M, Zhang X, et al. Identification of symptom clusters and their influencing factors in subgroups of Chinese patients with acute exacerbation of chronic obstructive pulmonary disease. J Pain Symptom Manage. 2020;60(3):559–567. doi:10.1016/j.jpainsymman.2020.03.037
31. Miaskowski C, Conley YP, Mastick J, et al. Cytokine gene polymorphisms associated with symptom clusters in oncology patients undergoing radiation therapy. J Pain Symptom Manage. 2017;54(3):305–316. doi:10.1016/j.jpainsymman.2017.05.007
32. Doong SH, Dhruva A, Dunn LB, et al. Associations between cytokine genes and a symptom cluster of pain, fatigue, sleep disturbance, and depression in patients prior to breast cancer surgery. Biol Res Nurs. 2015;17(3):237–247. doi:10.1177/1099800414550394
33. Newman M. Networks: An Introduction. Oxford University Press; 2010.
34. Epskamp S, Borsboom D, Fried EI. Estimating psychological networks and their accuracy: a tutorial paper. Behav Res Methods. 2018;50(1):195–212. doi:10.3758/s13428-017-0862-1
35. Freeman L. Centrality in social networks conceptual clarification. Soc Netw. 1979;1:215–239.
36. Orman GK, Labatut V. A comparison of community detection algorithms on artificial networks. In: Gama J, Costa VS, Jorge AM, Brazdil PB, eds. Lecture Notes in Computer Science. Springer; 2009:242–256.
37. Yang Z, Algesheimer R, Tessone CJ. A comparative analysis of community detection algorithms on artificial networks. Sci Rep. 2016;6:30750. doi:10.1038/srep30750
38. Rha SY, Lee J. Stable symptom clusters and evolving symptom networks in relation to chemotherapy cycles. J Pain Symptom Manage. 2021;61(3):544–554. doi:10.1016/j.jpainsymman.2020.08.008
39. Henneghan A, Wright ML, Bourne G, Sales AC. A cross-sectional exploration of cytokine-symptom networks in breast cancer survivors using network analysis. Can J Nurs Res. 2020:1–13. doi:10.1177/0844562120927535
40. Puga JL, Krzywinski M, Altman N. Points of significance. Bayesian networks. Nat Methods. 2015;12(9):799–800. doi:10.1038/nmeth.3550
41. Su C, Andrew A, Karagas MR, Borsuk ME. Using Bayesian networks to discover relations between genes, environment, and disease. BioData Mining. 2013;6(6):1–21.
42. Gao T, Ji Q. Constrained local latent variable discovery. Paper presented at: 25th International Joint Conference on Artificial Intelligence; July 9-15, 2016; New York, NY.
43. Lazic N, Bishop C, Winn J. Structural expectation propagation (SEP): Bayesian structure learning for networks with latent variables. Proc Machine Learn Res. 2013;31:379–387. Accessed August 6, 2021.
44. Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR. A primer on learning in Bayesian networks for computational biology. PLoS Comput Biol. 2007;3(8):e129. doi:10.1371/journal.pcbi.0030129
45. Cooper GF, Bahar I, Becich MJ, et al. The center for causal discovery of biomedical knowledge from big data. J Am Med Inform Assoc. 2015;22(6):1132–1136. doi:10.1093/jamia/ocv059
46. Kyrimi E, McLachlan S, Dube K, Neves MR, Fahmi A, Fenton N. A comprehensive scoping review of Bayesian networks in healthcare: past, present and future. Artif Intell Med. 2021;117:102108. doi:10.1016/j.artmed.2021.102108
47. Xu S, Thompson W, Ancoli-Israel S, Liu LQ, Palmer B, Natarajan L. Cognition, quality-of-life, and symptom clusters in breast cancer: using Bayesian networks to elucidate complex relationships. Psychooncology. 2018;27(3):802–809. doi:10.1002/pon.4571
48. Yim WW, Yetisgen M, Harris WP, Kwan SW. Natural language processing in oncology: a review. JAMA Oncol. 2016;2(6):797–804. doi:10.1001/jamaoncol.2016.0213
49. Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc. 2019;26(4):364–379. doi:10.1093/jamia/ocy173
50. Koleck TA, Tatonetti NP, Bakken S, et al. Identifying symptom information in clinical notes using natural language processing. Nurs Res. 2021;70(3):173–183. doi:10.1097/NNR.0000000000000488

cluster analysis; factor analysis; latent class analysis; latent variable modeling; natural language processing; network analysis; symptom clusters; symptom science

© 2022 Wolters Kluwer Health, Inc. All rights reserved.