Originally pioneered in cancer care and other chronic diseases, gene expression analysis is increasingly used in the investigation of illnesses such as sepsis and trauma (1). Transposing genome-wide techniques to the study of acute illnesses whose basis is physiologic rather than anatomic requires consideration of the appropriate tissue from which to isolate nucleic acids. For cancer, this choice is driven by the specific tissue type that gives rise to the tumor. Diseases such as sepsis impact a number of tissue types simultaneously. In these cases, the selection of a tissue type from which to generate gene expression profiles is uncertain. Investigators are therefore left with a decision that may be informed by cost, convenience, and current practices, but for which an evidentiary basis is limited.
Gene expression studies in sepsis have typically focused on two separate but related tasks. First, studies have aimed to identify specific gene expression states that correspond to the clinical definitions of severe sepsis and septic shock (2, 3). This approach is best illustrated in a recent study by Sweeney et al. in which an 11-gene sepsis gene set was identified by analyzing a number of publicly available gene expression datasets (4). The expression values of this 11-gene signature were shown to differentiate sepsis from sterile inflammation with a high degree of accuracy.
A second type of sepsis gene expression study addresses the lack of specificity inherent in the clinical definition of sepsis, which results in considerable heterogeneity among those assigned this diagnosis. Using an unsupervised, data-driven approach, these studies seek to determine if distinct subtypes of sepsis can be identified based on gene expression characteristics (5, 6).
For both of the above study aims, investigators have largely focused on blood and its cellular components to generate gene expression data (7). This choice reflects not only the ready availability of blood for easy and repeated sampling, but also the key role of blood leukocytes in modulating and effecting the immune and inflammatory responses to infection. Previous work has demonstrated cell-specific expression profiles that differ between leukocyte types, and that reflect their specialized roles in innate and adaptive immunity (8). Nonetheless, the question of which blood cell compartment is most useful in gene expression studies of sepsis has not been explored in depth.
In this study, we used bioinformatics methods to systematically examine gene expression studies of sepsis. Our main objective was to determine the relative utility of different blood cell compartments in studies investigating gene expression correlates of clinical sepsis definitions, as well as those aimed at identifying specific expression-defined sepsis subtypes. We employed a method designed to be agnostic to microarray platform so that samples from a variety of different studies could be included. We evaluated the potential of different blood-derived source tissues to differentiate between patients diagnosed with sepsis, and non-sepsis controls. We also assessed the utility of gene expression data from these tissue sources in unsupervised learning tasks, specifically the capacity to generate distinct patient clusters determined by expression patterns alone.
We searched the NCBI's Gene Expression Omnibus (GEO) using the search terms “sepsis” and “septic shock,” limiting results to human studies, and those with a dataset type of “expression profiling by array.” In order to select studies for further analysis, we reviewed the information contained in the GEO records of each study, as well as the abstracts and full text of any corresponding publications where available. We excluded studies focused on viral diseases such as HIV, studies on malaria, in vitro studies such as endotoxin stimulation assays, and studies enrolling patients with a primary diagnosis of trauma or burns. Studies that had no control patients were also excluded. For each of the included studies, we limited our analysis to samples that were taken within 24 h of the diagnosis of sepsis. This restriction reflects the dynamic nature of gene expression in sepsis syndromes, which has been shown to undergo marked fluctuations in the first few days of illness (9, 10).
For each dataset, we downloaded the normalized expression data, as well as metadata containing whatever clinical information was available on the patients from whom the samples were drawn. We extracted data describing each individual experiment, including the population studied, the tissue type from which the RNA was isolated, and the type of microarray platform used. Each experiment was processed and analyzed individually using its own controls.
We used sample variance as a method of feature selection in order to restrict the number of genes used in the analysis to those most likely to identify expression-based differences between patients. For each individual gene expression dataset, we calculated the variance of each probe set across samples, and selected the probe sets with the highest variances (top percentile). We used a partitioning around medoids (PAM) algorithm to generate two clusters of samples, and compared the cluster labels (“A” or “B”) to the clinically assigned labels (“sepsis” or “control”). We evaluated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the cluster assignments relative to the clinical diagnosis of sepsis, as well as the proportion of correctly classified samples (accuracy). Cluster-derived class labels were assigned so as to maximize specificity for the diagnosis of sepsis.
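The feature selection and scoring steps described above can be illustrated with a minimal Python sketch. It is not the code used in this study (the analyses were done in R); the function names, the `fraction` parameter, and the toy data are hypothetical, and the cluster labels are assumed to come from a prior PAM run.

```python
from statistics import variance


def top_variance_features(expr, fraction=0.01):
    """Return the probe-set IDs with the highest across-sample variance.

    expr maps a probe-set ID to its expression values, one per sample.
    fraction=0.01 keeps the top percentile; at least one probe set is kept.
    """
    ranked = sorted(expr, key=lambda probe: variance(expr[probe]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def diagnostic_metrics(cluster_labels, clinical_labels, sepsis_cluster):
    """Score the clustering, treating `sepsis_cluster` as 'predicted sepsis'."""
    tp = fp = tn = fn = 0
    for cluster, truth in zip(cluster_labels, clinical_labels):
        predicted = cluster == sepsis_cluster
        actual = truth == "sepsis"
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sepsis_cluster": sepsis_cluster,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def best_orientation(cluster_labels, clinical_labels):
    """Try each cluster as the sepsis cluster; keep the most specific mapping.

    Assumes at least one control sample, so specificity is always defined.
    """
    candidates = [
        diagnostic_metrics(cluster_labels, clinical_labels, c)
        for c in sorted(set(cluster_labels))
    ]
    return max(candidates, key=lambda m: m["specificity"])


# Toy data (hypothetical, for illustration only).
expr = {"p1": [1, 1, 1, 1], "p2": [0, 10, 0, 10], "p3": [4, 5, 4, 5]}
selected = top_variance_features(expr, fraction=0.34)

clusters = ["A", "A", "A", "B", "B", "B", "B", "A"]
clinical = ["sepsis", "sepsis", "sepsis", "control",
            "control", "control", "sepsis", "sepsis"]
best = best_orientation(clusters, clinical)
```

In this toy example, labeling cluster “A” as sepsis gives no false positives, so that orientation is retained; the reverse labeling would misclassify every control.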
For the unsupervised learning task, we again used a PAM algorithm to partition the samples from each individual study into clusters. The relative quality of clustering was determined by the average silhouette width, a measure that reflects both cluster cohesiveness and cluster separation (11). The range of possible values for the average silhouette width is between −1 and 1, with higher values suggesting better clustering. For each gene expression study, we calculated the average silhouette width for the samples divided into two, three, four, and five clusters, in order to evaluate whether more than two distinct subtypes could be supported by clustering of gene expression data.
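As an illustration of the clustering quality measure, the following Python sketch computes the average silhouette width for two to five clusters on toy data. It is a simplified stand-in rather than the study's R code: the brute-force medoid search below replaces the PAM build and swap heuristics and is only practical for very small inputs, Euclidean distance is assumed, and all names and data points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]


def k_medoids_exhaustive(points, k):
    """Pick the k medoids minimizing total point-to-medoid distance.

    Exhaustive search over all medoid sets: exact, but feasible only
    for tiny toy datasets (assumption made for this sketch).
    """
    def cost(medoids):
        return sum(min(dist(p, points[m]) for m in medoids) for p in points)

    best = min(combinations(range(len(points)), k), key=cost)
    return assign(points, best)


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data: two well-separated groups of three samples each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_k2 = k_medoids_exhaustive(points, 2)
widths = {
    k: average_silhouette_width(points, k_medoids_exhaustive(points, k))
    for k in range(2, 6)
}
```

For these toy points the two-cluster solution gives the highest average silhouette width, which is the pattern one would expect when the data support two groups rather than three or more.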
We used principal components analysis (PCA) to visualize the gene expression data of select studies. We used Mann–Whitney and Kruskal–Wallis tests to compare the test performance characteristics and average silhouette widths between the tissue types examined. All analyses were done in R (version 3.1.1). Datasets were downloaded from NCBI GEO using the Bioconductor package GEOquery (12).
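The two-group comparison can be sketched in Python as follows. This is an illustrative implementation of the Mann–Whitney U statistic with a two-sided P value from the normal approximation (tie-corrected, no continuity correction); statistical packages may instead use an exact test for small samples, so P values can differ. The function names are hypothetical and the silhouette widths below are made-up values, not data from this study.

```python
from math import erf, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using the normal approximation.

    Assumes the pooled values are not all identical (otherwise the
    standard deviation of U is zero and the test is undefined).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sd_u = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return u1, p


# Made-up average silhouette widths (illustration only).
whole_blood = [0.41, 0.35, 0.44, 0.38, 0.30]
isolates = [0.29, 0.22, 0.31, 0.19]
u, p = mann_whitney_u(whole_blood, isolates)
```

Here U counts the number of (whole blood, isolate) pairs in which the whole blood value is larger; with five and four samples the maximum possible value is 20.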
We identified 19 GEO series meeting our inclusion criteria, one of which included data from three different blood cell types, and one of which was divided across two different microarray platforms. A total of 22 experiments were therefore included in the analysis (Fig. 1 and Table, Supplemental Digital Content 1, at https://links.lww.com/SHK/A339), encompassing a total of 1,765 unique gene expression records (5, 13–27). For most adult studies, sepsis was defined according to either the 1992 American College of Chest Physicians/Society of Critical Care Medicine definition or the 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions (28, 29). Most pediatric studies defined sepsis according to pediatric-specific consensus criteria (30). The Affymetrix Human Genome U133 Plus 2.0 array was used in 13 of the studies. Eleven studies were done in pediatric populations. Healthy controls were used in 14 studies, with non-septic ICU patients used as controls in five studies, and controls not further specified in the remaining studies. Whole blood was the most commonly used source tissue (15 studies), followed by peripheral blood mononuclear cells (PBMC, two studies), neutrophils (two studies), monocytes (two studies), and lymphocytes (one study). In subsequent analyses, the monocyte, lymphocyte, PBMC, and neutrophil groups were merged into a single group (hereafter referred to as the “leukocyte isolates” group). Variance-based feature selection led to the use of between 100 and 550 high variance probe sets in the analysis of each data set.
The performance measures for gene expression data derived from whole blood and leukocyte isolates are shown in Figure 2. Both whole blood and leukocyte isolate gene expression data tended to show low sensitivity and negative predictive value for the diagnosis of sepsis, with better performance in specificity and positive predictive value. The two studies that used PBMC-derived gene expression data showed poor overall accuracy of classification for the diagnosis of sepsis. For lymphocyte-derived gene expression data, performance across the five studies was variable, with two studies yielding perfect classification (GSE46955 and GSE11755), and two studies showing poor performance (GSE9960 and GSE48080). Specificity was significantly higher with samples derived from whole blood compared with leukocyte isolates (mean specificity 94% vs 78%, P = 0.03).
Specificity for the diagnosis of sepsis was higher in studies using healthy controls than in those using non-sepsis ICU patients as controls (94% vs 70%, P = 0.002). Though small sample sizes precluded a subgroup analysis, limiting the studies to those that used non-septic ICU patients as controls suggests that both tissue types yielded similar specificities for the diagnosis of sepsis (whole blood 72%, leukocyte isolates 69%).
In the unsupervised learning task, we used the average silhouette width as a measure of cluster cohesiveness in order to quantify how well gene expression data from the various cell types separated samples into distinct groups. Examples of highly clustered (i.e., higher average silhouette width) and poorly clustered (i.e., lower average silhouette width) datasets are shown in Figure 3. There was a significant difference in cluster cohesiveness between whole blood and leukocyte isolates, with whole blood-derived data yielding higher average silhouette widths (median values 0.28 and 0.19 respectively, P = 0.002). In the specific task of forming two clusters, the difference in average silhouette width was again significant, with whole blood resulting in a higher value than leukocyte isolates (median values 0.41 vs 0.29, P < 0.05, Fig. 4). Studies using non-septic ICU patients as controls tended to form less cohesive clusters overall (median values 0.18 vs 0.30, P = 0.0001). Among the studies using non-septic ICU patients as controls, whole blood-derived samples showed better cluster cohesion than samples derived from leukocyte isolates (median values 0.24 vs 0.14), although smaller sample sizes limited the statistical power of this subgroup analysis.
Owing to the large proportion of studies included in our analysis that focused on pediatric sepsis, we further investigated the influence of age group on both the supervised (sepsis classification) and unsupervised tasks. We found that with data from all cell types combined, specificity for the diagnosis of sepsis was higher in studies of pediatric cohorts than adult cohorts (96% vs 82%, P < 0.05). Average silhouette widths were also higher among pediatric studies (median values 0.33 vs 0.22, P < 0.0001).
An analysis of diagnostic performance and clustering strength limited to the 11 adult studies alone revealed similar findings to the analysis of all 22 studies combined; however, the difference in specificity between whole blood and leukocyte isolate-derived expression data did not reach statistical significance (89% vs 75%, P = 0.34). Average silhouette widths were not significantly different between the two tissue sources (median values 0.21 for whole blood vs 0.16 for isolates, P = 0.13). The lack of statistical significance in these analyses may in part be due to a loss of statistical power in this smaller subset of studies.
We used a structured bioinformatic approach to examine the use of different blood cell compartments in the study of gene expression in sepsis. Gene expression profiling of patients with sepsis has to date used a number of different blood cell types as sources of RNA, but few studies have objectively compared these various approaches. Our study indicates that whole blood is at least as accurate as isolated circulating leukocytes in demonstrating the transcriptional changes that are associated with sepsis. Using whole blood may be a more practical approach to the study of functional genomics in sepsis, as smaller volumes of blood are needed to generate adequate quantities of RNA (31), and samples can be collected directly at the bedside without additional cell separation and purification steps in the laboratory. However, theoretical concerns include the possibility that expression patterns will be heavily influenced by the relative abundance of different leukocyte subtypes within the peripheral circulation at the time of sampling. Statistical methods such as immune cell deconvolution have been used to quantify the contribution of each leukocyte subtype to whole blood gene expression signals in an attempt to account for and mitigate this effect (15), but these methods add complexity to bioinformatics workflows.
One study (32) addressed the validity of using whole blood samples directly by comparing the relative expression of different cell type-specific pathways important in the pathophysiology of sepsis. Using gene expression data from whole blood, as well as from lymphocytes, monocytes, and neutrophils in isolation, the authors found that data derived from whole blood showed expression of signature pathways indicative of contributions from all three of these leukocyte subtypes. They also found that the pathways differentially expressed between sepsis patients and healthy controls in the leukocyte subtypes were similarly altered in whole blood samples.
Our study arrived at similar conclusions using a complementary approach based on a systematic analysis of multiple independent gene expression studies. Unlike in previous work, we used expression data directly, without reference to the genes and pathways they represent. This strategy was used to obviate the need for complex methods to merge data from different microarray platforms and study protocols, as well as to explore the utility of signals derived directly from patient samples, with minimal pre-processing and interpretation.
Our results suggest that whole blood-derived gene expression data are able to distinguish patients with sepsis from controls with a high degree of specificity. Whole blood-derived data also formed cohesive clusters, a characteristic of these data that suggests their usefulness in developing genomic classifiers of sepsis, and identifying genomic subtypes of sepsis. By contrast, leukocyte isolate-derived gene expression data tended to be more diffuse, leading to poor cluster differentiation, and less accurate classification of patients.
Leukocyte isolate-derived data showed somewhat mixed results, possibly reflecting the heterogeneity in cell types used in this group. For example, the highest performing datasets were derived from purified monocytes, while poorly performing data were derived from PBMCs that were not further characterized. This pattern is demonstrated by one dataset (GSE11755) that included gene expression data from different cell types (lymphocytes, monocytes, and whole blood) derived from the same group of pediatric patients with meningococcal sepsis, and non-septic controls (Fig. 5). Data derived from lymphocytes and monocytes formed less cohesive clusters than data derived from whole blood.
While the admixture of various leukocyte types in whole blood has been described as a potential confounder in gene expression studies, our results suggest it may in fact provide an additional layer of information that is useful in identifying and classifying patients with sepsis. For example, measuring gene expression signals from neutrophils in isolation may obscure important differences between patients with higher ratios of neutrophils to PBMCs, and those in whom neutrophil counts are relatively diminished (33). Expression data from purified cell types may therefore be more homogeneous and condensed, with less distance between patients than for data derived from whole blood. One possible explanation for this finding might be changes in gene expression patterns induced by the process of cell subset isolation itself (34). These results support the value of aggregate rather than decomposed biological signals, and suggest the presence of irreducible phenomena that may be obscured when analyzing certain cell types in isolation.
Our study has limitations. First, we included studies from a variety of patient populations, including both adult and pediatric patients, as well as survivors and non-survivors presenting with sepsis, severe sepsis, and septic shock. Gene expression among patients with septic shock is known to differ by age, comorbidity, ethnicity, and severity of illness, all of which varied across the studies included in this analysis. Our use of the controls from each study as comparators was intended to address the breadth of sepsis syndromes encountered, reasoning that differences in gene expression would be greater between patients with sepsis and non-sepsis controls than among different strata of sepsis patients. We also analyzed studies done among adult populations separately, which revealed a similar pattern of performance characteristics for the diagnosis of sepsis as with all the studies combined, albeit with differences in specificity that were not statistically significant.
Second, our study used gene expression data from a number of different microarray platforms, some of which may be more current, more accurate, and more consistent than others. We addressed this potential confounder by focusing on the expression values themselves rather than gene and pathway annotations, and by analyzing studies using their own internal controls. Additional analyses showed no differences in test performance characteristics between microarray platforms (data not shown). There was, however, a significant difference in cluster cohesiveness between the various microarray platforms used in the included experiments, with Affymetrix experiments yielding the highest average silhouette widths (P < 0.0001, Kruskal–Wallis rank sum test).
Third, while we were able to identify a number of gene expression studies for inclusion, sample sizes were in some cases small (range 9–130 samples, median number of samples = 60). The different leukocyte subtypes examined do share in common a preceding isolation step and a homogeneous cellular phenotype. The merging of different leukocyte types done in order to improve sample sizes may, however, have resulted in an averaging of gene expression signals between molecularly distinct cell types (e.g., lymphocytes and neutrophils). Further investigations should focus on determining whether important differences exist among different types of leukocytes in terms of diagnostic test performance and clustering capability.
Last, the publicly available gene expression data used in our study were annotated by variable and most often minimal amounts of clinical data describing the patients included in the studies. While we were able to demonstrate distinct clusters of patients distinguished by unique gene expression characteristics, we were unable to fully characterize these subgroups on clinical grounds. Our results support the need for future studies to better characterize the clinical phenotypes of genomically distinct sepsis subgroups. Our results also do not preclude the possibility that other approaches – for example expression of cell surface markers or soluble factors – may facilitate the clustering, description, and stratification of patients with sepsis.
In summary, by pooling data from multiple studies of gene expression in sepsis, we found that whole blood provides reliable information, obviating the need for cell isolation and the artifacts that might result. Whole blood gene expression studies may provide reliable information to guide the diagnosis and therapeutic stratification of patients with the complex clinical syndrome of sepsis.
1. Maslove DM, Wong HR. Gene expression profiling in sepsis: timing, tissue, and translational considerations. Trends Mol Med 2014; 20(4):204–213.
2. Prucha M, Ruryk A, Boriss H, Möller E, Zazula R, Herold I, Claus RA, Reinhart KA, Deigner P, Russwurm S. Expression profiling: toward an application in sepsis diagnostics. Shock 2004; 22(1):29–33.
3. Lissauer ME, Johnson SB, Bochicchio GV, Feild CJ, Cross AS, Hasday JD, Whiteford CC, Nussbaumer WA, Towns M, Scalea TM. Differential expression of toll-like receptor genes: sepsis compared with sterile inflammation 1 day before sepsis diagnosis. Shock 2009; 31(3):238–244.
4. Sweeney TE, Shidham A, Wong HR, Khatri P. A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set. Sci Transl Med 2015; 7(287):287ra71.
5. Wong HR, Cvijanovich N, Lin R, Allen GL, Thomas NJ, Willson DF, Freishtat RJ, Anas N, Meyer K, Checchia PA, et al. Identification of pediatric septic shock subclasses based on genome-wide expression profiling. BMC Med
6. Maslove DM, Tang BM, McLean AS. Identification of sepsis subtypes in critically ill adults using gene expression profiling. Crit Care 2012; 16(5):R183.
7. Tang BM, Huang SJ, McLean AS. Genome-wide transcription profiling of human sepsis: a systematic review. Crit Care 2010; 14(6):R237.
8. Palmer C, Diehn M, Alizadeh AA, Brown PO. Cell-type specific gene expression profiles of leukocytes in human peripheral blood. BMC Genomics 2006; 7(1):1–15.
9. Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ, Chen RO, Brownstein BH, Cobb JP, Tschoeke SK, et al. A network-based analysis of systemic inflammation in humans. Nature 2005; 437(7061):1032–1037.
10. Talwar S, Munson PJ, Barb J, Fiuza C, Cintron AP, Logun C, Tropea M, Khan S, Reda D, Shelhamer JH, et al. Gene expression profiles of peripheral blood leukocytes after endotoxin challenge in humans. Physiol Genomics 2006; 25(2):203–215.
11. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Comput Appl Math
12. Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007; 23(14):1846–1847.
13. Tang BMP, McLean AS, Dawes IW, Huang SJ, Cowley MJ, Lin RCY. Gene-expression profiling of Gram-positive and Gram-negative sepsis in critically ill patients. Crit Care Med 2008; 36(4):1125–1128.
14. Tang BMP, McLean AS, Dawes IW, Huang SJ, Lin RCY. Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Crit Care Med 2009; 37(3):882–888.
15. Parnell GP, Tang BM, Nalos M, Armstrong NJ, Huang SJ, Booth DR, McLean AS. Identifying key regulatory genes in the whole blood of septic patients to monitor underlying immune dysfunctions. Shock 2013; 40(3):166–174.
16. Wynn JL, Cvijanovich NZ, Allen GL, Thomas NJ, Freishtat RJ, Anas N, Meyer K, Checchia PA, Lin R, Shanley TP, et al. The influence of developmental age on the early transcriptomic response of children with septic shock. Mol Med 2011; 17(11–12):1146–1156.
17. Tang BMP, McLean AS, Dawes IW, Huang SJ, Lin RCY. The use of gene-expression profiling to identify candidate genes in human sepsis. Am J Respir Crit Care Med 2007; 176(7):676–684.
18. Severino P, Silva E, Baggio-Zappia GL, Brunialti MKC, Nucci LA, Rigato O, da Silva IDCG, Machado FR, Salomao R. Patterns of gene expression in peripheral blood mononuclear cells and outcomes from patients with sepsis secondary to community acquired pneumonia. PLoS One 2014; 9(3):e91886–e191886.
19. Sutherland A, Thomas M, Brandon RA, Brandon RB, Lipman J, Tang B, McLean A, Pascoe R, Price G, Nguyen T, et al. Development and validation of a novel molecular biomarker diagnostic test for the early detection of sepsis. Crit Care 2011; 15(3):R149.
20. Wong HR, Cvijanovich N, Allen GL, Lin R, Anas N, Meyer K, Freishtat RJ, Monaco M, Odoms K, Sakthivel B, et al. Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum. Crit Care Med 2009; 37(5):1558–1566.
21. Cvijanovich N, Shanley TP, Lin R, Allen GL, Thomas NJ, Checchia P, Anas N, Freishtat RJ, Monaco M, Odoms K, et al. Validating the genomic signature of pediatric septic shock. Physiol Genomics 2008; 34(1):127–134.
22. Shanley TP, Cvijanovich N, Lin R, Allen GL, Thomas NJ, Doctor A, Kalyanaraman M, Tofil NM, Penfil S, Monaco M, et al. Genome-level longitudinal expression of signaling pathways and gene networks in pediatric septic shock. Mol Med 2007; 13(9–10):495–508.
23. Wong HR, Shanley TP, Sakthivel B, Cvijanovich N, Lin R, Allen GL, Thomas NJ, Doctor A, Kalyanaraman M, Tofil NM, et al. Genome-level expression profiles in pediatric septic shock indicate a role for altered zinc homeostasis in poor outcome. Physiol Genomics 2007; 30(2):146–155.
24. Smith CL, Dickinson P, Forster T, Craigon M, Ross A, Khondoker MR, France R, Ivens A, Lynn DJ, Orme J, et al. Identification of a human neonatal immune-metabolic network associated with bacterial infection. Nat Commun
25. Ahn SH, Tsalik EL, Cyr DD, Zhang Y, van Velkinburgh JC, Langley RJ, Glickman SW, Cairns CB, Zaas AK, Rivers EP, et al. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS One 2013; 8(1):e48979.
26. Pankla R, Buddhisa S, Berry M, Blankenship DM, Bancroft GJ, Banchereau J, Lertmemongkolchai G, Chaussabel D. Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol 2009; 10(11):R127.
27. Dolinay T, Kim YS, Howrylak J, Hunninghake GM, An CH, Fredenburgh L, Massaro AF, Rogers A, Gazourian L, Nakahira K, et al. Inflammasome-regulated cytokines are critical mediators of acute lung injury. Am J Respir Crit Care Med 2012; 185(11):1225–1234.
28. Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, Schein RM, Sibbald WJ. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest 1992; 101(6):1644–1655.
29. Levy MM, Fink MP, Marshall JC, Abraham E, Angus D, Cook D, Cohen J, Opal SM, Vincent J-L, Ramsay G. 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Crit Care Med 2003; 31(4):1250–1256.
30. Goldstein B, Giroir B, Randolph A. International pediatric sepsis consensus conference: definitions for sepsis and organ dysfunction in pediatrics. Pediatr Crit Care Med 2005; 6(1):2–8.
31. Smith CL, Dickinson P, Forster T, Khondoker M, Craigon M, Ross A, Storm P, Burgess S, Lacaze P, Stenson BJ, et al. Quantitative assessment of human whole blood RNA as a potential biomarker for infectious disease. Analyst 2007; 132(12):1200–1209.
32. Wong HR, Freishtat RJ, Monaco M, Odoms K, Shanley TP. Leukocyte subset-derived genomewide expression profiles in pediatric septic shock. Pediatr Crit Care Med 2010; 11(3):349–355.
33. Salciccioli JD, Marshall DC, Pimentel MA, Santos MD, Pollard T, Celi LA, Shalhoub J. The association between the neutrophil-to-lymphocyte ratio and mortality in critical illness: an observational cohort study. Crit Care 2015; 19(1):R154.
34. Beliakova-Bethell N, Massanella M, White C, Lada SM, Du P, Vaida F, Blanco J, Spina CA, Woelk CH. The effect of cell subset isolation method on gene expression in leukocytes. Cytometry 2013; 85(1):94–104.