

Stress Reactivity to the Trier Social Stress Test in Traditional and Virtual Environments: A Meta-Analytic Comparison

Helminen, Emily C. BS; Morton, Melissa L. MS; Wang, Qiu PhD; Felver, Joshua C. PhD

doi: 10.1097/PSY.0000000000000918



Maladaptive responses to stress underlie a host of negative physical and psychological health outcomes (1,2). Understanding the biological mechanisms of the stress response and individual variations in stress responding is of great import to researchers seeking to study the relation between stress and health outcomes. Using physiological measures to study stress responding in laboratory settings is particularly valuable given the objectivity and replicability of such measurement (2,3). One of the central physiological stress responses in humans, the hypothalamic-pituitary-adrenal (HPA) axis, involves a chain of biological reactions that culminates in the systemic release of the glucocorticoid cortisol. A robust literature base has demonstrated that cortisol is a reliable indicator of HPA axis activation (4,5). Furthermore, cortisol peaks approximately 20 to 30 minutes after the onset of a stressor, which is ideal for studying stress in a laboratory setting as it provides adequate time to sample the physiological response to the stressor (i.e., stress reactivity), along with the subsequent recovery to prestressor baseline (5).

Another important system that responds to stress in the body is the cardiovascular system. The cardiovascular system responds more quickly to stressors than the HPA axis and can be measured noninvasively and continuously during stressful situations (e.g., by wearing a heart rate [HR] monitor), rather than only before and after stress induction. HR has been widely studied as an indicator of sympathetic activation (4). Furthermore, HR typically peaks during a stressor rather than after (6). Together, cortisol and HR allow for investigation of the stress response both while a stressor is present and after it has ended.

Traditional Trier Social Stress Test: Protocol and Effectiveness

Kirschbaum et al. (7) created a standardized stress induction protocol known as the Trier Social Stress Test (TSST) to induce stress in a laboratory setting. The original three-part TSST protocol includes a preparation period, an oral speaking task, and a mathematics task. During the preparation period, the participant is asked to think about why they would be the perfect candidate for their ideal job. During the speech task, they must speak for 5 minutes as to why they would be a good candidate for this job, and during the mathematics task, the participant is instructed to count backward from 1022 in decrements of 13. The speech and mathematics portions of the experiment are completed in front of two trained research assistants in white laboratory coats. The research assistants maintain neutral affect throughout the duration of the experiment (7,8). In summarizing the existing TSST literature, Dickerson and Kemeny (5) demonstrated that the socially evaluative threat and unpredictability of the TSST induce a reliable physiological stress response, measured as an increase in cortisol from pre-TSST (baseline) to post-TSST.

A more recent meta-analysis of studies (n = 186) using original or slightly varied versions of the traditional TSST protocol (7) concluded that the traditional TSST is highly effective at inducing stress, as indicated by increases in cortisol from pre- to post-TSST (mean d = 0.93; (9)). Because the TSST reliably induces a large physiological stress response, it has become the most widely used laboratory-based task for inducing stress in humans (10), and it is ideal for studying stress under standardized conditions. Recent technological advancements have integrated the traditional TSST into virtual reality environments.

Virtual TSST: Protocols and Effectiveness

In the past decade, scientists have implemented the traditional TSST protocol into a variety of virtual environments (V-TSST) to further increase the standardization and replicability and to reduce the resource intensiveness of the original protocol. To this end, the V-TSST has been developed using technology such as immersive head-mounted displays (HMDs; (11)), screen-based displays (12), and three-dimensional (3D) projections (13).

To date, only three of the published V-TSST studies (11,14,15) have included a traditional TSST comparison group in the same experiment, and one additional V-TSST study (12) included archival data for traditional TSST comparison groups. Kelly and colleagues (14) compared the stress-inducing effects of V-TSST (an HMD version) with a traditional TSST. Results indicated that cortisol levels significantly increased in both the V-TSST and traditional TSST groups; however, the magnitude of the cortisol increase in the traditional TSST was statistically significantly greater than in the V-TSST, leading these researchers to conclude that the traditional TSST was a more stressful task. Shiban and colleagues (11) reported similar cortisol results, whereby the traditional TSST elicited a statistically significantly greater cortisol response than two V-TSST conditions. However, participants in the V-TSST conditions reported greater self-reported stress (SRS) than participants in the traditional TSST condition. Hawn and colleagues (12) used archival data from a traditional TSST group to compare with a screen-based V-TSST condition. They found that the traditional TSST was statistically significantly more effective in inducing cortisol and HR responses, but there were no statistically significant differences in SRS or blood pressure responses between the different TSST types. In addition, in this study, the participants from the archival data were significantly younger than those in the V-TSST experiment, and age is known to influence stress reactivity to the TSST (4,6).

Most recently, Zimmer et al. (15) demonstrated comparable stress responses between the V-TSST and the traditional TSST in a randomized controlled trial. There were slight differences in responding between groups, in which the V-TSST group cortisol levels rose and declined slightly earlier than the traditional TSST group (i.e., V-TSST cortisol peaked at 20 minutes after onset, and traditional TSST cortisol levels peaked at 30 minutes after onset of the stressor), but overall, the stress response profiles were comparable. Zimmer et al. (15) also used an HMD V-TSST, but the comparable stress responses between groups contradict the findings from Kelly et al. (14). Zimmer et al. (15) address the contradiction between study findings by highlighting the advances in virtual environments in the last decade, noting that the earlier study used an HMD that sounded clunky and uncomfortable; Kelly et al. (14) described their HMD as a helmet with a small viewing screen. In addition, Zimmer et al. (15) increased the methodological rigor of their study by including a V-TSST, a virtual control (no stress) condition, a traditional TSST, and a traditional control (no stress) condition. Only the V-TSST and the traditional TSST groups demonstrated increases in stress reactivity. Notably, in this study, all participants were males, and male participants have consistently demonstrated higher stress reactivity to the TSST (16). Given these ambiguous findings across a variety of variables, more research is necessary to determine the comparability of the V-TSST to the traditional TSST.

In a recent meta-analysis, Helminen et al. (17) analyzed the effectiveness of V-TSST protocols in inducing stress via cortisol reactivity in 13 published studies that used various versions of the V-TSST. They found that the V-TSST is effective at inducing a statistically significant physiological stress response via measurement of cortisol (standardized mean gain effect size of ESsg = 0.65), and that there were significant moderating influences of age and sex on the magnitude of the stress reactivity response, consistent with prior research (6,16). Helminen et al. (17) demonstrated compelling evidence for the effectiveness of V-TSST in inducing a stress response; however, the magnitude of the change in cortisol reactivity was not directly contrasted with that of the traditional TSST.

Given the robust stress reactivity literature base amassed using the traditional TSST, it is appropriate to question the utility of novel versions of the TSST given the effectiveness of the existing protocol (4,9). Various versions of the V-TSST were developed not because they induce a stronger physiological stress response, but rather because of several distinct advantages that V-TSST versions possess over the traditional TSST protocol. V-TSST protocols minimize the need for resources that are required in the traditional TSST. The traditional TSST requires at least three research assistants and a minimum of two rooms, along with a variety of accessory materials (7,8). The V-TSST can eliminate the need for most of these resources because most of these elements can be recreated in a virtual environment.

The V-TSST also offers advantages from a methodological standpoint. Throughout the course of any given traditional TSST experiment, confederate age, sex, and race may vary based on the availability of research assistants, which could vary between experiments (e.g., one laboratory using demographically different research assistants than another) and within experiments (e.g., demographically different research assistants running participants on different days of the week). This introduces confounding variability in data collection methods that ultimately may reduce data quality, as, for example, Goodman et al. (9) have demonstrated that the stressfulness of the test varies with differences in confederate characteristics, particularly when changing the sex composition of the confederate panel.

Study Aims

The V-TSST shows promise as an alternative to the traditional TSST in that it has the potential to reduce the resources necessary to run stress test experiments, while simultaneously being able to increase standardization and replicability. Currently, based on studies that have compared the two types of the TSST, it is unclear whether the V-TSST is stressful enough to compete with or replace the traditional TSST. In this review, we aim to compare the two types of the TSST by a) sex and age matching current V-TSST studies to traditional TSST studies; b) quantitatively comparing the two types of the TSST with a meta-analysis of cortisol, HR, and SRS response effects; and c) providing recommendations for the future of V-TSST research based on the findings.


The databases PsycINFO and PubMed were searched for peer-reviewed, English-language journal articles with the keywords “virtual” OR “virtual reality” AND “TSST” OR “Trier Social Stress Test” until November 2019. References of collected articles were also scanned, and additional articles were included as necessary.

Inclusion and Exclusion Criteria: V-TSST Studies

Inclusion criteria included the following: a) adult participants (age ≥18 years), b) use of a virtual version of the TSST, and c) collection and reporting of cortisol levels before and after the stress test (i.e., baseline and peak measures of cortisol) to measure stress reactivity. Studies included any type of TSST in which the participant and/or confederates were in a virtual environment (i.e., an environment other than where they were located physically). This included screen-based TSST, immersion with HMDs, and 3D projections of the TSST environment.

Exclusion criteria were studies with a) youth (age <18 years), because of significant differences in stress reactivity in youth (4), and b) missing defining component/s of the TSST (i.e., speech or math task), as the full protocol has been designed to be socially evaluative and unpredictable, and the inclusion of both tasks has been associated with the greatest effect sizes for cortisol reactivity in previous work (5). Figure 1 details the selection process for including studies in this meta-analysis.

Flow diagram for the study selection process. TSST = Trier Social Stress Test; V-TSST = virtual Trier Social Stress Test; HR = heart rate; SRS = self-reported stress.

Inclusion and Exclusion Criteria: Traditional TSST Studies

After the identification of V-TSST articles to include in this meta-analysis, traditional TSST articles with similar participant characteristics (i.e., sex and age compositions) were selected and individually matched to each V-TSST study. With the exception of Hawn et al. (12), data from V-TSST studies that already included traditional TSST groups in the same article were used as the traditional TSST match group (11,14,15). The traditional TSST group in Hawn et al. (12) had statistically significantly younger participants than the V-TSST group (i.e., V-TSST mean age = 38 years, traditional TSST mean age = 30 years). To accurately age and sex match the V-TSST sample, this study was subjected to the traditional TSST article matching process, described subsequently.

The PubMed database was searched with the keywords “TSST” OR “Trier Social Stress Test” AND “cortisol” until November 2019. Because there are thousands of published articles using traditional TSST studies, the search was narrowed for each matched article initially by restricting it to the same year that the V-TSST article was published, which also offered a further matching characteristic of being conducted during the same period. The inclusion criteria for the traditional TSST matched articles included criterion (a) and criterion (c) noted in the V-TSST inclusion criteria. Exclusion criteria were the same for the traditional TSST articles as those used for the V-TSST studies. Additional details on the sex- and age-matching process, along with decisions made during this process, are detailed in online Supplemental Digital Content,

Coding and Data Extraction

Two independent coders (i.e., the first and second authors) compiled information on sex, age, sample size, and type of TSST for each study included in this review. We also coded whether studies had participants seated or standing during the TSST, as posture can impact cardiovascular responses (18). For studies that had both clinical and healthy control populations, only the healthy control data were coded in this meta-analysis, because previous research indicates that clinical populations can have differing stress reactivity responses to the TSST (19). For studies that had several groups divided based on nonclinical characteristics (e.g., high and low performance on a task), group characteristics and cortisol values were combined (20).

For cortisol, HR, and SRS values, means and standard deviations (SDs) for baseline and peak stress were extracted from each study when reported. Many studies did not explicitly report these numbers in the text or in tables, and thus, values had to be extracted from figures representing the results. Two independent extractors (i.e., the first and second authors or the first author and a trained research assistant) used WebPlotDigitizer (21) to extract values directly from plots at baseline and height of stress. We decided to use WebPlotDigitizer instead of individually contacting authors for data because of the relatively large number of articles that did not explicitly report numbers and the potential for authors to not respond. Furthermore, we were confident in using WebPlotDigitizer for this task owing to its empirically demonstrated accuracy and reliability. When tested by having two independent coders extract 3596 data points from 36 separate graphs, this software has demonstrated a high level of accuracy (e.g., correlation between extracted data and original data was r = 0.989) and interrater reliability (e.g., correlation between independent coder values was r = 0.997; (22)). Graphs from which values were extracted had variables of interest (i.e., cortisol, HR, or SRS) values plotted as a function of time, and mean values and standard errors of the mean were extracted for baseline and peak stress time points. SDs were calculated from standard errors of the mean according to Lipsey and Wilson (23).
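The conversion from an extracted standard error of the mean to an SD can be illustrated with a minimal sketch. It assumes the usual relation SD = SEM × sqrt(n); the function name and the example numbers are hypothetical and are not taken from any included study.

```python
import math


def sd_from_sem(sem: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation.

    Assumes the usual relation SD = SEM * sqrt(n), where n is the group size.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if sem < 0:
        raise ValueError("sem must be non-negative")
    return sem * math.sqrt(n)


# Hypothetical values read off a digitized plot (not real study data).
baseline_sd = sd_from_sem(sem=1.2, n=25)
print(f"Baseline SD: {baseline_sd:.2f}")
```

The same conversion would be applied separately to the baseline and peak time points before any effect size is computed.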

The independent data extractors achieved an interrater agreement of 93.7%, with a very high correlation between extracted values (r > 0.99). Following the method from Drevon et al. (22), extractors were considered to be in agreement if extracted values were within 1% of each other, using the range of the y-axis as a reference. For discrepant extracted data points, each extractor independently reextracted the values and they were compared again. This was sufficient to reach consensus for all cortisol, HR, and SRS values.
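The agreement rule described above can be sketched as follows. This is an illustrative reading of the 1% criterion, assuming the tolerance is 1% of the plotted y-axis range; the function names and sample values are hypothetical.

```python
def values_agree(value_a, value_b, y_min, y_max, tolerance=0.01):
    """Return True if two extracted values differ by at most
    `tolerance` times the y-axis range of the source plot."""
    y_range = y_max - y_min
    if y_range <= 0:
        raise ValueError("y_max must be greater than y_min")
    return abs(value_a - value_b) <= tolerance * y_range


def percent_agreement(extractions):
    """Percentage of paired extractions that agree.

    `extractions` is a list of (value_a, value_b, y_min, y_max) tuples,
    one per data point, so each point can use its own plot's axis range.
    """
    if not extractions:
        raise ValueError("no extractions supplied")
    agreed = sum(values_agree(a, b, lo, hi) for a, b, lo, hi in extractions)
    return 100.0 * agreed / len(extractions)


# Hypothetical paired extractions from a plot with a y-axis of 0 to 30.
pairs = [(10.0, 10.2, 0.0, 30.0), (10.0, 10.4, 0.0, 30.0)]
print(percent_agreement(pairs))
```

Scaling the tolerance by the axis range keeps the criterion comparable across plots that use different units, such as log cortisol and beats per minute.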

Publication Bias

Effect sizes and standard errors for each study included in this meta-analysis were plotted and assessed with the trim-and-fill method from Duval and Tweedie (24). This method allows for a visual representation of publication bias, and the effect of any potential missing studies on the overall mean effect size can be estimated.

Data Analysis

Because the V-TSST studies and the matched traditional TSST studies are not directly comparable in the way that experimental and control groups within a single study would be, within-subject effect sizes were calculated instead. Within-subject effect sizes are commonly used to calculate stress reactivity effects and have been used in previous meta-analyses of TSST studies (see Refs. (9,17)). Importantly, within-subject effect sizes can still be used to compare differences between groups (see the Moderation Analyses for TSST Type section in this article).

Effect Size Calculation

Lipsey and Wilson (23) recommend using the standardized mean gain effect size statistic (ESsg) for pre-post contrasts. This calculation includes a pooled SD term (SDp) that pools the baseline and peak SDs. Because of the way the SDp term is calculated (see Ref. (23), p. 44), this effect size is similar to a Cohen d effect size, and it has been interpreted the same way in the literature (9). However, to be consistent throughout the document, we denoted effect sizes as ESsg. The standardized mean gain values were selected because of the varying metrics of reported cortisol values (e.g., log cortisol, in nanomoles per liter). The standardized mean gain effect size was calculated with the following equation:


ESsg = (Mpeak − Mbase) / SDp

where Mpeak = mean cortisol value at peak stress, Mbase = mean cortisol value at baseline, and SDp = pooled SD of cortisol values at peak and baseline.
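As a hedged illustration of these definitions, the sketch below computes a standardized mean gain from baseline and peak summary statistics. It assumes the pooled SD is the square root of the average of the baseline and peak variances, which is one common form; the source defers to Lipsey and Wilson (23) for the exact term. The function names and input values are hypothetical.

```python
import math


def pooled_sd(sd_base: float, sd_peak: float) -> float:
    """Pooled SD of baseline and peak values.

    Assumption: SDp = sqrt((SD_base^2 + SD_peak^2) / 2).
    """
    return math.sqrt((sd_base ** 2 + sd_peak ** 2) / 2.0)


def standardized_mean_gain(m_base, sd_base, m_peak, sd_peak):
    """ESsg = (Mpeak - Mbase) / SDp."""
    sdp = pooled_sd(sd_base, sd_peak)
    if sdp == 0:
        raise ValueError("pooled SD is zero; effect size is undefined")
    return (m_peak - m_base) / sdp


# Hypothetical cortisol summary values (arbitrary units, not real data).
es = standardized_mean_gain(m_base=10.0, sd_base=3.0, m_peak=14.0, sd_peak=5.0)
print(f"ESsg = {es:.3f}")
```

Because the numerator and denominator share the same units, the resulting value is unit-free, which is why studies reporting cortisol on different scales can be combined.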

For the standard error calculation of ESsg, pre-post r correlation coefficients are required. To the authors’ knowledge, only one study has compiled pre-post r correlation coefficients for cortisol reactivity to the traditional TSST. Khoury et al. (25) looked at pre-post r values for depressed and older populations, and they found that baseline to peak cortisol reactivity had correlation coefficients of 0.65 and 0.74 for each population, respectively. Although these are specialized populations, these values could potentially be used as a baseline for estimating pre-post r correlation coefficients for cortisol reactivity to the TSST. No other studies report pre-post correlations, so a sensitivity analysis was performed with r values from 0.1 to 0.9, and both virtual and traditional TSST groups were compared with each correlation coefficient (see the Pre-Post r Correlation Sensitivity Analysis and Comparison section).
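To show why the pre-post correlation matters, the following sketch evaluates a standard error for a single effect size at several assumed r values. The formula used, SE = sqrt(2(1 − r)/n + ES²/(2n)), is one commonly cited form for the standardized mean gain; it is an assumption here because the source does not print the formula. All input values are hypothetical.

```python
import math


def se_standardized_mean_gain(es: float, n: int, r: float) -> float:
    """Standard error of a standardized mean gain effect size.

    Assumed form: sqrt(2 * (1 - r) / n + es**2 / (2 * n)),
    where n is the sample size and r the pre-post correlation.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a valid correlation")
    return math.sqrt(2.0 * (1.0 - r) / n + es ** 2 / (2.0 * n))


# Hypothetical study: ESsg = 0.70 with 20 participants.
for r in (0.1, 0.5, 0.9):
    se = se_standardized_mean_gain(es=0.70, n=20, r=r)
    print(f"r = {r:.1f} -> SE = {se:.3f}")
```

Under this form, a higher assumed correlation yields a smaller standard error and therefore a larger inverse-variance weight, which is how the choice of r can shift the aggregate estimates.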

Analysis of Homogeneity

To determine the variation among studies that is due to differences other than sampling error, an analysis of homogeneity was conducted in accordance with Lipsey and Wilson’s (23) Q statistic calculation. A significant Q statistic indicates that there is significant heterogeneity among studies beyond that of sampling error.
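A minimal sketch of a homogeneity statistic of this kind is given below. It assumes the standard inverse-variance form, Q = Σ w(ES − mean ES)² with w = 1/SE², and uses hypothetical effect sizes rather than values from the included studies.

```python
def q_statistic(effects, standard_errors):
    """Return (Q, df) for a set of effect sizes.

    Q is the inverse-variance weighted sum of squared deviations from the
    weighted mean effect size; df is the number of effect sizes minus one.
    """
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need at least two effects with matching SEs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    weighted_mean = sum(w * e for w, e in zip(weights, effects)) / total_weight
    q = sum(w * (e - weighted_mean) ** 2 for w, e in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical effect sizes and standard errors.
q, df = q_statistic([0.2, 0.8], [0.1, 0.1])
print(f"Q = {q:.2f} on {df} df")
```

Q would then be compared against a chi-square distribution with the returned degrees of freedom; the p-value step is omitted here to keep the sketch free of external dependencies.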

Moderation Analyses for TSST Type

In addition, moderation analyses were conducted to determine if any stress reactivity differences between studies were due to the type of TSST used (i.e., traditional or V-TSST). Moderation analyses were conducted for all three stress reactivity variables of interest (i.e., cortisol, HR, and SRS) using the Metafor package in the open-source statistical software R (26).

Because the types of TSST are mutually exclusive categories, TSST type (i.e., the moderator) was dummy coded by assigning a value of “0” to traditional TSST studies and a value of “1” to V-TSST studies. The meta-analytic analog to the ANOVA moderation analysis was then conducted for all effect sizes of interest (i.e., cortisol, HR, and SRS), as recommended by Lipsey and Wilson (23). This type of moderation analysis accounts for the weight of each individual effect size and results in two homogeneity statistics, one for variability within each category (QW) and one for variability between categories (QB). If QB is statistically significant, the categorical moderator accounts for at least some of the variability among studies; if it is not significant, there is no statistically detectable difference between categories.
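The partition into QB and QW can be sketched as below. This is a simplified fixed-effect illustration with inverse-variance weights, not the Metafor implementation used in the study; a mixed-effects version would add a between-study variance component to each weight. The moderator coding follows the text (0 = traditional, 1 = V-TSST), and the data are hypothetical.

```python
def weighted_q(effects, standard_errors):
    """Inverse-variance weighted Q around the weighted mean of a group."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))


def anova_analog(effects, standard_errors, moderator):
    """Return (QB, QW) for a categorical moderator.

    QW sums the Q statistics computed within each category;
    QB is the remainder of the total Q, i.e. QB = Q_total - QW.
    """
    q_total = weighted_q(effects, standard_errors)
    q_within = 0.0
    for level in sorted(set(moderator)):
        idx = [i for i, m in enumerate(moderator) if m == level]
        q_within += weighted_q([effects[i] for i in idx],
                               [standard_errors[i] for i in idx])
    return q_total - q_within, q_within


# Hypothetical data: two traditional (0) and two virtual (1) effect sizes.
qb, qw = anova_analog([0.4, 0.6, 0.9, 1.1], [0.1, 0.1, 0.1, 0.1], [0, 0, 1, 1])
print(f"QB = {qb:.2f}, QW = {qw:.2f}")
```

QB is evaluated against a chi-square distribution with (number of categories − 1) degrees of freedom, which is 1 for the two TSST types.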


For the cortisol analysis, 29 studies were included in this meta-analytical comparison. For three studies, both a V-TSST and traditional TSST protocol were used in the meta-analysis, resulting in an even number of effect sizes (k = 32; 16 traditional and 16 V-TSST groups). For the HR analysis and for the SRS analysis, 12 studies (k = 14) and 13 studies (k = 16) were included, respectively. Besides matching sex and age, the traditional TSST studies were, when applicable, also matched by population characteristic (e.g., participants in the luteal phase of their menstrual cycle) to the V-TSST studies. Because of the article matching process, sex composition (M [SD] = 64.5% [40.5%] male) and age (M [SD] = 27.5 [7.8] years) across all V-TSST studies were similar to the sex composition (M [SD] = 65.2% [39.4%] male) and age (M [SD] = 26.0 [6.9] years) across all traditional TSST studies. When reporting posture, there was a notable lack of specification for whether participants were standing or seated during the stress task; only 18 (55%) of the 33 studies made the posture of participants clear. Details of each V-TSST study and its corresponding traditional TSST counterpart(s) are summarized in Table 1.

TABLE 1 - Characteristics of V-TSST and Matched Traditional TSST Data Used in Analyses
Matched Studies | TSST Type | Seated or Standing | Sample Size | Participant Sex Composition (M/F), % | Mean Age, y | Variable(s) Included in Analyses
--- | --- | --- | --- | --- | --- | ---
Annerstedt et al. (27) | Virtual | Seated | 10 | 100/0 | 28.1 | Cortisol, HR, SRS
Merz et al. (28) | Traditional | NR a | 48 | 100/0 | 24.3 | Cortisol
Childs et al. (29) | Traditional | NR | 25 | 100/0 | 26.1 | HR, SRS
Domes and Zimmer (30) | Virtual | Standing b | 23 | 100/0 | NR, but recruited on a university campus | Cortisol
Cantave et al. (31) | Traditional | Standing | 100 | 100/0 | 24.1 | Cortisol
Fallon et al. (32) | Virtual | NR | 16 | 50/50 | 19.3 | Cortisol, SRS
Lupis et al. (33) | Traditional | Standing | 44 | 45.5/55.5 | 20.5 | Cortisol
Espín et al. (34) | Traditional | Standing | 50 | 52/48 | 19.7 | SRS
Fich et al. (35) | Virtual | Seated | 14 | 100/0 | 23.9 | Cortisol, HR
Oei et al. (36) | Traditional | NR a | 20 | 100/0 | 22.4 |
Hawn et al. (12) | Virtual | Standing | 21 | 54.5/45.5 | 38.5 | Cortisol, SRS
Back et al. (37) | Traditional | NR a | 18 | 50/50 | 33.6 |
Jönsson et al. (13) | Virtual | Seated | 10 | 100/0 | 28.3 | Cortisol, HR
Weerda et al. (38) | Traditional | NR a | 20 | 100/0 | 26.5 | Cortisol
Kelly et al. (39) | Traditional | NR a | 30 | 100/0 | 28.0 | HR
Jönsson et al. (40) | Virtual | Seated | 20 | 50/50 | 49.2 | Cortisol, HR, SRS
Buchanan et al. (41) | Traditional | Standing | 54 | 50/50 | 50.2 |
Kelly et al. (14) c | Virtual | NR | 46 | 49.6/50.4 | 21.0 | Cortisol, SRS
 | Traditional | Standing | 46 | | |
Linninge et al. (42) | Virtual | Seated b | 23 | 100/0 | 24.7 | Cortisol, HR, SRS
Pisanski et al. (43) | Traditional | NR a | 27 | 100/0 | 26.6 | Cortisol
Andersen et al. (44) | Traditional | NR a | 20 | 100/0 | 23.7 | HR, SRS
Montero-López et al. (45) | Virtual | Seated | 18 | 0/100 | 33.2 | Cortisol
Nater et al. (46) | Traditional | NR a | 17 | 0/100 | 27.2 |
Riem et al. (47) | Virtual | Seated | 45 | 0/100 | 20.2 | Cortisol, SRS
Finch et al. (48) | Traditional | NR a | 30 | 0/100 | 20.2 |
Santos-Ruiz et al. (49) | Virtual | NR | 21 | 28.6/71.4 | 24 | Cortisol
Entringer et al. (50) | Traditional | NR a | 30 | | 26.7 |
Santos-Ruiz et al. (51) | Virtual | NR a | 38 | 0/100 | 28 | Cortisol
Schoofs and Wolf (52) | Traditional | NR | 22 | 0 | |
Shiban et al. (11) d | Virtual | Standing | 15 | 100/0 | 23.8 | Cortisol, HR, SRS
 | Traditional | Standing | 15 | 100/0 | |
Zimmer et al. (53) | Virtual | Standing | 24 | 100/0 | 24.9 | Cortisol, HR, SRS
 | Traditional | Standing | 20 | 100/0 | 26.1 |
Zimmer et al. (53) | Virtual | Standing | 49 | 100/0 | 24.8 | Cortisol, SRS
Langer et al. (54) | NR | Standing | 36 | 100/0 | 24.4 |
V-TSST = virtual Trier Social Stress Test; TSST = Trier Social Stress Test; M/F = male/female; NR = not reported; HR = heart rate; SRS = self-reported stress.
a Authors did not explicitly state whether standing or seated but referenced the original protocol in the Methods section (the original protocol specifies that participants should stand in front of the panel of judges).
b Authors did not explicitly state whether participants were seated or standing but referenced a previous V-TSST protocol in the Methods section.
c Sex and age were reported for the full sample and not specified for the virtual and traditional TSST groups separately.
d Age was reported for the full sample and not specified for the virtual and traditional TSST groups separately.

Time of peak cortisol measures varied across V-TSST studies, with a mean (SD) of 23.4 (5.0) minutes after the beginning of the stressor. Time of peak cortisol measures in traditional TSST studies was similar, with a mean (SD) of 23.4 (6.9) minutes after the onset of the TSST. These fall within the acceptable ranges for peak cortisol (55).

Overall Reactivity

Standardized mean gain effect sizes for each V-TSST study and traditional TSST study are presented in the forest plots in Figures 2–4. Tests of heterogeneity across all studies were significant for cortisol (Q = 110.94, df = 31, p < .001), HR (Q = 28.02, df = 13, p < .01), and SRS (Q = 39.24, df = 15, p < .001), which means that the between-study variation in each analysis is greater than what would be seen with random sampling error (23). Thus, random-effects models were used to calculate aggregate effect sizes. The aggregate mean effect size was ESsg = 0.71 (SE = 0.09) for all cortisol studies, ESsg = 0.90 (SE = 0.11) for all HR studies, and ESsg = 0.89 (SE = 0.11) for all SRS studies, which are considered medium to large effects according to recommendations by Cohen (56).
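For readers unfamiliar with random-effects pooling, the sketch below shows one way an aggregate effect size and its standard error can be obtained. It uses the DerSimonian-Laird moment estimator of between-study variance for simplicity; this is a swapped-in assumption, since the source does not state which estimator was used in Metafor. The input values are hypothetical.

```python
import math


def random_effects_dl(effects, standard_errors):
    """Random-effects pooled estimate using the DerSimonian-Laird estimator.

    Returns (pooled_mean, pooled_se, tau_squared).
    """
    variances = [se ** 2 for se in standard_errors]
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * e for w, e in zip(weights, effects)) / sum_w
    q = sum(w * (e - fixed_mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # Scaling constant for the moment estimator of between-study variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight each study by its total (within + between) variance.
    re_weights = [1.0 / (v + tau_squared) for v in variances]
    sum_re = sum(re_weights)
    pooled_mean = sum(w * e for w, e in zip(re_weights, effects)) / sum_re
    pooled_se = math.sqrt(1.0 / sum_re)
    return pooled_mean, pooled_se, tau_squared


# Hypothetical effect sizes with equal standard errors.
mean, se, tau2 = random_effects_dl([0.2, 0.8], [0.1, 0.1])
print(f"ES = {mean:.2f}, SE = {se:.2f}, tau^2 = {tau2:.2f}")
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is truncated at zero and the result reduces to the fixed-effect estimate.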

Cortisol reactivity forest plot of individual study effect sizes and overall effect sizes modeled with random effects. VR TSST = virtual reality Trier Social Stress Test; CI = confidence interval; RE model = random-effects model; TSST = Trier Social Stress Test.
Heart rate reactivity forest plot of individual study effect sizes and overall effect sizes modeled with random effects. VR TSST = virtual reality Trier Social Stress Test; CI = confidence interval; RE model = random-effects model; TSST = Trier Social Stress Test.
Self-reported stress reactivity forest plot of individual study effect sizes and overall effect sizes modeled with random effects. VR TSST = virtual reality Trier Social Stress Test; CI = confidence interval; RE model = random-effects model; TSST = Trier Social Stress Test.

Cortisol Moderation Analysis

A moderation analysis was conducted to determine whether there were differences in cortisol reactivity based on the type of TSST used (i.e., V-TSST or traditional TSST). The results of the moderation analysis demonstrated no statistically significant differences between the V-TSST study subgroup and the traditional TSST study subgroup. Between-group heterogeneity was nonsignificant (QB = 1.12, df = 1, p = .29), providing no evidence that the type of TSST affects cortisol reactivity in this collection of studies. In other words, based on the matched studies, the overall mean effect size for V-TSST studies (ESsg = 0.61, SE = 0.10) was not statistically significantly different from that of traditional TSST studies (ESsg = 0.79, SE = 0.14).

HR Moderation Analysis

A moderation analysis was conducted to determine whether there were differences in HR reactivity based on the type of TSST used (i.e., V-TSST or traditional TSST). The results of the moderation analysis demonstrated no statistically significant differences between the V-TSST study subgroup and the traditional TSST study subgroup. Between-group heterogeneity was nonsignificant (QB = 0.29, df = 1, p = .59), providing no evidence that the type of TSST affects HR reactivity in this collection of studies. In other words, based on the matched studies, the overall mean effect size for V-TSST studies (ESsg = 0.98, SE = 0.12) was not statistically significantly different from that of traditional TSST studies (ESsg = 0.85, SE = 0.19).

SRS Moderation Analysis

A moderation analysis was conducted to determine whether there were differences in SRS reactivity based on the type of TSST used (i.e., V-TSST or traditional TSST). The results of the moderation analysis demonstrated no statistically significant differences between the V-TSST study subgroup and the traditional TSST study subgroup. Between-group heterogeneity was nonsignificant (QB = 0.14, df = 1, p = .71), providing no evidence that the type of TSST affects SRS reactivity in this collection of studies. In other words, based on the matched studies, the overall mean effect size for V-TSST studies (ESsg = 0.94, SE = 0.20) was not statistically significantly different from that of traditional TSST studies (ESsg = 0.85, SE = 0.13).
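As a rough plausibility check on the three moderation results, the between-group statistic for two subgroups can be approximated from the reported subgroup means and standard errors. The sketch assumes that, with only two categories, QB equals the squared difference in subgroup means divided by the sum of their squared standard errors. Because the inputs are rounded to two decimals, the output is only expected to land near the reported QB values, not to reproduce them exactly.

```python
def q_between_two_groups(mean_a, se_a, mean_b, se_b):
    """Approximate QB (df = 1) from two subgroup means and standard errors.

    Assumes each subgroup mean is weighted by the inverse of its squared
    standard error, which gives QB = (mean_a - mean_b)^2 / (se_a^2 + se_b^2).
    """
    return (mean_a - mean_b) ** 2 / (se_a ** 2 + se_b ** 2)


# Subgroup values as reported in the text (V-TSST first, traditional second).
reported = {
    "cortisol": (0.61, 0.10, 0.79, 0.14),
    "HR": (0.98, 0.12, 0.85, 0.19),
    "SRS": (0.94, 0.20, 0.85, 0.13),
}
for name, values in reported.items():
    print(f"{name}: approximate QB = {q_between_two_groups(*values):.2f}")
```

With these rounded inputs the approximation gives values of roughly 1.09 for cortisol, 0.33 for HR, and 0.14 for SRS, all well below the chi-square critical value of 3.84 for df = 1 at the .05 level, which is in line with the nonsignificant results reported above.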

Pre-Post r Correlation Sensitivity Analysis and Comparison

The standard error calculation for ESsg includes a pre-post r correlation coefficient, which was not reported in the studies included in this meta-analysis. To see how varying the pre-post r correlation affects the results of this meta-analytical comparison, the extreme values (i.e., r = 0.1 and r = 0.9) were tested and compared with the previously discussed results, in which a correlation coefficient of r = 0.5 was used.

For cortisol, aggregate effect sizes varied for both V-TSST studies (range ESsg = 0.51–0.63) and traditional TSST studies (range ESsg = 0.66–0.82) with r values of 0.1 and 0.9. Moderation analyses of TSST type were rerun with each r value; between-group heterogeneity continued to be nonsignificant with r = 0.1 (QB = 1.36, df = 1, p = .24) and with r = 0.9 (QB = 0.64, df = 1, p = .42), indicating that regardless of the r value, there are no statistically significant differences between V-TSST and traditional TSST studies with respect to cortisol reactivity. The same process was performed for the HR and SRS analyses, with similar results. Figure 5 depicts the variations for V-TSST and traditional TSST studies across each r value for cortisol.

Sensitivity analysis of pre-post r correlation for r = 0.1, r = 0.5, and r = 0.9 for cortisol reactivity effect sizes. Error bars represent +1 standard error of the mean. VR = virtual reality; ESsg = gain effect size statistic.

Publication Bias Assessment

Effect sizes for each study were plotted with each study’s respective standard error, and publication bias was assessed via these plots (see Figure 6 for cortisol studies). Initial plots for all variables demonstrated bias toward greater effect sizes. Using Duval and Tweedie’s (24) trim-and-fill method, potentially missing studies were added to the plots for each variable of interest, and overall mean effect size was recalculated. The adjusted overall mean effect size was ESsg = 0.55 (SE = 0.11) for cortisol, ESsg = 0.71 (SE = 0.14) for HR, and ESsg = 0.83 (SE = 0.15) for SRS, which still range from medium to large effect sizes (56). Therefore, even if there are completed studies in existence with smaller effect sizes that have not been published, including these potential studies would not dramatically affect the results of this meta-analysis. Notably, the adjusted effect size for HR decreased the most, indicating that the HR studies had the most publication bias.

Figure 6. Funnel plot of all cortisol studies included in this meta-analysis (A), and funnel plot including potentially missing studies (open circles) using the trim-and-fill method (B).


This meta-analytic comparison study aimed to more accurately compare the V-TSST with the traditional TSST in regard to stress reactivity (measured by cortisol, HR, and SRS) by accounting for the given study sample’s sex and age compositions. Results of meta-analyses with each variable suggest that the V-TSST and traditional TSST do not significantly differ in eliciting a stress response. All moderation analyses demonstrated no moderating effect of TSST type (i.e., V-TSST or traditional) on overall effect sizes, indicating that the two types of the TSST do not elicit statistically significantly different cortisol, HR, or SRS reactivity.


Findings of this study have several important implications for the future of TSST research. This meta-analysis provides preliminary evidence for the utility and effectiveness of V-TSST protocols and suggests that, compared with traditional TSST protocols, they are not lacking in the ability to elicit a stress response across the HPA axis, the cardiovascular system, and self-reported variables. These findings may give stress researchers the confidence to consider adopting V-TSST protocols in the future.

V-TSST protocols have several advantages over the traditional TSST protocol. Once a virtual environment is set up, the V-TSST becomes much less resource-intensive and more standardized, both of which are important considerations in controlled laboratory research. The V-TSST allows experiments to be run with minimal research assistants because live confederates are replaced by virtual avatars or prerecorded videos of confederates. In addition, the virtual environment can be programmed to include video cameras and a two-way mirror, whereas in a traditional TSST experiment, researchers would need continual access to this type of room and accessory materials.

The increased standardization that comes with V-TSST protocols may also be an attractive reason for stress researchers to adopt the V-TSST. Recent meta-analytic findings have shown that confederate characteristics, such as sex and affective display, have significant effects on the stressfulness of the TSST (9). With V-TSST protocols, these differences could be controlled or experimentally manipulated, and additional characteristics could be explored to determine whether they influence stress reactivity. For example, researchers could easily and systematically vary the ethnicity and age of confederates in the V-TSST to test whether these changes significantly influence the stress response. With the increased standardization and personalization of V-TSST protocols, potential confounds could be controlled for and new questions could be investigated.


Although promising, the results of this meta-analytic investigation are not without limitations. Most notably, participants were sex and age matched across studies rather than within studies, with the exception of three studies (11,14,15). If a greater number of studies included both a V-TSST group and a traditional TSST group, more direct comparisons could be made and more accurate conclusions could be drawn. However, because a sufficient number of studies directly comparing both protocols are not currently available in the literature, such studies cannot yet be meta-analyzed. To the authors’ knowledge, only four studies have compared V-TSST and traditional TSST protocols; three studies compared the different types within the same investigation (11,14,15), and one study compared V-TSST to results from a previous traditional TSST study sample that was not age matched (12).

Research using the V-TSST is still in its infancy, and there are several notable limitations in the current literature. There is considerable variety across the V-TSST protocols, particularly in the type of virtual environment that is used and the posture of participants during the stress task (i.e., seated or standing). For the virtual environment, V-TSST studies have used both 2D (i.e., computer or television screens) and 3D projections. Some have also used head-mounted devices in which the participant is immersed in the virtual environment. This variety makes it difficult to consider the V-TSST as a whole. However, this limitation can be considered in tandem with previous meta-analytic results showing that more immersive virtual environments (i.e., virtual environments that completely replace any outside audio or visual cues) have a moderating effect on stress reactivity; the more immersive a virtual environment, the greater the effect size is for cortisol reactivity (17). For posture, future studies should pay more attention to the potential impact of standing on the stress response (e.g., see Ref. (57)) and, at a minimum, report whether participants are seated or standing during the stress task.

The V-TSST literature is also limited by small sample sizes. Several V-TSST studies are pilot trials investigating the utility of different V-TSST protocols. The traditional TSST literature is much more established, and therefore, sample sizes in these studies are usually adequate. Larger samples in V-TSST studies would lend more confidence to both individual and meta-analytic study results. In addition, there was some evidence of publication bias among the studies included in this meta-analysis. Although this suggests that true overall effect sizes may be slightly lower than those reported, it does not necessarily detract from the comparability of effect sizes between the traditional and V-TSST groups; both groups' effect sizes would likely decrease after correction for publication bias but could still be compared.

Other notable trends in the V-TSST literature are that numerous V-TSST studies included only male participants and that most V-TSST studies were conducted with younger adults (i.e., ages <30 years). Having only male participants has the potential to skew the results because men have been shown to be consistently more reactive to the TSST, particularly for cortisol reactivity (16). However, because the male-only V-TSST studies were compared with male-only traditional TSST studies, this does not necessarily detract from the evidence that the V-TSST is just as stressful as the traditional TSST. Effects of age are less clear, but age may moderate stress reactivity in both the V-TSST (17) and the traditional TSST (4).

A final limitation of this meta-analysis is that, although there were no statistically significant differences between traditional and V-TSST cortisol reactivity, aggregate effect sizes for the traditional TSST were consistently (but nonsignificantly) greater than V-TSST effect sizes (Figure 5). Even though these differences were not significant, they still have practical consequences. For instance, a power analysis using the effect size from the traditional TSST studies (ES = 0.79) may yield a very different required sample size than one using the effect size from the V-TSST studies. Furthermore, with additional studies and greater sample sizes, it is possible that significant differences may emerge; however, with the advancements in virtual technology, it is also possible that the gap may be reduced. As it currently stands, if researchers need to examine cortisol reactivity and are limited in their ability to procure a large sample size, the traditional TSST may be a better option. However, if researchers need to examine HR or SRS reactivity, either type of TSST may be appropriate.
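
The practical consequence for sample size planning can be sketched with a simple normal-approximation power calculation. This is a rough illustration, not the authors' procedure: it treats the effect size as a standardized pre-post mean change in a single group, assumes a two-sided α of .05 and power of .80, and uses the bounds of the V-TSST range from the sensitivity analysis (0.51 and 0.63) as stand-ins for the V-TSST effect size.

```python
import math
from statistics import NormalDist

def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a one-sample (pre-post) test of a standardized
    mean change, using the normal approximation
    n = ((z_{1 - alpha/2} + z_{power}) / effect_size)^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

# 0.79 is the traditional TSST effect size quoted above; 0.63 and 0.51
# are the illustrative upper and lower bounds of the V-TSST range.
for label, es in (("traditional TSST", 0.79),
                  ("V-TSST, upper bound", 0.63),
                  ("V-TSST, lower bound", 0.51)):
    print(f"{label}: ES = {es}, approximate n = {required_n(es)}")
```

Under these assumptions, the required sample is roughly 13 participants at ES = 0.79, compared with about 20 to 31 across the V-TSST range; an exact t-based calculation would give slightly larger numbers.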

In summary, limitations exist both in this meta-analysis and in the literature in several areas, including the lack of direct comparisons, variety in virtual environments, small sample sizes, and the prevalence of male-only and younger adult samples. In light of these limitations, the results of this meta-analysis should be used as a promising foundation on which to build more confidence in V-TSST effectiveness, rather than being considered the final say on the superiority of V-TSST research.

Future Research Directions

Considering the results of this meta-analytic investigation together with the limitations still present, areas of future research emerge. To truly and directly compare V-TSST protocols with the traditional TSST protocols, any future V-TSST studies should also include a traditional TSST control group. Once enough studies directly comparing the two types are published, another meta-analytic investigation may take place to see if the results are consistent with the results presented in this meta-analysis.

Now that there is evidence that the V-TSST is effective, future V-TSST studies should look to increase sample sizes and diversify participants by including more balanced samples of male and female participants and by including participants from a broader range of ages, particularly middle-aged and older adults. Stress research has the potential to be revolutionized by V-TSST protocols, and preliminary meta-analytic results are promising, but as researchers we must exercise due diligence and evaluate the V-TSST more systematically before V-TSST protocols are widely adopted in future experiments.

Source of Funding and Conflicts of Interest: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors report no conflict of interest.


1. Schneiderman N, Ironson G, Siegel SD. Stress and health: psychological, behavioral, and biological determinants. Annu Rev Clin Psychol 2005;1:607–28.
2. Dickerson SS, Gruenewald TL, Kemeny ME. When the social self is threatened: shame, physiology, and health. J Pers 2004;72:1191–216.
3. Chrousos GP. Stress and disorders of the stress system. Nat Rev Endocrinol 2009;5:374–81.
4. Allen AP, Kennedy PJ, Cryan JF, Dinan TG, Clarke G. Biological and psychological markers of stress in humans: focus on the Trier Social Stress Test. Neurosci Biobehav Rev 2014;38:94–124.
5. Dickerson SS, Kemeny ME. Acute stressors and cortisol responses: a theoretical integration and synthesis of laboratory research. Psychol Bull 2004;130:355–91.
6. Kudielka BM, Buske-Kirschbaum A, Hellhammer DH, Kirschbaum C. Differential heart rate reactivity and recovery after psychosocial stress (TSST) in healthy children, younger adults, and elderly adults: the impact of age and gender. Int J Behav Med 2004;11:116–21.
7. Kirschbaum C, Pirke KM, Hellhammer DH. The ‘Trier Social Stress Test’—a tool for investigating psychobiological stress responses in a laboratory setting. Neuropsychobiology 1993;28:76–81.
8. Birkett MA. The Trier Social Stress Test protocol for inducing psychological stress. J Vis Exp 2011:3238.
9. Goodman WK, Janson J, Wolf JM. Meta-analytical assessment of the effects of protocol variations on cortisol responses to the Trier Social Stress Test. Psychoneuroendocrinology 2017;80:26–35.
10. Allen AP, Kennedy PJ, Dockray S, Cryan JF, Dinan TG, Clarke G. The Trier Social Stress Test: principles and practice. Neurobiol Stress 2016;6:113–26.
11. *Shiban Y, Diemer J, Brandl S, Zack R, Mühlberger A, Wüst S. Trier social stress test in vivo and in virtual reality: dissociation of response domains. Int J Psychophysiol 2016;110:47–55.
12. *Hawn SE, Paul L, Thomas S, Miller S, Amstadter AB. Stress reactivity to an electronic version of the Trier Social Stress Test: a pilot study. Front Psychol 2015;6:724.
13. *Jönsson P, Wallergård M, Österberg K, Hansen ÅM, Johansson G, Karlson B. Cardiovascular and cortisol reactivity and habituation to a virtual reality version of the Trier Social Stress Test: a pilot study. Psychoneuroendocrinology 2010;35:1397–403.
14. *Kelly O, Matheson K, Martinez A, Merali Z, Anisman H. Psychosocial stress evoked by a virtual audience: relation to neuroendocrine activity. Cyberpsychol Behav 2007;10:655–62.
15. *Zimmer P, Buttlar B, Halbeisen G, Walther E, Domes G. Virtually stressed? A refined virtual reality adaptation of the Trier Social Stress Test (TSST) induces robust endocrine responses. Psychoneuroendocrinology 2019;101:186–92.
16. Liu JJW, Ein N, Peck K, Huang V, Pruessner JC, Vickers K. Sex differences in salivary cortisol reactivity to the Trier Social Stress Test (TSST): a meta-analysis. Psychoneuroendocrinology 2017;82:26–37.
17. Helminen EC, Morton ML, Wang Q, Felver JC. A meta-analysis of cortisol reactivity to the Trier Social Stress Test in virtual environments. Psychoneuroendocrinology 2019;110:104437.
18. Tulen JH, Boomsma F, Man in ‘t Veld AJ. Cardiovascular control and plasma catecholamines during rest and mental stress: effects of posture. Clin Sci (Lond) 1999;96:567–76.
19. Zänkert S, Bellingrath S, Wüst S, Kudielka BM. HPA axis responses to psychological challenge linking stress and disease: what do we know on sources of intra- and interindividual variability? Psychoneuroendocrinology 2019;105:86–97.
20. Higgins J, Green. Cochrane Handbook for Systematic Reviews of Interventions. 2011. Available at: Accessed December 10, 2018.
21. Rohatgi A. WebPlotDigitizer (Version 4.1). Published January 2018. Available at: Accessed October 8, 2018.
22. Drevon D, Fursa SR, Malcolm AL. Intercoder reliability and validity of WebPlotDigitizer in extracting graphed data. Behav Modif 2017;41:323–39.
23. Lipsey MW, Wilson DB. Practical Meta-Analysis. Thousand Oaks, CA: Sage Publications, Inc; 2001.
24. Duval S, Tweedie R. Trim and fill: a simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000;56:455–63.
25. Khoury JE, Gonzalez A, Levitan RD, Pruessner JC, Chopra K, Basile VS, Masellis M, Goodwill A, Atkinson L. Summary cortisol reactivity indicators: interrelations and meaning. Neurobiol Stress 2015;2:34–43.
26. Viechtbauer W. Metafor: Meta-Analysis Package for R. 2017. Available at: Accessed January 24, 2019.
27. *Annerstedt M, Jönsson P, Wallergård M, Johansson G, Karlson B, Grahn P, Hansen AM, Währborg P. Inducing physiological stress recovery with sounds of nature in a virtual reality forest—results from a pilot study. Physiol Behav 2013;118:240–50.
28. *Merz CJ, Wolf OT, Schweckendiek J, Klucken T, Vaitl D, Stark R. Stress differentially affects fear conditioning in men and women. Psychoneuroendocrinology 2013;38:2529–41.
29. *Childs E, O’Connor S, de Wit H. Bidirectional interactions between acute psychosocial stress and acute intravenous alcohol in healthy men. Alcohol Clin Exp Res 2011;35:1794–803.
30. *Domes G, Zimmer P. Acute stress enhances the sensitivity for facial emotions: a signal detection approach. Stress 2019;22:455–60.
31. *Cantave CY, Langevin S, Marin MF, Brendgen M, Lupien S, Ouellet-Morin I. Impact of maltreatment on depressive symptoms in young male adults: the mediating and moderating role of cortisol stress response and coping strategies. Psychoneuroendocrinology 2019;103:41–8.
32. *Fallon MA, Careaga JS, Sbarra DA, O’Connor MF. Utility of a virtual Trier Social Stress Test: initial findings and benchmarking comparisons. Psychosom Med 2016;78:835–40.
33. *Lupis SB, Sabik NJ, Wolf JM. Role of shame and body esteem in cortisol stress responses. J Behav Med 2016;39:262–75.
34. *Espín L, Marquina M, Hidalgo V, Salvador A, Gómez-Amor J. No effects of psychosocial stress on memory retrieval in non-treated young students with Generalized Social Phobia. Psychoneuroendocrinology 2016;73:51–62.
35. *Fich LB, Jönsson P, Kirkegaard PH, Wallergård M, Garde AH, Hansen Å. Can architectural design alter the physiological reaction to psychosocial stress? A virtual TSST experiment. Physiol Behav 2014;135:91–7.
36. *Oei NYL, Both S, van Heemst D, van der Grond J. Acute stress-induced cortisol elevations mediate reward system activity during subconscious processing of sexual stimuli. Psychoneuroendocrinology 2014;39:111–20.
37. *Back SE, Gros DF, Price M, LaRowe S, Flanagan J, Brady KT, Davis C, Jaconis M, McCauley JL. Laboratory-induced stress and craving among individuals with prescription opioid dependence. Drug Alcohol Depend 2015;155:60–7.
38. *Weerda R, Muehlhan M, Wolf OT, Thiel CM. Effects of acute psychosocial stress on working memory related brain activity in men. Hum Brain Mapp 2010;31:1418–29.
39. Kelly MM, Tyrka AR, Anderson GM, Price LH, Carpenter LL. Sex differences in emotional and physiological responses to the Trier Social Stress Test. J Behav Ther Exp Psychiatry 2008;39:87–98.
40. *Jönsson P, Österberg K, Wallergård M, Hansen ÅM, Garde AH, Johansson G, Karlson B. Exhaustion-related changes in cardiovascular and cortisol reactivity to acute psychosocial stress. Physiol Behav 2015;151:327–37.
41. *Buchanan TW, Driscoll D, Mowrer SM, Sollers JJ 3rd, Thayer JF, Kirschbaum C, Tranel D. Medial prefrontal cortex damage affects physiological and psychological stress responses differently in men and women. Psychoneuroendocrinology 2010;35:56–66.
42. *Linninge C, Jönsson P, Bolinsson H, Önning G, Eriksson J, Johansson G, Ahrné S. Effects of acute stress provocation on cortisol levels, zonulin and inflammatory markers in low- and high-stressed men. Biol Psychol 2018;138:48–55.
43. *Pisanski K, Kobylarek A, Jakubowska L, Zych J, Religioni J, Orlowski TM. Multimodal stress detection: testing for covariation in vocal, hormonal and physiological responses to Trier Social Stress Test. Horm Behav 2018;106:52–61.
44. *Andersen EH, Lewis GF, Belger A. Aberrant parasympathetic reactivity to acute psychosocial stress in male patients with schizophrenia spectrum disorders. Psychiatry Res 2018;265:39–47.
45. *Montero-López E, Santos-Ruiz A, García-Ríos MC, Rodríguez-Blázquez M, Rogers HL, Peralta-Ramírez MI. The relationship between the menstrual cycle and cortisol secretion: daily and stress-invoked cortisol patterns. Int J Psychophysiol 2018;131:67–72.
46. *Nater UM, Bohus M, Abbruzzese E, Ditzen B, Gaab J, Kleindienst N, Ebner-Priemer U, Mauchnik J, Ehlert U. Increased psychological and attenuated cortisol and alpha-amylase responses to acute psychosocial stress in female patients with borderline personality disorder. Psychoneuroendocrinology 2010;35:1565–72.
47. *Riem MME, Kunst LE, Bekker MHJ, Fallon M, Kupper N. Intranasal oxytocin enhances stress-protective effects of social support in women with negative childhood experiences during a virtual Trier Social Stress Test. Psychoneuroendocrinology 2020;111:104482.
48. *Finch LE, Cummings JR, Tomiyama AJ. Cookie or clementine? Psychophysiological stress reactivity and recovery after eating healthy and unhealthy comfort foods. Psychoneuroendocrinology 2019;107:26–36.
49. *Santos-Ruiz AS, Peralta-Ramirez MI, Garcia-Rios MC, Muñoz MA, Navarrete-Navarrete N, Blazquez-Ortiz A. Adaptation of the Trier Social Stress Test to virtual reality: psycho-physiological and neuroendocrine modulation. J Cybertherapy Rehabil 2010;3:405–15.
50. *Entringer S, Kumsta R, Hellhammer DH, Wadhwa PD, Wüst S. Prenatal exposure to maternal psychosocial stress and HPA axis regulation in young adults. Horm Behav 2009;55:292–8.
51. *Santos-Ruiz A, Garcia-Rios MC, Fernandez-Sanchez JC, Perez-Garcia M, Muñoz-García MA, Peralta-Ramirez MI. Can decision-making skills affect responses to psychological stress in healthy women? Psychoneuroendocrinology 2012;37:1912–21.
52. *Schoofs D, Wolf OT. Are salivary gonadal steroid concentrations influenced by acute psychosocial stress? A study using the Trier Social Stress Test (TSST). Int J Psychophysiol 2011;80:36–43.
53. *Zimmer P, Wu CC, Domes G. Same same but different? Replicating the real surroundings in a virtual trier social stress test (TSST-VR) does not enhance presence or the psychophysiological stress response. Physiol Behav 2019;212:112690.
54. *Langer K, Moser D, Otto T, Wolf OT, Kumsta R. Cortisol modulates the engagement of multiple memory systems: exploration of a common NR3C2 polymorphism. Psychoneuroendocrinology 2019;107:133–40.
55. Foley P, Kirschbaum C. Human hypothalamus-pituitary-adrenal axis responses to acute psychosocial stress in laboratory settings. Neurosci Biobehav Rev 2010;35:91–6.
56. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: L. Erlbaum Associates; 1988.
57. Bosch JA, de Geus EJ, Carroll D, Goedhart AD, Anane LA, van Zanten JJ, Helmerhorst EJ, Edwards KM. A general enhancement of autonomic and cortisol responses during social evaluative threat. Psychosom Med 2009;71:877–85.

Asterisk denotes that the study is included in meta-analysis.

Key words: meta-analysis; Trier Social Stress Test; virtual reality; cortisol; stress reactivity; heart rate

Abbreviations: HMD = head-mounted display; HPA = hypothalamic-pituitary-adrenal; SRS = self-reported stress; TSST = Trier Social Stress Test


Copyright © 2021 by the American Psychosomatic Society