The Current State of the Empirical Evidence for Psychoanalysis: A Meta-analytic Approach

de Maat, Saskia PhD; de Jonghe, Frans PhD; de Kraker, Ruth MSc; Leichsenring, Falk PhD; Abbass, Allan MD; Luyten, Patrick PhD; Barber, Jacques P. PhD; Van, Rien MD, PhD; Dekker, Jack PhD

doi: 10.1097/HRP.0b013e318294f5fd

Learning Objectives: After participating in this educational activity, the reader should be better able to evaluate the empirical evidence for pre/post changes in psychoanalysis patients with complex mental disorders, and assess the limitations of the meta-analysis.

Background: The effectiveness of psychoanalysis is still a controversial issue, despite increasing research efforts.

Objective: To investigate the empirical evidence for psychoanalysis by means of a systematic review of the literature and a meta-analysis of the research data.

Method: A systematic literature search was undertaken to find studies regarding the effectiveness of psychoanalysis, published between 1970 and 2011. A meta-analysis was performed.

Results: Fourteen studies (total n = 603) were included in the meta-analysis. All but one were pre/post cohort studies. At treatment termination, the mean pre/post effect size across all outcome measures was 1.27 (95% confidence interval [CI], 1.03–1.50; p < .01). The mean pre/post effect size for symptom improvement was 1.52 (95% CI, 1.20–1.84; p < .01), and for improvement in personality characteristics 1.08 (95% CI, 0.89–1.26; p < .01). At follow-up the mean pre/follow-up effect size was 1.46 across all outcome measures (95% CI, 1.08–1.83; p < .01), 1.65 for symptom change (95% CI, 1.24–2.06; p < .01), and 1.31 for personality change (95% CI, 1.00–1.62; p < .01).

Conclusions: A limited number of mainly pre/post studies, presenting mostly completers analyses, provide empirical evidence for pre/post changes in psychoanalysis patients with complex mental disorders, but the lack of comparisons with control treatments is a serious limitation in interpreting the results. Further controlled studies are urgently needed.

From the Vrije Universiteit Amsterdam (Dr. de Maat); Nederlands Psychoanalytisch Instituut, Arkin, Amsterdam, the Netherlands (Drs. de Maat and de Jonghe); Arkin, Amsterdam, the Netherlands (Ms. de Kraker; Drs. Van and Dekker); University of Giessen (Dr. Leichsenring); Dalhousie University (Dr. Abbass); University of Leuven and University College London (Dr. Luyten); Derner Institute of Advanced Psychological Studies, Adelphi University (Dr. Barber).

Original manuscript received 27 December 2011; revised manuscript received 16 July 2012, accepted for publication subject to revision 13 August 2012; revised manuscript received 28 August 2012.

Correspondence: Jack Dekker, PhD, Klaprozenweg 111, 1033 NN, Amsterdam, The Netherlands. Email:

As a therapeutic discipline, psychoanalysis encompasses both short- and long-term treatment modalities, as presented schematically in Figure 1.

The two long-term variants are long-term psychoanalytic psychotherapy (LTPP) and psychoanalysis. The criterion most frequently used to differentiate between the two long-term modalities is the therapeutic setting, with the main features being the frequency of the sessions and the physical positions of the patient and the therapist. It is understood that in LTPP both patient and therapist are sitting on chairs facing each other, whereas in psychoanalysis the patient is lying on a couch, and the therapist is sitting on a chair behind him or her. LTPP sessions usually occur once or twice a week; in psychoanalysis the frequency ranges from two to five sessions a week. In this article we concentrate on psychoanalysis proper.

Research has fairly well established the efficacy of LTPP; for example, Shedler,1 Leichsenring,2 and colleagues have conducted meta-analyses, pooling evidence from multiple studies and calculating pooled effect sizes (ESs). Though considerably less evidence is available concerning the efficacy of psychoanalysis, the effectiveness of psychoanalysis has been researched repeatedly. Several reviews and overviews have shown large ESs and conclude that between 60% and 90% of the patients for whom psychoanalysis is indicated derive clinically significant change.3–8 Nevertheless, no meta-analysis has been performed that systematically pooled data specifically on psychoanalysis. Given that psychoanalysis is a long-term, intensive, and expensive treatment, such an analysis of the available empirical data is urgently needed. In this article we present the first meta-analysis of studies examining the effectiveness of psychoanalysis.

Characteristics of Psychoanalysis

Although psychoanalysis is considered a therapy requiring very frequent sessions, there is no universal agreement on the number of sessions. The International Psychoanalytic Association endorses three psychoanalytic training models.9 According to the Eitingon model, a frequency of four to five sessions per week is required; in the French model, the session frequency is decided by the analyst and the patient; and in the Uruguayan model, a minimum of three sessions a week is required. In Germany, a frequency of two to three sessions a week is commonly employed, with the patient lying on the couch. This format is called Analytische Psychotherapie, and face-to-face, long-term psychoanalytic psychotherapy is known as Tiefenpsychologisch fundierte Psychotherapie.10 To respect this international variance we opted for a broad definition, including studies in which (1) the patient is lying on a couch, with (2) two to five sessions a week. We performed separate sub-analyses of studies based on treatment frequency (divided into two groups): two to three sessions per week (on average, less than three per week), and three or more sessions per week (on average, three or more per week).

All psychoanalytic therapies, including psychoanalysis, are rooted in the psychoanalytic theories. Gabbard11 outlines the basic principles as follows: much of mental life is unconscious; childhood experiences, in concert with genetic factors, shape the adult; the patient’s transference to the therapist is a primary source of understanding the patient’s character and pathology; the therapist’s countertransference provides valuable information about what the patient induces in others; the patient’s resistance to the therapy process is a major focus of the therapy; symptoms and behaviors serve multiple functions and are determined by complex and often unconscious forces; and the therapist assists the patient in achieving a sense of authenticity.

Despite these “common grounds,” there is presently no single, all-encompassing psychoanalytic theory, only many partial theories. These theories can be roughly classified into “classical” and “post-classical” views. The classical views (Sigmund Freud and the “Freudians,” Melanie Klein and the “Kleinians,” and the “British Independents”) see intrapersonal conflict as central. Whether referred to as ego psychology, a structural model, a drive-defense model, or a one-person psychology, these approaches concentrate on the triadic relationships of the “oedipal situation,” characterized by sexual and aggressive needs. The post-classical views (with such forerunners as Ferenczi, Balint, and Sullivan, the leaders of relational, interpersonal, and intersubjective psychoanalysis, respectively) are developmental theories that focus on “developmental needs,” including the needs to feel connected, seen, understood, loved, appreciated, and protected. Also referred to as a two-person psychology, these approaches concentrate on the dyadic relationships of infancy. In present-day psychoanalysis, the classical and post-classical views coexist. They are not only compatible but also complementary.

Personality pathology is a crucial concept in psychoanalytic thinking.12 Psychoanalytic diagnostics basically differentiate between two main forms of personality problems: developmental pathology and conflict pathology (see, for example, Fonagy & Moran).13 Broadly speaking, these two types of problems differ in two ways. The first difference concerns the dating of the origins of the pathology: developmental pathology relates to problems stemming from early childhood (before the fifth year), whereas conflict pathology relates to problems originating later in childhood (around the fifth year and later). The second difference concerns the sort of innate human needs that the pathology mainly pertains to: developmental pathology focuses on developmental needs such as attachment needs and the need to be valued, seen, and loved, whereas conflict pathology considers the needs of sexuality and aggression. The two kinds of personality pathology do not exclude one another. Most patients present with both developmental pathology and conflict pathology.

Fundamental personality change is considered the goal of psychoanalysis, although its conceptualization depends on the theoretical approach used. It can be summed up as personality growth leading to more differentiation (e.g., of self vs. other, or fantasy vs. reality) and greater integration (of aspects of the self). In psychoanalytic terms, the changes in personality are described as “structural change,” “personality change,” “personality reconstruction or construction,” or the development of a “cohesive,” “adult,” “integrated” self, resulting, among other things, in a greater sense of inner freedom. The purpose of this fundamental change is ultimately for the patient to achieve symptom reduction, prevention of recurrence, better social functioning, and higher quality of life (all persisting after treatment termination).

Psychoanalysis is indicated for patients with “complex mental disorders”14—usually a combination of long-standing, often unsuccessfully treated DSM-defined Axis I disorders (most often, mood disorders) and Axis II personality disorders.15 Several studies show that patients for whom psychoanalysis is indicated suffer from these complex mental disorders.14,16–19

For psychoanalytically trained clinicians, a DSM diagnostic classification is insufficient for a complete diagnosis and treatment choice. These clinicians aim to describe the personality structure of patients in terms of essential psychoanalytical concepts such as defense mechanisms, conflicts, internal object relations, and intrapsychic functioning. In addition, an attempt is made to offer hypotheses explaining the development, maintenance, and recurrence of pathology. In broad psychoanalytic terms, psychoanalysis is useful for moderate to severe conflict pathology and mild developmental pathology.

Research in psychoanalysis is complex to conduct (see de Jonghe et al.).20 The treatments are of considerable length, making it difficult to randomize patients to control conditions that are substantially different from psychoanalysis (see also the discussion section). Study periods that include follow-up are long; the research requires significant funding; and the number of patients is limited. In addition, it is difficult to capture, whether in questionnaires, self-reports, or interviews, the process and outcomes that are considered relevant by psychoanalysts. Some analysts would even argue that doing so is impossible and consider the researcher an “unwanted third” in the treatment. Especially due to the problems of randomization, almost no randomized, controlled trials (RCTs) have been conducted in the field of psychoanalysis. Most studies on psychoanalysis follow a cohort of patients for whom psychoanalysis is indicated, and present pre/post changes. Early research in this field often defined outcome in terms of the therapist’s clinical judgment, reflecting a judgment concerning improvements in personality structure and growth. More recently, measurement instruments have become more common, as have RCTs.

Search Strategy

An extensive literature search was conducted using different search methods. First, we searched the electronic databases PubMed, PsycInfo, Embase, Cochrane Database of Systematic Reviews, and the Cochrane Central Register of Controlled Trials. The time frame was between January 1970 and December 2009. The following search terms were used: psychoanalysis (OR psychoanalytic OR analytic), psychodynamic (OR dynamic OR interpretive OR insight-oriented), therapy (OR psychotherap* OR counseling), long-term (OR open-ended OR LTPP) and treatment outcome (OR outcome OR effective* OR efficacy). The complete search terms are available on request. No limits were set on language. Second, an Internet database of controlled and comparative outcome studies on psychological treatments of depression was searched.21 Third, a manual search was performed on the Open Door Review6 and other reviews and meta-analyses.1–5,7,14 Cross-references in the retrieved publications were tracked down. For the time period of 2010–11, we did not perform the literature search again; instead, we contacted authors of studies that were known to us but whose findings had not yet been published. This third process resulted in two extra studies.

Selection of Studies

The following inclusion criteria were applied:

* The studies were “outcome-intervention studies.” The outcomes had to be measured in terms of symptom reduction or personality change. Issues such as process variables were excluded from this review. Each outcome measure had to be reliable and valid, as supported by at least one study of its reliability.

* Studies had to report on completed treatments; studies in which large proportions (more than 25%) of treatments were still ongoing were excluded.

* Studies had to provide ESs; means and standard deviations on measurements; or percentages of patients achieving clinically significant change.

* The studies were required to be RCTs; prospective, pre/post cohort studies (with or without comparison groups); or cross-sectional studies that included a minimum of ten subjects. Case studies or case series were excluded, as were retrospective studies such as surveys.

* The studies were required to include adult patients (18 to 65 years of age).

* The studies had to include only patients with the most “common” (i.e., the most frequently seen in clinical practice) indications for psychoanalysis (i.e., DSM diagnoses [Axis I or II] or psychoanalytically specified symptoms or personality problems). Studies focusing on purely somatic or psychotic disorders were excluded.

* The treatment was psychoanalysis, characterized as follows: (1) patients were lying on the couch, with (2) two to five therapy sessions a week. Whenever it was uncertain whether the treatment was psychoanalysis (so defined), we contacted the authors of studies to determine the type of treatment.
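The quantifiable parts of these criteria can be written as a simple screening check. The sketch below is illustrative only: the class and field names are hypothetical, the ten-subject minimum is read here as applying to every design (the text could also be read as applying only to cross-sectional studies), and the criteria that need clinical judgment (diagnostic indication, reliability and validity of the outcome measures) are left out.

```python
from dataclasses import dataclass

# Designs accepted by the inclusion criteria (labels are hypothetical).
ELIGIBLE_DESIGNS = {"rct", "prospective_cohort", "cross_sectional"}

@dataclass
class CandidateStudy:
    design: str
    n_subjects: int
    share_ongoing: float           # fraction of treatments still ongoing (0 to 1)
    min_age: int
    max_age: int
    on_couch: bool
    sessions_per_week: float
    reports_usable_outcomes: bool  # ESs, means and SDs, or % clinically significant change

def meets_inclusion_criteria(study: CandidateStudy) -> bool:
    """Check only the criteria that can be expressed as numbers or flags."""
    return (
        study.design in ELIGIBLE_DESIGNS
        and study.n_subjects >= 10            # assumption: applied to all designs
        and study.share_ongoing <= 0.25       # no more than 25% ongoing treatments
        and study.min_age >= 18
        and study.max_age <= 65
        and study.on_couch
        and 2 <= study.sessions_per_week <= 5
        and study.reports_usable_outcomes
    )
```

A study that passes this check would still need the full-text review by two raters described in the next section.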

Identification of Relevant Publications and Quality Assessment

Using the selection criteria, two independent judges (SdM and FdJ) reviewed the titles and abstracts generated from the searches. Disagreement was discussed and resolved by consensus. In case of continued disagreement, a third reviewer was consulted (JD). Titles and abstracts identified as potentially relevant were retrieved for full-text review. Two independent raters then examined whether the full-text articles met the inclusion criteria. Disagreement was discussed and resolved by consensus. Studies with unresolved disagreement were reviewed by a third rater.

Two reviewers (SdM and FdJ) evaluated the quality of the studies independently using a Research Quality Score rating system (see Appendix 1). This rating system (developed by the reviewers SdM and FdJ) follows the research criteria postulated by the Cochrane Collaboration and other researchers.22,23 This system assesses aspects of the study design, patients included, interventions, outcome data, statistics, dropout, and follow-up, and reflects the current standards of evidence-based medicine. Maximum scores and cutoff scores are mentioned in Appendix 1. Studies with unresolved disagreement were reviewed by a third rater (JD). We did not calculate interrater reliability.

We performed different meta-analyses, assessing pre- to post-treatment change and pre-treatment to follow-up change, applying these analyses to measurements of overall functioning, symptoms, and personality and psychosocial functioning. The pre- to post-treatment ES was calculated by subtracting the average post-treatment score from the average pre-treatment score and then dividing the result by the pooled standard deviation of the two sets of scores. The pre-treatment to follow-up ES was calculated in the same way, with the average follow-up score in place of the average post-treatment score. ESs of ≥0.20 are considered small; ≥0.50, medium; and ≥0.80, large (see Cohen),24 but these qualifiers applied originally to between-group ESs. Furthermore, Cohen stated explicitly that the qualifiers were based on his experience and were not empirically defined. Finally, pre-to-post ESs are usually larger than between-group ESs. For these reasons, we avoid using the qualifiers.
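The ES calculation described above can be sketched as a short Python function. The function and argument names are hypothetical, and pooling the two standard deviations as their root mean square is an assumption, since the text does not give the exact pooling formula. The sign convention follows the text (pre minus post), so a positive value means scores went down.

```python
import math

def pre_post_effect_size(mean_pre, sd_pre, mean_post, sd_post):
    """Standardized pre/post difference: (pre - post) / pooled SD."""
    # Assumption: pool the two SDs as the root mean square of both.
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_pre - mean_post) / pooled_sd
```

With hypothetical scores (pre-treatment mean 20, post-treatment mean 10, both standard deviations 8), the function returns 1.25. For the pre/follow-up ES, the follow-up mean and standard deviation would be passed in place of the post-treatment values.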

An overall mean ES was calculated on the basis of all outcome measures used in a study. ESs for symptom measures and for personality and psychosocial functioning measures were calculated separately. For calculating the overall ES, a mean ES was calculated for each study that presented more than one ES. The ESs or the means and standard deviations (whichever was presented) of individual studies were used, in turn, as the basis for calculating an overall mean ES, with the individual study ESs weighted to reflect the study’s sample size. To calculate the pooled mean ESs, we used the statistical computer program Comprehensive Meta-analysis.25 We computed the pooled mean ESs using the random-effects model because considerable heterogeneity of the included studies was expected.26 In the random-effects model, the included studies are seen as a sample drawn from a population of studies, rather than replications of each other, so that not only the random errors within the studies, but also the true variations of ESs from one study to the next, are taken into account. The random-effects model therefore results in broader 95% confidence intervals (CIs) and more conservative results. Most studies did not report within-group correlations (correlations across time points). Therefore we used Cohen’s d for the repeated-measures comparisons, as recommended by Dunlap and colleagues.27
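To make the random-effects pooling concrete, the sketch below uses the DerSimonian-Laird estimator of the between-study variance. This is an assumption: the text names only the software and the model, not the estimator, and this is not a reproduction of the program's internals. The function name is hypothetical, and the sampling variance of each study's ES is taken as an input.

```python
import math

def random_effects_pool(effects, variances):
    """Pool study ESs under a random-effects model (DerSimonian-Laird).

    Returns (pooled_mean, ci_low, ci_high, tau2), where tau2 is the
    estimated between-study variance.
    """
    weights = [1.0 / v for v in variances]            # fixed-effect weights
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # truncated at zero
    # Random-effects weights add tau2 to every study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    se = math.sqrt(1.0 / total_re)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2
```

When the studies disagree by more than sampling error alone would predict, tau2 is positive, the weights become more equal across studies, and the confidence interval widens. This is the more conservative behaviour the text describes.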

Finally, we calculated between-group ESs, comparing posttest means of psychoanalysis groups with means of nonclinical norm groups (when the latter were available).

Tests for heterogeneity were calculated by using the Q-statistic.28 A significant Q-value rejects the null hypothesis of homogeneity. We also calculated the degree of heterogeneity in percentages, using the I2-statistic.29 A value of 0% indicates no observed heterogeneity; a value of 25%, low heterogeneity; and values of 50% and 75%, moderate and high heterogeneity, respectively.30
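As a brief illustration of the two heterogeneity measures, the sketch below computes Q as the weighted sum of squared deviations from the fixed-effect mean, and I2 as the percentage of Q in excess of its degrees of freedom. The function name is hypothetical, and each study's ES variance is assumed to be available. The significance test for Q (a chi-square test with k - 1 degrees of freedom) is omitted to keep the sketch dependency-free.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I2 in percent) for a set of study ESs."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```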

Publication bias was tested according to Duval and Tweedie’s trim-and-fill procedure31 using Comprehensive Meta-analysis. This procedure uses funnel plots (a distribution of the expected studies in a field, based on study sizes and their expected ESs) to estimate the number of “missing studies” in a meta-analysis and the effect that these studies may have had on its outcome. The method yields an estimate of the ES after publication bias has been taken into account, meaning that the ESs expected to belong to the “missing studies” are taken into account. Adjusted values of the pooled mean ESs and 95% CIs are then calculated and compared to the original findings of the meta-analysis. In this procedure, we also used the random-effects model.

A secondary outcome measure for the meta-analysis was clinically significant change measured at treatment termination (pre/post treatment) and at follow-up (pre/follow-up). The definitions of clinically significant change are mentioned in Table 8.

Subgroup Analyses

Subgroup analyses were carried out using Comprehensive Meta-analysis.25 Studies were divided into two or more subgroups. Initially, a pooled mean ES was calculated for each subgroup. It was then determined whether the pooled mean ESs differed significantly between subgroups. The mean pooled ESs were computed using the mixed-effects method of subgroup analyses, which pools studies within subgroups according to the random-effects model, but tests for significant differences between subgroups according to the fixed-effects model.
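The mixed-effects comparison can be sketched for the two-subgroup case as follows. Several choices here are assumptions, not details from the text: the function names are hypothetical, the between-study variance is estimated separately within each subgroup with the DerSimonian-Laird estimator, and the between-subgroup test is restricted to one degree of freedom so that the p-value can be computed with the standard library alone.

```python
import math

def pool_random(effects, variances):
    """Random-effects pooled mean and its variance (DerSimonian-Laird)."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    mean = sum(w * y for w, y in zip(re_weights, effects)) / total_re
    return mean, 1.0 / total_re

def compare_two_subgroups(group_a, group_b):
    """Each group is (effects, variances). Returns (mean_a, mean_b, Q_between, p)."""
    mean_a, var_a = pool_random(*group_a)   # random effects within subgroups
    mean_b, var_b = pool_random(*group_b)
    # Fixed-effect comparison of the two subgroup means.
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    grand = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    q_between = w_a * (mean_a - grand) ** 2 + w_b * (mean_b - grand) ** 2
    # Chi-square survival function for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q_between / 2.0))
    return mean_a, mean_b, q_between, p_value
```

A small p-value indicates that the two subgroup means differ by more than the uncertainty of each pooled estimate would explain.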

The following subgroup analyses were conducted, based on the following:

* Study quality: studies with higher quality scores versus studies with lower quality scores

* Study design: prospective studies versus studies that had a cross-sectional design (the latter included different patient groups at the beginning and at the end of treatment)

* Continent of study: Europe versus North America

* Frequency of sessions: studies with two to three sessions (with an average below three) per week and studies with three or more sessions per week (with an average of three or more)

* Duration of follow-up: studies with follow-up of up to one year versus studies with follow-up of more than one year

* Symptom-specific sub-analyses: for all studies, only instruments for measuring depression

* Across all studies, patient ratings versus therapist ratings versus observer ratings

Results of the Literature Search: Trial Flow

A flow chart showing the process of study selection is given in Figure 2. After screening titles and abstracts, 164 titles were requested in full text and screened by two raters. Three studies were excluded based on language barriers, and 134 more based on full-text screening. The most important reasons for exclusion were that the studies addressed theoretical issues or presented case descriptions. Twenty-seven studies remained, of which 13 were excluded for methodological or other reasons. The remaining 14 studies16,32–70 were included in the meta-analysis. Table 1 presents the study characteristics of the included studies. Ten studies presented data to calculate mean ESs. Table 2 presents the characteristics of, and reasons for exclusion of, the 13 excluded studies.48(retrospective part of study),71–83 We contacted some authors for additional data and received data from the following: Caspar Berghout and Jolien Zevalkink; Dorothea Huber and Günther Klug; Henriette Löffler-Stastka; Rolf Sandell; and Paul Knekt. Nine studies presented percentages of clinically improved patients and were therefore included in the secondary outcome measures.

Study Characteristics

Of the 14 studies (total n = 603) included in our meta-analysis, 13 were prospective cohort studies, and 1 an RCT.51,52 The study of Knekt and colleagues60,84 included an RCT for three types of psychotherapy and a prospective cohort design for psychoanalysis. The number of patients in the studies varied from 17 to 92. The number of sessions (for any particular patient) ranged from 234 to 971, and the duration of analysis from 2.5 to 6.5 years. Five studies were conducted in the United States, the remainder in Europe; 5 of the 9 European studies were conducted in Germany. Three studies16,32–47 applied a cross-sectional design, assessing different patient groups at pre-treatment, post-treatment, and follow-up.

The quality of the studies varied. First, the sample included only one RCT,51,52 and although some studies followed both a psychoanalysis group and a psychotherapy group, these groups were not controlled against other treatment groups. Second, the measurement instruments varied considerably. Appendix 2 contains a list of all the instruments used. Third, outcome measures varied. In one study, only the therapist rated post-treatment outcome.47 In another study, both patients and therapists rated post-treatment outcome.67 In all other studies, including all follow-up measurements, only patient and independent ratings were included. Analyses were also performed without the studies that did not include independent raters (see section “Pre/Post Effectiveness of Psychoanalysis” below). In a separate sub-analysis we compared patient ratings with independent ratings and therapist ratings (Table 6). Fourth, six studies did not present follow-up results, and of those studies that did, the follow-up periods were relatively short (between 1 and 3.5 years). Fifth, treatment was not manualized in any form, and treatment adherence was not monitored in any study. Systematic descriptions of treatments (mean number of sessions, plus duration) were missing in three studies and had to be estimated. Finally, five studies did not report on dropouts systematically, and all studies but one provided completers-only outcome analyses.60 Overall intention-to-treat analyses could therefore not be calculated.

Diagnostic Characteristics

Ten studies presented DSM-III/IV or ICD-9/10 diagnoses. Two studies applied a form of psychoanalytic diagnostic criteria such as the Structural Interview of Kernberg68 or neurotic or non-neurotic personality organization.53 Two studies mentioned only that the patients were “suitable” for psychoanalysis48,63—meaning (at least in general) that a psychoanalyst, after careful clinical evaluation of a patient, believed that the strengths and weaknesses of a patient’s personality structure warranted psychoanalysis.

The diagnostic characteristics of the patients in the included studies matched those found in the research of Doidge,17,18 Caligor,19 and their colleagues. Patients suffered from comorbid Axis I and Axis II disorders. Depressive disorders (range, 27%–100%) and anxiety disorders (range, 39%–100%) were found most frequently. On average, 77% of the patients in this meta-analysis suffered from a depressive disorder, and 50% from an anxiety disorder. Between 20% and 100% of all patients met criteria for a personality disorder, with an average of 47%. Other diagnoses included eating disorders, sexual and relational disorders, work problems, obsessive-compulsive disorders, psychosomatic complaints, and substance abuse. Four studies reported on earlier treatments, if any, of patients.16,32–46,48,60 On average, 73% of those patients had tried previous treatments. Two studies defined a “clinical case” as a patient who scored in the worst 10% clinical range on several measurement instruments.16,32–46 These two studies found that at baseline, 91% and 88%, respectively, of all psychoanalysis patients met clinical case criteria.

Refusal and Dropout Rates

Nine studies reported data on the number of patients that refused to participate in the study, did not start treatment, or dropped out of treatment (Table 3). Four studies reported how many patients refused to participate in the study (range, 13%–40%). Dropout rates ranged from 3% to 33%.16,32–46,53,60

Pre/Post Effectiveness of Psychoanalysis

Ten studies provided data for pre/post analyses. Four studies49,51,52,61,67 used a frequency of two to three sessions a week, and six studies16,32–47,60,62,63 a frequency of three or more sessions a week. The ESs and 95% CIs of the studies are plotted in Figure 3. The mean pre/post ES (Cohen’s d)24 of psychoanalysis across all studies and all measurement instruments (Table 4) is 1.27 (95% CI, 1.03–1.50; p < .01), indicating a robust effect. This effect remains fairly stable when the one-study-removed method is followed. Heterogeneity of the overall analysis is moderate and not statistically significant (I2 = 38.80%). The study of Huber and Klug51,52 appears to be an outlier, with larger ESs than the other studies. Removing this study lowers the heterogeneity (I2 = 20.20%) and also the overall ES (1.20; 95% CI, 0.98–1.41; p < .01). The study by Rudolf and colleagues67 is an outlier on the lower end of the range, with smaller ESs than the other studies. Removing this study raises the mean ES to 1.34 (95% CI, 1.12–1.56; p < .01). Removing both studies that did not use independent ratings (Cogan & Porcerelli47 and Rudolf et al.67) yields a mean ES of 1.36 (95% CI, 1.11–1.60; p < .01; I2 = 29.38%).

The mean pre/post ES of psychoanalysis across all studies that included only symptom instruments is 1.52 (95% CI, 1.20–1.84; p < .01), indicating a robust effect. This effect remains stable when the one-study-removed method is followed. Heterogeneity of the overall analysis is moderate to large and statistically significant (I2 = 65.57%), and remains so when the one-study-removed method is applied. The two outliers in this analysis are the studies of Huber and Klug (mean symptom ES = 2.27)51,52 and Rudolf (mean symptom ES = 0.87).67 Removing these two studies does not change the mean ES across the remaining studies but lowers heterogeneity to I2 = 41.74% (not significant).

The mean pre/post ES of psychoanalysis across all studies that included only personality and psychosocial functioning instruments is 1.08 (95% CI, 0.89–1.26, p < .01), indicating also a robust effect, albeit somewhat lower than the ES of studies using only symptom measures. The ESs remain similar when the one-study-removed method is applied. Heterogeneity of the overall analysis is very low (I2 = 8.76%) suggesting a very similar outcome across studies.

Sub-analyses showed that the difference between higher-quality studies and lower-quality studies was statistically significant (p = .01), with the former showing higher ESs (the quality score of studies was not determined by the magnitude of the ESs). No difference was found between studies using a cross-sectional design and prospective cohort studies (p = .14). We also found no significant differences in effects between studies from Europe and studies performed in the United States (p = .53). Finally, we found no differences between the four studies49,51,52,61,67 with a lower session frequency (two to three sessions a week; mean number of sessions across studies = 266) and the six studies16,32–47,60,62,63 with a higher session frequency (three to five sessions a week; mean number of sessions = 793) (p = .41 for all instruments; p = .52 for symptom instruments; p = .73 for personality and psychosocial functioning instruments).

Three studies used specific depression instruments.32–46,51,52,60 The mean ES was 1.85 (95% CI, 1.13–2.58; p < .01). Heterogeneity in this sub-analysis was high (I2 = 78.97%).

Heterogeneity in the sub-analyses seemed the highest among the group of studies using lower session frequency (all German studies). This finding could be explained by the contrast between the studies of Leichsenring,61 Huber and Klug,51,52 and Grande and colleagues,49 on the one hand, and the study of Rudolf,67 on the other. Mean ESs of these studies (using all instruments) were 1.65, 1.86, 1.38, and 0.87, respectively. The first three studies mentioned, which were performed more recently and use more diverse, international measurement instruments (by contrast, Rudolf’s study uses only a German measurement instrument), consistently present higher ESs. It is not clear, however, how these differences in time and instruments affect ES.

Pre/Follow-Up Effectiveness of Psychoanalysis

Only five studies provided data regarding follow-up analyses (Table 5). The mean pre/follow-up ES (Cohen’s d)24 of psychoanalysis across all these studies and all measurement instruments is 1.46 (95% CI, 1.08–1.83; p < .01; see Figure 4), indicating that the effect of psychoanalysis at follow-up remains stable. This effect remains fairly stable when the one-study-removed method is followed. Heterogeneity of the overall analysis is moderate but not statistically significant (I2 = 50.56%). Removing studies does somewhat lower the heterogeneity, with the lowest heterogeneity (I2 = 25.75%) resulting from the removal of the study by Berghout, Zevalkink, and colleagues.32–46 This study has the lowest mean ES (0.90) of the follow-up studies; the other studies’ ESs were 1.20 (Sandell et al.),16 1.43 (Grande et al.),49 1.79 (Leichsenring et al.),61 and 1.97 (Huber/Klug et al.).51,52 Removing the Berghout and Zevalkink study elevates the mean ES across studies to 1.59 (95% CI, 1.25–1.93; p < .01).
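The one-study-removed method can be illustrated with the five study-level follow-up ESs reported above. The sketch below is a simplification: it uses an unweighted mean, whereas the published analysis used weighted random-effects pooling, so its results only approximate the reported values. The function name and dictionary keys are hypothetical labels.

```python
def leave_one_out(effects_by_study):
    """Return {omitted study: mean ES of the remaining studies}."""
    results = {}
    for omitted in effects_by_study:
        rest = [es for name, es in effects_by_study.items() if name != omitted]
        results[omitted] = sum(rest) / len(rest)   # unweighted mean (assumption)
    return results

# Study-level pre/follow-up ESs as reported in the text.
follow_up_es = {
    "Berghout/Zevalkink": 0.90,
    "Sandell": 1.20,
    "Grande": 1.43,
    "Leichsenring": 1.79,
    "Huber/Klug": 1.97,
}
sensitivity = leave_one_out(follow_up_es)
```

Omitting the Berghout and Zevalkink study gives an unweighted mean of about 1.60 in this sketch, close to the weighted estimate of 1.59 reported above.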

The mean pre/follow-up ES of psychoanalysis across all studies that included only symptom instruments is 1.65 (95% CI, 1.24–2.06, p < .01), indicating that the effect of psychoanalysis at symptom level is stable or even enlarged at follow-up. The mean pre/follow-up ES remains similar when the one-study-removed method is followed. Heterogeneity of the overall analysis is moderate (I2 = 56.89%) but not statistically significant. Removing the Huber and Klug study51,52 lowers heterogeneity considerably (I2 = 33.65%) and leaves the mean ES at 1.50 (95% CI, 1.12–1.87; p < .01). The Huber and Klug study is an outlier with the highest mean ES for symptom instruments (2.24); the other studies in this category have ESs of 1.25 (Berghout/Zevalkink et al.),32–46 1.58 (Grande et al.),49 2.03 (Leichsenring et al.),61 and 1.17 (Sandell et al.).16

The mean pre/follow-up ES of psychoanalysis across all studies that include only personality and psychosocial functioning instruments is 1.31 (95% CI, 1.00–1.62; p < .01), again indicating that the effects of psychoanalysis are stable at follow-up. The mean ES remains similar with the one-study-removed method. Heterogeneity of the overall analysis is low (I2 = 29.55%). The study of Berghout and Zevalkink32–46 seems an outlier; removing this study lowers heterogeneity to zero and raises the mean ES to 1.43 (95% CI, 1.15–1.72; p < .01). This study has the lowest mean ES across personality and psychosocial functioning instruments (0.75); the other studies in this category had ESs of 1.29 (Grande et al.),49 1.69 (Huber/Klug et al.),51,52 1.54 (Leichsenring et al.),61 and 1.21 (Sandell et al.).16

Sub-analyses showed that at follow-up there were no differences in effects between studies that were considered higher in quality and studies lower in quality (p = .39). There was a significant difference at follow-up, however, between the studies with cross-sectional design and the other studies, with the former reporting lower mean ESs (p = .02). Since all studies reporting follow-up were conducted in Europe, no comparison could be made between European and American studies in this respect. We also found two significant differences (overall effect and symptom change) between the (German) studies with a lower mean number of sessions (mean number of sessions across studies = 266) and those with a higher mean number of sessions (mean number of sessions across studies = 810), with the latter reporting lower ESs. The difference between these studies for personality and psychosocial functioning change was a trend finding in the same direction (p = .07).

Finally, we found a significant difference between studies with follow-up periods up to one year and studies with longer follow-up periods (p < .01), indicating lower ESs with studies that included longer follow-up periods. Heterogeneity in the statistically significant sub-analyses was very low to zero, indicating that these follow-up periods were relevant sources of heterogeneity in the main analyses. The two studies using depression instruments showed a large mean ES at follow-up (1.81; 95% CI, 0.33–3.28), again presenting high heterogeneity, which was discussed earlier.32–46,51,52

Comparison of Psychoanalysis Posttest Means and Means of Nonclinical Norm Groups

Seven studies16,32–46,51,52,60–62 could be used to compare posttest means of psychoanalysis against means of nonclinical groups. Table 6 presents between-group ESs.

Generally, the posttest means of psychoanalysis do not differ from the means presented by nonclinical groups. Between-group ESs are small and not statistically significant. Three subscales of the Minnesota Multiphasic Personality Inventory in the Berghout and Zevalkink study show that the posttest means of patients who underwent psychoanalysis are still more elevated than those of nonclinical groups.
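
As an illustration of what a between-group ES of this kind expresses, the following Python sketch computes Cohen's d between a treated group's posttest scores and a nonclinical norm group, using the pooled standard deviation and a common large-sample approximation for the confidence interval. The article does not state the exact formulas used for Table 6, so both are assumptions, and all means, standard deviations, and sample sizes below are invented for the example.

```python
import math

def cohens_d_between(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def d_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate 95% CI for d (large-sample standard error)."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se

# ASSUMPTION: hypothetical summary statistics, not taken from any included study
d = cohens_d_between(mean_a=0.43, sd_a=0.40, n_a=50,    # patients at posttest
                     mean_b=0.35, sd_b=0.35, n_b=500)   # nonclinical norm group
low, high = d_confidence_interval(d, 50, 500)
print(f"d = {d:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A confidence interval that includes zero, as in this invented example, corresponds to a between-group ES that is small and not statistically significant.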

Ratings of Therapists Versus Patients Versus Observers

We compared all patient-rated outcomes with therapist-rated outcomes and with observer-rated outcomes (Table 7). Post-treatment and follow-up measurements were taken together. We found that therapist-rated instruments yielded the lowest ESs and that observer-rated instruments yielded the highest, with patient ratings falling in between. Only the difference between the ratings of therapists (lowest ratings) and observers (highest ratings) was statistically significant.

Clinically Significant Change

Our secondary outcome was clinically significant change, indicating how many patients underwent a change that was considered clinically relevant. The criteria are presented in Tables 8a and 8b. The former presents the results measured with symptom or general instruments, and the latter shows the results measured with personality instruments.

At treatment termination an average of 77% of the patients achieved scores under a clinically defined cutoff score or criterion (indicating they were falling in the range of a nonclinical population) of a symptom or general instrument, 48% more than the number of patients scoring under those cutoff scores at baseline. At follow-up an average of 75% achieved that status. For personality and psychosocial functioning instruments, the results indicate that an average of 62% of the patients achieved scores under a clinically defined cutoff score or criterion (indicating that they fell in the range of a nonclinical population), 34% more than the number of patients scoring under those cutoff scores at baseline. At follow-up, an average of 65% achieved such a status.
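
The arithmetic behind these percentages can be sketched in Python as follows. The sketch assumes that lower scores are healthier and that "more than at baseline" is read as a difference in percentage points; the cutoff and all patient scores are hypothetical, and the function name is introduced only for illustration.

```python
def share_below_cutoff(scores, cutoff):
    """Percentage of patients scoring under the clinical cutoff."""
    return 100.0 * sum(s < cutoff for s in scores) / len(scores)

# ASSUMPTION: invented cutoff and scores for ten patients on one instrument
cutoff = 1.0
baseline = [1.6, 1.2, 0.8, 1.9, 1.4, 0.9, 1.1, 2.0, 1.3, 0.7]
termination = [0.6, 0.9, 0.4, 1.2, 0.8, 0.5, 0.7, 1.4, 0.9, 0.3]

baseline_share = share_below_cutoff(baseline, cutoff)
termination_share = share_below_cutoff(termination, cutoff)
gain = termination_share - baseline_share  # difference in percentage points

print(f"under cutoff at baseline: {baseline_share:.0f}%")
print(f"under cutoff at termination: {termination_share:.0f}%")
print(f"gain: {gain:.0f} percentage points")
```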

Publication Bias

Based on the absence of significant differences between the adjusted mean ESs (and 95% CI) and the observed values for any of the main comparisons, we failed to find any indication of publication bias in this meta-analysis (Table 9). When looking at the number of trimmed studies, however, some evidence for publication bias was found. The mean ES of all studies based on only symptom instruments was lower at post-treatment after adjusting for publication bias (Cohen’s d = 1.36; 95% CI, 1.03–1.65). The number of trimmed studies was two, indicating that (based on the funnel plot, which shows the spread of the studies and their ESs) two studies in the field of psychoanalysis might be missing due to publication bias. This publication bias refers to the possibility of studies not being published (perhaps due to study quality or minor results). However, the adjusted value represents only a small difference from the 1.52 that we found in this meta-analysis.

We found that psychoanalysis yields substantial pre/post and pre/follow-up change for patients presenting with long-standing, complex mental disorders, most often a combination of DSM-IV mood or anxiety disorders and personality disorders. At treatment termination, the mean pre/post ES was 1.27 for all outcome instruments taken together, 1.52 for symptom instruments, and 1.08 for personality and social functioning outcomes, all indicating substantial pre/post change. At follow-up the mean pre/follow-up ESs were 1.46, 1.65, and 1.31, respectively, indicating a stable effect. The majority of patients (62%–76%) achieved a clinically significant change, and these figures seemed stable at follow-up. Posttest means showed that after their treatment, psychoanalysis patients mostly fall in the range of nonclinical groups.

As our findings are based on pre/post studies, the effects of psychoanalysis cannot be compared to the effects of possible alternative treatments; consequently, firm conclusions about effectiveness are not possible here.

The dropout rate (between 3% and 33%) did not seem higher in psychoanalysis than in short-term psychotherapies (e.g., 47% in Pampallona et al.85 and 37%–54% in Casacalenda et al.),86 which is notable in view of the length of treatment. Two of the three studies with the highest dropout rates involved more severe pathology, with 100% of the patients presenting with a personality disorder.62,64–66,87

Overall, the heterogeneity in the analyses was moderate, indicating that there are probably systematic differences between the outcomes. The heterogeneity might be influenced by the different measurement instruments used and by differences in patient populations and the treatments used. For instance, 72% of the patients in the Berghout and Zevalkink study32–46 met criteria for personality disorders, and these patients showed lower ESs on depression instruments. By contrast, 34% of the patients in the Huber and Klug study51,52 and 19.50% in the Knekt60 study had personality disorders, and both groups of patients showed higher ESs on depression instruments. Although we can reach no definitive conclusions regarding the relationship between personality disorders and depression outcomes, Newton-Howes and colleagues88 have shown in a meta-analysis that the presence of personality disorders reduces the effect of treatment outcomes for depression.

It could also be suggested, however, that heterogeneity was mainly influenced by the differences between the studies with lower session frequency—all performed in Germany—and those with higher session frequency. The German studies were characterized by better study quality, lower prevalence of patients with personality disorders, and, on average, fewer sessions and higher ESs. In Germany, insurance coverage for psychoanalysis is limited to 300 sessions. How this influences treatment results or indications remains unclear. More research is needed to shed further light on our findings; for example, dose-response studies would be especially useful.

Sub-analyses at treatment termination indicate that some heterogeneity is present even among the German studies. Rudolf’s study,67 for example, seems to be an outlier within that group; it has considerably lower ESs than the other, more recent studies. A partial explanation could be that the study used different measurement instruments: the Rudolf study used only one (German) questionnaire (Psychischer und sozial-kommunikativer Befund), whereas the other studies used various, more internationally employed instruments such as the Beck Depression Inventory, Hamilton Depression Rating Scale, Inventory of Interpersonal Problems, and Symptom Checklist–90. In addition, the Rudolf study, dating from 1994, is the oldest of the German studies. Advances in the discipline could potentially have contributed to the differences seen in the more recent studies. That said, the differences remain, without further investigation, largely unexplained.

For personality measurements at treatment termination, heterogeneity was almost zero, indicating that heterogeneity resulted from differences in the effects of symptom change across studies. At follow-up, heterogeneity was also very low to zero in the statistically significant sub-analyses.

Publication bias seems fairly low in our study. ESs computed after the trim-and-fill method did not differ significantly from the mean ESs found in the meta-analysis. Due to the small number of studies, however, calculations of publication bias must be interpreted cautiously.

Nine of the 14 studies encompassed a long-term psychoanalytic psychotherapy condition in addition to psychoanalysis. In this article we restricted ourselves to the pre/post findings of psychoanalysis studies. The question of whether the results of psychoanalysis and LTPP in nonrandomized studies can be compared is a complicated one. One study51,52 did randomize patients to psychoanalysis or LTPP. The authors found that at follow-up, psychoanalysis performed better than LTPP on personality measures (Inventory of Interpersonal Problems and Scale of Psychological Capacities) and on a goal attainment scale.

Finally, we found that in this meta-analysis, therapist ratings were the lowest, that observer ratings were the highest, and that patient ratings fell in between (a possibly counterintuitive result in that one might expect therapists to rate their own work higher than independent observers). There are pros and cons, of course, for utilizing the ratings provided by these three different groups. On the one hand, independent observers have less vested interest in the treatment and might therefore be less biased in judging results. On the other hand, patients and therapists have much more exposure to the actual evidence than independent observers. The literature is not in agreement on the question of whether patients and therapists might overestimate therapy success. In analyzing the findings of the Menninger Foundation’s psychotherapy research project, Harty and Horwitz70 found that both therapists (65%) and patients (54%) rated therapy success higher than independent judges (38%). Other studies have found, though, that self-reports present more modest results than observer ratings.76,78,89,90

Several limitations of our meta-analysis caution against overinterpreting the results. The most important limitation is the use of pretest/posttest analyses; all studies, except for one, were pre/post cohort studies, lacking (randomized) control groups. In evidence-based medicine’s hierarchy of evidence, RCTs present strong scientific evidence, whereas the evidence from pre/post cohort studies is only moderate. The importance of control groups is made clear by Smit and colleagues91 in their recent meta-analysis of LTPP. Their subgroup analysis of the domain “target problems” showed that LTPP did significantly better when compared to control treatments without a specialized psychotherapy component, but not when compared to various specialized psychotherapy control treatments. Considered from this point of view, the evidence for the effects of psychoanalysis cannot be more than of moderate strength.

Several researchers have pointed to the difficulties and limitations of RCTs in the field of intensive, long-term treatments, of which psychoanalysis is paradigmatic.64,92,93 de Jonghe and colleagues20 brought attention to the limited feasibility of RCTs because of the restricted acceptability of the control conditions—especially, but not exclusively, in psychoanalysis. They argue that randomization to the most informative control conditions (waiting list, placebo, and no treatment), coupled with the extended length of the treatment period, renders RCTs unacceptable for patients. Most patients considering a psychoanalytic treatment have previously tried therapies with a much lower frequency or duration with no success, and no evidence-based therapies with frequency of sessions and duration comparable to psychoanalysis are available yet to serve as additional conditions in an RCT. Patients are not likely to accept the risk of being allocated by chance to a control condition that they know all too well.

Notwithstanding such concerns, some RCTs have been undertaken. Huber and Klug51,52 succeeded in randomizing patients with depressive disorders in two fairly complicated randomization rounds (G. Klug, written communication). In the first phase, patients were randomized between psychoanalysis and psychodynamic psychotherapy. A few years later a third group was added for cognitive-behavioral therapy (CBT). In this second phase the randomization board considered but ultimately rejected the possibility of randomly allocating new patients to the three experimental groups; instead, most patients were allocated to the cognitive-behavioral condition, bringing it up to the same number as the other two groups. As psychoanalysis in this RCT averaged two sessions a week, the relatively small difference between this treatment and the other condition (psychodynamic therapy of one session/week) might have contributed to the acceptability of the RCT. Likewise, a pilot study by Steven Roose and colleagues94 succeeded in randomizing ten patients to psychoanalysis or CBT. Another ongoing German study by Marianne Leuzinger-Bohleber and Manfred Beutel95 includes an RCT in which patients are randomized to psychoanalysis (two or three times weekly) or CBT. Although the results of these latter two studies are not yet available, the RCTs discussed here demonstrate that randomization is not impossible; we recommend that further RCTs be conducted in this field.

In the meantime, psychoanalysis has to rely mainly on pre/post cohort studies, and it is often argued that such studies might overestimate the ES of a treatment. This drawback of the cohort study design and the related possibility of biased outcomes96,97 cannot be denied, but several extended reviews demonstrate that, in practice, no systematic differences have been found in the results of RCTs versus those of cohort studies and pre/post studies.14,98–102 In a meta-analysis comparing nonrandomized effectiveness studies with randomized efficacy studies of anxiety disorders, Stewart and Chambless103 found a very small difference (Cohen’s d = −0.08 [significant]) between the ESs of the two types of studies. In addition, other studies show that patients receiving no treatment improve minimally. Norton and Price104 found an ES of 0.25 for placebo groups in studies of anxiety disorders, and Leichsenring and Rabung (unpublished data) an ES of 0.12 in control groups of psychoanalytic therapies.

Knowledge of the “natural, untreated” course of the personality pathology of this target group would be helpful in interpreting the results of pre/post studies. For obvious reasons, such knowledge is scarce. Most people who suffer do seek, and fortunately often find, help. Some research suggests that the symptoms of personality disorders somewhat lessen over time, but this research is based almost exclusively on individuals who have been exposed to treatment105–108 or young children or adolescents, in whom personality change is more expected.105,109 Several longitudinal studies, however, have investigated natural changes in personality of adults. Franz and colleagues110 investigated the spontaneous, long-term course of neurotic spectrum disorders, personality disorders, stress reactions, and somatoform disorders in a representative sample of the normal adult population of Mannheim over a period of 11 years. They found a high correlation between the first and last measurements 11 years later (r = .55) and strong evidence for a long-term course of psychological impairment. Roberts and DelVecchio111 meta-analyzed 152 longitudinal studies (including 55,000 individuals) and compiled 3,217 test/retest correlations. They found that personality traits were increasingly stable in adulthood (r = 0.31 in childhood; r = 0.64 at 30 years of age; r = 0.74 between 50 and 70 years of age). Terracciano and colleagues112 presented a longitudinal study measuring intra-individual personality change of 684 subjects who were tested at regular intervals of first 6 and then 12 years. Individual stability on ten scales of personality dimensions was high (r = 0.75), and the stability increased slightly when people were over 30 years of age. This research indicates that personality traits and pathology seem, when untreated, fairly stable in adult populations. More research in this area is necessary, and it could serve as a control for otherwise uncontrolled studies of long duration.

Finally, it seems that more and more researchers value uncontrolled effectiveness studies that parallel controlled ones. As Stewart and Chambless103 concluded in their recent meta-analysis of CBT, “One of the most contentious issues in evidence-based practice is the extent to which results from randomized controlled trials can be generalized to routine clinical practice. Uncontrolled effectiveness research permits the researcher to maximize external validity by testing treatments (with prior supporting efficacy research) in all types of naturalistic circumstances to evaluate whether these treatments translate well to the clinical setting.”

In the present meta-analysis, the number of studies is small; the studies are of varying quality; and they each contain small samples of patients. The results therefore rest on a relatively narrow foundation. The treatment and patient groups also vary considerably, and outcomes are not differentiated by DSM disorder. A further limitation of most studies reviewed is that they report only on completers and do not perform intent-to-treat analyses. Completers analysis may exaggerate results. There were only five studies that used follow-up periods, and their lengths were short (with a maximum of 3.5 years). These brief follow-up periods may be important, as our results suggest that the effects after a longer follow-up period are smaller than after a shorter one.

Finally, many psychoanalysts believe that the concept of scientific research (with its measurements, randomization, and strict criteria and procedures) is alien to psychoanalysis. Many would argue that the criteria used in such research—such as the frequency of sessions, the use of a couch, or the presence of particular diagnoses—fail to capture, or even correlate with, the core elements of psychoanalysis. They would see the researcher as an unwanted “third party.” And they would argue that the process of psychoanalysis and the changes in patients cannot be reliably caught in simple, oversimplifying measurement instruments. In this context, it is worth noting that the measurements of personality change in this meta-analysis were mostly done by self-report scales such as the Inventory of Interpersonal Problems, Sense of Coherence Scale, and Social Adjustment Scale. We believe that these outcomes should be subjected to more psychoanalytically relevant personality measurements or factors such as the Adult Attachment Interview, the Minnesota Multiphasic Personality Inventory, projective tests, quality of object relations, and defense styles.

We found evidence that psychoanalysis yields substantial pre/post and pre/follow-up change in patients presenting with complex mental disorders for whom this type of treatment is indicated. These results are almost exclusively based on a small number of pre/post cohort studies, which, from the perspective of evidence-based medicine, are of only moderate scientific strength, as they lack control groups. Therefore, we cannot draw firm conclusions regarding the effectiveness of psychoanalysis. Controlled studies are urgently needed that (1) describe patient samples in both DSM and psychoanalytic diagnostic terms, (2) describe the treatment in more detail, (3) use intention-to-treat analyses, (4) apply in-depth, psychoanalytic personality outcome measures, (5) use long-term follow-up, (6) monitor dropout, (7) ensure treatment integrity, and (8) include cost-effectiveness measures.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Appendix 1 Research Quality Score

Appendix 2 Instruments Used in Studies

[1] Symptoms

With regard to measuring symptoms, the following instruments were used: Symptom Check List-90 (Derogatis & Lazarus [1994]),113 Beck Depression Inventory (Beck et al. [1961]),114 State-Trait Anxiety Inventory (Spielberger et al. [1970]),115 (moderated) Goal Attainment Scale (Kiresuk & Lund [1979]),116 Psychischer und sozial-kommunikativer Befund (Rudolf [1991]),117 Health Sickness Rating Scale (Luborsky [1962]),118 Hamilton Depression Rating Scale (Hamilton [1960]),119 Hamilton Anxiety Rating Scale (Hamilton [1959]),120 Global Assessment of Functioning (DSM-IV), Positive Symptom Distress Index (based on the SCL-90), Positive Symptom Total (based on the SCL-90), and Clinical Global Impression–Severity or –Improvement (Guy [1976]).121

[2] Personality and Social Functioning

With regard to measuring changes in personality and psychosocial functioning, the following instruments were used: Inventory of Interpersonal Problems (Horowitz et al. [2000]),122 Minnesota Multiphasic Personality Inventory (Groth-Marnat [1997])123 (using those clinical scales that were, at baseline, clinically elevated relative to a defined cutoff point [Jacobson et al. (1984, 1999),124,125 Jacobson & Truax (1991)126]), Scales of Psychological Capacities (DeWitt et al. [1991]),127 Shedler–Westen Assessment Procedure–200 (Westen & Shedler [1999]),128,129 Sense of Coherence Scale (Antonovsky [1987]),130 Social Adjustment Scale (Weissman & Bothwell [1976]),131 Work Ability Index (Ilmarinen et al. [1997]),132 work subscale of the Social Adjustment Scale (Weissman & Bothwell [1976]),131 and Perceived Psychological Functioning Scale (Lehtinen et al. [1991]).133 More in-depth measurements of personality change, such as the assessment of attachment styles, defense styles, or quality of object relations, were largely missing or, as in the case of the Knekt study, not yet reported. One study (Berghout/Zevalkink et al. [2006–10, 2012])32–46 used the Adult Attachment Interview (George et al. [1996])134 for assessing outcomes.

