How Much Tubal Factor Infertility Is Caused by Chlamydia? Estimates Based on Serological Evidence Corrected for Sensitivity and Specificity

Price, Malcolm J. PhD*; Ades, AE PhD*; Welton, Nicky J. PhD*; Macleod, John PhD*; Turner, Katy PhD*; Simms, Ian PhD†; Horner, Paddy J. MD*,‡

doi: 10.1097/OLQ.0b013e3182572475
Original Study

Objectives: To estimate the proportion of tubal factor infertility (TFI) caused by Chlamydia trachomatis (CT), the etiologic fraction, from a retrospective study of CT antibody prevalence in TFI cases and controls, adjusted for sensitivity and specificity.

Methods: We use published data on sensitivity and specificity to estimate the performance of assays in (a) women with a previous CT infection without sequelae and (b) women with TFI caused by CT. A model was developed and applied to antibody prevalence in TFI cases and controls from 1 published case-control study to estimate the proportion of TFI caused by CT.

Results: The proportion of TFI episodes that were due to CT infection was estimated to be 45% (credible intervals: 28%, 62%). Models which assume that test sensitivity is higher in women with CT-related TFI than women with previous infection and no sequelae fit the data significantly better than models that assume the same sensitivity in all those previously infected.

Conclusions: Greater attention needs to be paid to methods for characterizing the performance of CT antibody tests. Serological studies could be given a greater role both in CT etiology and in monitoring the effects of prevention and control programmes.

From the *School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom; †Health Protection Agency, Colindale, London, United Kingdom; and ‡Bristol Sexual Health Centre, University Hospital Bristol NHS Foundation Trust, Bristol, United Kingdom

The authors thank Dr. Kate Soldan for providing expert advice during the project and commenting on the manuscript.

Conflicts of interest and sources of funding: Supported by a UK Medical Research Council Grant (G0801947).

Correspondence: Malcolm Price, PhD, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, United Kingdom.

Received for publication October 10, 2011, and accepted January 21, 2012.

Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text, and links to the digital files are provided in the HTML text of this article on the journal's Web site.

The literature on Chlamydia trachomatis (CT) includes large numbers of retrospective studies on the prevalence of serum antibodies to CT in which women with tubal factor infertility (TFI) are compared with controls.1–3 These studies were undertaken to find evidence for the causal role of CT in TFI, to assess the diagnostic value of serological tests, or occasionally to determine the proportion of TFI that is causally attributable to CT.4 TFI may be caused by a variety of factors other than CT and may occur a considerable time after the initial infection, during which the individual can be treated or clear infection spontaneously. A well-known formula, formula (1), relates the etiologic fraction (EF) to the prevalence of previous exposure to CT (i.e., cumulative incidence) in the population (π) and the odds ratio (OR), the odds of CT in women with TFI divided by the odds in controls, in retrospective studies.5
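For reference, the standard attributable-fraction form of this relationship can be written as below. It reproduces the figures quoted later in this Introduction (π = 22% and OR = 9.2 give EF of about 64%); the typesetting and symbol names are an assumption rather than the authors' own notation.

```latex
\begin{equation}
\mathrm{EF} = \frac{\pi\,(\mathrm{OR} - 1)}{1 + \pi\,(\mathrm{OR} - 1)} \tag{1}
\end{equation}
```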

The use of this formula in serological studies comparing TFI cases with controls faces 3 obstacles. First, CT antibody tests are relatively insensitive and nonspecific measures of previous CT infection, and cross-reactivity with antigens to Chlamydophila pneumoniae was present in earlier assays.6–8 Even so, it is possible to adjust the observed CT prevalence in cases and controls if reliable estimates of sensitivity and specificity are available.9 Test performance has, however, been poorly characterized. Although sensitivity can be measured in true infections that are identified on the basis of bacterial culture or nucleic acid amplification tests (NAAT), it is difficult to identify a true negative population in which to estimate specificity; those who are culture-negative and/or NAAT-negative cannot be regarded as “never infected,” as they may have cleared a previous infection.10 Morre et al11 carried out a “discrepancy analysis,” but this is known to overestimate test accuracy.12,13 A better strategy is to assess sensitivity in NAAT or culture-positive specimens and specificity in samples from children who would not normally have been exposed to CT.14,15

A second difficulty is that antibody levels, and hence test sensitivity, may differ between TFI cases whose TFI was caused by CT and previously infected controls. Among populations with evidence of previous CT infection, there are consistent reports that higher antibody titres are found among those with TFI,16,17 probably reflecting an inflammatory response to CT, which causes tubal occlusion.18 Similar observations have been made for salpingitis and ectopic pregnancy.19,20

A third difficulty is that the ORs must be adjusted for confounding variables, as the formula cannot otherwise distinguish between a TFI that was caused by a previous CT infection and a TFI caused by another organism in individuals with coincidental past exposure to CT. Indeed, if all TFI was caused by other sexually transmitted infections (STIs), and CT had no causal role, we would still expect a strong positive association between CT and TFI.

These difficulties can make the use of formula (1) to estimate the EF from case-control studies unreliable. Kosseim et al reported 72% antibody prevalence in TFI cases and 22% in controls using a Micro-Immuno-Fluorescence (MIF) assay, giving an OR of 9.2 and an EF in this population of 64%,4 meaning that 64% of all the TFI could be attributed to CT. The high reported prevalence in TFI cases is sometimes attributed to false positives, but it can be shown that sensitivity must always be higher than observed prevalence. If, optimistically, we assumed 85% sensitivity and 95% specificity, the corrected prevalences based on formula (3) (mentioned in Methods) would be 90% and 21.5%, giving an OR of 33.4 and an EF of 87.3%. Lower levels of sensitivity yield ORs and EFs that are even more extreme. One explanation for the results of the study is that CT antibody titre, and hence sensitivity, is, as suggested earlier, higher in the subgroup whose TFI was caused by CT. This forms the basis for the approach we have taken.
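The arithmetic in this paragraph can be checked with a short sketch. It assumes the standard forms of formula (1) and of the prevalence correction (observed prevalence = π × sensitivity + (1 − π) × false-positive rate); the function names are illustrative and not taken from the article, and the OR is computed from the rounded percentages, so it may differ slightly from a value computed from raw counts.

```python
def odds_ratio(p_cases, p_controls):
    """Odds of antibody positivity in cases divided by the odds in controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def etiologic_fraction(pi, or_value):
    """Assumed form of formula (1): EF = pi * (OR - 1) / (1 + pi * (OR - 1))."""
    excess = pi * (or_value - 1)
    return excess / (1 + excess)


def corrected_prevalence(observed, sensitivity, specificity):
    """Solve observed = pi * Se + (1 - pi) * (1 - Sp) for the true prevalence pi."""
    false_positive_rate = 1 - specificity
    return (observed - false_positive_rate) / (sensitivity - false_positive_rate)


# Uncorrected analysis: 72% of cases and 22% of controls antibody positive.
or_from_percentages = odds_ratio(0.72, 0.22)
ef_uncorrected = etiologic_fraction(0.22, 9.2)

# Correction of the control prevalence at 85% sensitivity and 95% specificity,
# followed by the EF implied by the corrected OR of 33.4 quoted in the text.
pi_controls = corrected_prevalence(0.22, sensitivity=0.85, specificity=0.95)
ef_corrected = etiologic_fraction(pi_controls, 33.4)

print(f"OR from rounded percentages: {or_from_percentages:.2f}")
print(f"EF with pi = 22% and OR = 9.2: {ef_uncorrected:.1%}")
print(f"Corrected control prevalence: {pi_controls:.2%}")
print(f"EF with corrected prevalence and OR = 33.4: {ef_corrected:.1%}")
```

Because the corrected prevalence divides by (sensitivity − false-positive rate), lower assumed sensitivity pushes the corrected case prevalence, and hence the OR and EF, upward, which is the behaviour the paragraph describes.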

We present a new statistical modeling method to estimate the population EF of TFI due to CT. This is applied to a published study of antibody prevalence in TFI cases and controls, which used 3 peptide-based and 2 MIF assays.21 Our reanalysis addresses the difficulties attaching to formula (1): it adjusts for the sensitivity and specificity of the assays, it takes account of the fact that women with TFI caused by CT may have especially high antibody levels, and it allows for the fact that women whose TFI is caused by other STIs are more likely to have been exposed to CT than controls.

We begin by estimating sensitivity and specificity of the tests used in the Land et al study, in both cases and controls, and then we apply our modeled estimates to the prevalence data reported by Land et al. Technical details are given in 5 Supplemental Digital Content files (online only; see below for URLs).

Methods

Analysis of Test Performance

Test performance is characterized in terms of inherent ability to distinguish true positives and true negatives. We assume that the resolution (R) of each test T is a fixed property of the test, whereas the titre at which a sample is declared “positive” can be moved from high to low, raising sensitivity and lowering specificity, tracing out the receiver operator characteristic curve.22 High resolution implies greater discrimination between true positives and true negatives, whereas the receiver operator characteristic curve defines the sensitivity that is achieved at each false-positive rate. Resolution for test T is defined as follows:
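A definition consistent with every worked example in this article (for instance, a resolution of 4.1 with 65% sensitivity giving 97% specificity) is the sum of the logits of sensitivity and specificity, that is, the log diagnostic odds ratio. It is given here as a reconstruction; the symbols Se_T, Sp_T, and Fp_T = 1 − Sp_T are assumed notation.

```latex
\begin{equation}
R_T = \operatorname{logit}(Se_T) + \operatorname{logit}(Sp_T)
    = \ln\frac{Se_T}{1 - Se_T} - \ln\frac{Fp_T}{1 - Fp_T} \tag{2}
\end{equation}
```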

We use data from Wills et al, who examined the sensitivity and specificity of a pgp3 assay and 3 peptide-based assays: the Medac pELISA, Savyon SeroCT ELISA, and Labsystems ELISA.15 The mean RT for these 4 assays was 4.1 with a standard deviation (SD) of 0.8 (Supplemental Digital Content 1, online only). For example, with RT = 4.1, if sensitivity is 65%, then from formula (2) specificity is 97%; if sensitivity is 75%, specificity is 95.3%.

We also hypothesize that women with TFI caused by CT have suffered an inflammatory reaction to CT, which is reflected in higher antibody titres. Based on results from Akande et al,16 we estimate that test sensitivity, and hence resolution, is 1.96 points (95% confidence interval [CI], 1.56, 2.37) higher in this group than in women with previous CT but no inflammatory reaction (see Supplemental Digital Content 2, online only). For example, if resolution RT = 4.1 in women who were previously infected with CT but who have not experienced an inflammatory reaction, it would be 4.1 + 1.96 = 6.06 in women with TFI caused by CT. This means that if specificity was 97%, then sensitivity in the latter group would be 93% as opposed to 65% in CT-exposed women with no inflammatory reaction.
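The trade-off between sensitivity and specificity at a fixed resolution can be sketched in a few lines. The sketch assumes that resolution is the sum of the logits of sensitivity and specificity, which is the definition that matches the worked examples above; the helper names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def expit(x):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-x))


def specificity_at(resolution, sensitivity):
    """Specificity implied by a resolution and a chosen sensitivity."""
    return expit(resolution - logit(sensitivity))


def sensitivity_at(resolution, specificity):
    """Sensitivity implied by a resolution and a chosen specificity."""
    return expit(resolution - logit(specificity))


# Examples for the mean resolution of 4.1 from the Wills et al data.
print(f"Se 65%, R 4.1 -> Sp {specificity_at(4.1, 0.65):.1%}")
print(f"Se 75%, R 4.1 -> Sp {specificity_at(4.1, 0.75):.1%}")

# Resolution raised by 1.96 in women with TFI caused by CT.
print(f"Sp 97%, R 6.06 -> Se {sensitivity_at(4.1 + 1.96, 0.97):.1%}")
```

Moving the cutoff changes sensitivity and specificity together while the resolution stays fixed, so a single number per test is enough to describe its whole operating curve under this assumption.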

Analysis of Retrospective Studies of Antibody Prevalence in TFI Cases and Controls

Using a test T, and with information on its sensitivity in the control group, SeT,Previous, and false-positive rate (FpT), we can recover the true prevalence in the control group, π0, from the observed prevalence pT,Controls, using the following relationship:9
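In the notation just introduced, and matching the description in the next sentence (a weighted average of the true-positive and false-positive rates), the relationship can be reconstructed as follows; the typesetting is an assumption.

```latex
\begin{equation}
p_{T,\mathrm{Controls}} = \pi_0 \, Se_{T,\mathrm{Previous}} + (1 - \pi_0) \, Fp_T \tag{3}
\end{equation}
```

Solving for the true prevalence gives $\pi_0 = (p_{T,\mathrm{Controls}} - Fp_T) / (Se_{T,\mathrm{Previous}} - Fp_T)$.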

The observed prevalence as measured by test T is thus a weighted average of true positives and false positives, the weight being the true prevalence in the control group, π0.

Extending this rationale to the TFI group, the observed prevalence pT,TFI has 3 components. A proportion π1 have had previous exposure to CT that did not cause the TFI: they have test sensitivity SeT,Previous. A further proportion π2 of the TFI cases were caused by CT, and have sensitivity SeT,TFI. Among the remaining 1 − π1 − π2 patients, the prevalence of antibody is the FpT. Thus:
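Written out from the 3 components just described, the expression can be reconstructed as below. The equation number is left unassigned because the original numbering is not recoverable from the text.

```latex
\begin{equation*}
p_{T,\mathrm{TFI}} = \pi_1 \, Se_{T,\mathrm{Previous}}
                   + \pi_2 \, Se_{T,\mathrm{TFI}}
                   + (1 - \pi_1 - \pi_2) \, Fp_T
\end{equation*}
```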

We expect a higher prevalence of past CT infection in women with TFI not caused by CT than in a comparable control group of fertile women because most TFI is caused by STIs.23 Women whose TFI is caused by other infections are therefore more likely to have been exposed to CT. We therefore estimate the parameters subject to the constraint: π1 / (1 − π2) ≥ π0. We assess the impact of this constraint in sensitivity analyses.
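A minimal sketch of the forward model for one test cutoff is given below. It assumes that resolution and the sensitivity difference both act on the logit scale, so that logit(Se_Previous) = R + logit(Fp) and logit(Se_TFI) = R + ΔSe + logit(Fp). The values of the false-positive rate and π1 are hypothetical and chosen only to satisfy the constraint; the remaining inputs echo the best-fitting model reported in Results.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def observed_prevalences(resolution, delta_se, fp_rate, pi0, pi1, pi2):
    """Expected antibody prevalence in (controls, TFI cases) at one cutoff."""
    # Constraint: non-causal past CT exposure among TFI cases not caused by CT
    # is at least as common as past exposure among controls.
    if pi1 / (1 - pi2) < pi0:
        raise ValueError("constraint pi1 / (1 - pi2) >= pi0 is violated")
    se_previous = expit(resolution + logit(fp_rate))
    se_tfi = expit(resolution + delta_se + logit(fp_rate))
    p_controls = pi0 * se_previous + (1 - pi0) * fp_rate
    p_tfi = pi1 * se_previous + pi2 * se_tfi + (1 - pi1 - pi2) * fp_rate
    return p_controls, p_tfi


# fp_rate and pi1 are hypothetical illustration values, not estimates.
p_controls, p_tfi = observed_prevalences(
    resolution=2.5, delta_se=2.0, fp_rate=0.05, pi0=0.25, pi1=0.15, pi2=0.50
)
print(f"controls: {p_controls:.3f}, TFI cases: {p_tfi:.3f}")
```

Fitting the model amounts to finding π1 and π2 (with the other quantities fixed on the grid) so that these expected prevalences match the observed ones across all tests and cutoffs.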

We fit this model to published data from the Land et al study of antibody prevalence in 51 laparoscopically verified TFI cases and 264 controls, using 5 different antibody tests at between 2 and 5 different cutoffs.21

We assume that the difference ΔSe in sensitivity between SeT,Previous and SeT,TFI is the same for all tests, but an alternative model is explored in a sensitivity analysis. Because there are so many unknowns, we fit the model with RT, ΔSe, and π0 ranging over a wide “grid” of fixed values as follows:

* For each test T, test resolution RT is assumed to come from a normal distribution with means fixed at 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, and 6.5 and SDs fixed at either 0.5, 0.8, or 1.1. The mean estimated from the Wills study (described above) was 4.1 and the SD 0.8. For a sensitivity of 60%, RT = 0.5 implies 52.4% specificity, whereas RT = 6.5 implies 99.8% specificity.

* ΔSe is fixed at 0, 1, 2, 3, 4, 5, 6, 7, and 8. Our estimate from Akande et al was ΔSe = 1.96. Values >3 or 4 imply virtually 100% sensitivity at typical values of specificity.

* The prevalence of previous CT exposure, i.e., the cumulative incidence, in the control group, π0, is fixed at 25%, 30%, 35%, 40%, and 45%.

All 7 × 3 × 9 × 5 = 945 models were fitted, and the parameter estimates and goodness of fit were recorded. Details of the statistical estimation method are given in Supplemental Digital Content 3 (online only).
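The grid of fixed values listed above can be enumerated directly. This sketch only builds the grid and counts its entries; the fitting step itself is described in the supplement and is not reproduced here, and the variable names are illustrative.

```python
from itertools import product

resolution_means = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
resolution_sds = [0.5, 0.8, 1.1]
delta_se_values = list(range(9))  # 0, 1, ..., 8
pi0_values = [0.25, 0.30, 0.35, 0.40, 0.45]

# One entry per model: (mean resolution, SD of resolution, delta Se, pi0).
grid = list(product(resolution_means, resolution_sds, delta_se_values, pi0_values))

print(len(grid))
print(grid[0])
```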

Results

Figure 1 plots the residual deviance measure of goodness-of-fit (see Supplemental Digital Content 3, online only) for the Land et al study against mean resolution at different values of ΔSe, with π0 fixed at 0.25 and SD at 0.8. A well-fitting model would have a residual deviance of 34, whereas values >40 suggest a conflict between data and model. Using a residual deviance of 40 as a threshold, the Land et al data set rules out values of RT >3.5 and values of ΔSe <2; ΔSe = 2 is also the value that gives the best fit. The poor fit obtained when ΔSe = 0 or 1 provides strong support for our interpretation of Akande's results.

Figure 2 plots the estimated posterior mean EF (π2) against baseline prevalence of past infection in the control arm (π0) in models with SD = 0.8 and mean residual deviance <40. We further exclude models with a mean resolution <2, as these seem unrealistically low in view of the results from Wills et al (Table 1, Supplemental Digital Content 1, online only), and models with values of ΔSe >3.

Rather than choose the “best fitting” model, we take account of uncertainty in model choice by averaging over the models in Figure 2 using the smoothed residual deviance to calculate a weighted parameter estimate.24 This gives a final central estimate of π2, the EF, of 0.45 (95% CI, 0.28, 0.62). In Figure 3, the original prevalence in TFI cases and controls observed in the Land et al study is compared with the values predicted by the best-fitting model (resolution = 2.5, ΔSe = 2, and π0 = 25%, with residual deviance of 34), in which the EF was estimated to be 0.50 (0.37, 0.60).
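One common way to turn deviances into model weights is sketched below. The weighting rule (weights proportional to exp(−deviance / 2)) is an assumption made for illustration; the scheme actually used, including the smoothing step, is described in the supplement and may differ. The per-model values are hypothetical, not results from the study.

```python
import math


def deviance_weights(deviances):
    """Normalized weights proportional to exp(-deviance / 2) (assumed rule)."""
    best = min(deviances)
    # Subtracting the minimum keeps the exponentials numerically stable.
    raw = [math.exp(-(d - best) / 2) for d in deviances]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_estimate(estimates, deviances):
    """Model-averaged estimate under the assumed weighting rule."""
    weights = deviance_weights(deviances)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical (EF estimate, residual deviance) pairs for three models.
ef_estimates = [0.50, 0.42, 0.47]
residual_deviances = [34.0, 36.5, 38.0]

weights = deviance_weights(residual_deviances)
averaged_ef = weighted_estimate(ef_estimates, residual_deviances)
print([round(w, 3) for w in weights])
print(round(averaged_ef, 3))
```

Under any rule of this kind, better-fitting models dominate the average but poorer-fitting models still contribute, which widens the interval relative to reporting the single best model.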

Sensitivity Analyses

Changing the SD to 0.5 or 1.1 had little material impact on results (not shown). Similarly, increasing π0 to 45% increased residual deviance by around 15% across the graph but gave the same pattern of results.

Three further post hoc sensitivity analyses were examined and are reported in Supplemental Digital Content 4 (online only). In 1 sensitivity analysis, we modify the model and assume that the Labsystems and Savyon assays detect only a proportion of the CT cases detected by other tests, as they have higher false-positive rates than the other assays without detecting more cases. This yielded an EF of 0.58 (95% CI, 0.47–0.67).

A second set of sensitivity analyses allows the difference ΔSe to vary between assays, giving EFs between 0.44 and 0.51. A third sensitivity analysis dropped the equation (4) constraint that the prevalence of noncausal CT must be higher in TFI cases than in controls; this raises the EF by between 0 and 0.043.

Discussion

This article uses literature on contemporary CT serology assays and derives parameters that describe their performance in different groups. We suggest that CT antibody levels, and hence test sensitivity, are systematically different between TFI that was caused by a previous CT infection and a TFI caused by another organism in individuals with coincidental exposure to CT. We then used this model of serology to reinterpret a published study comparing antibody prevalence in TFI cases versus controls and to obtain estimates of the EF, the proportion of TFI that is caused by CT.

We estimate that 45% (28%, 62%) of TFI cases are caused by CT in this population. The credible intervals (Supplemental Digital Content 3, online only) are wide, but different models gave surprisingly consistent estimates, with the results being fairly insensitive to reasonable assumptions. Our approach is an alternative to the standard formula based on ORs. It models the EF directly based on assumptions about the distribution of titres in women with TFI and control groups. This avoids the need to control for the positive confounding between TFI and CT that is not causally related. If we apply the standard formula to the Land et al data as it stands, we obtain EFs between 28% (13%, 44%) and 60% (37%, 78%).

We compare our findings with previous research on the role of CT. Using the standard formula, the EF of TFI and ectopic pregnancy due to CT were estimated at 64% and 31%, respectively.4,20 A large case–control study in France estimated the EF from infectious diseases for ectopic pregnancy to be 33% using a logistic regression model to control for measured confounders.25 However, neither study took account of the imperfect sensitivity and specificity of the serological assays. Simms et al report results from a number of studies, which found that approximately 40% of PID cases had current CT infection.26

Although estimates based on retrospective data are regarded as being more vulnerable to bias, only one prospective study that examined pregnancy outcomes following CT infection was identified in an exhaustive literature search.27 This study showed no relationship, but its methods were criticized on several counts.

Our serological results suggest that the performance of contemporary CT antibody assays is poorer than has been previously reported. The average test resolution of 4.1 derived from the Wills et al study implies 71% sensitivity at 96% specificity. Based on our analysis of the Land et al data set, resolution may be only 3.5 or lower, implying at best 58% sensitivity at 96% specificity. This is low but probably realistic for a population where the majority of antibody-positive women will have cleared their CT infection some time previously. Although sensitivity is usually defined in currently infected individuals, antibody levels are known to decline with time since infection.28 Based on a pgp3 ELISA, Wills et al28 found that sensitivity was 89% (75%, 98%) in patients tested within 6 months of infection, but had fallen to 64% (51%, 77%) within a mean of 2.25 years after last known infection, equivalent to a fall in test resolution between current and previous infection of about 1.5 points.
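These figures follow from the logit-scale definition of resolution, as the short check below illustrates. It assumes resolution = logit(sensitivity) + logit(specificity) and that specificity is unchanged between current and previous infection, so that a change in sensitivity maps directly onto a change in resolution; the function names are illustrative.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sensitivity_at(resolution, specificity):
    """Sensitivity implied by a resolution at a given specificity."""
    return 1 / (1 + math.exp(-(resolution - logit(specificity))))


se_wills = sensitivity_at(4.1, 0.96)   # mean resolution from Wills et al
se_land = sensitivity_at(3.5, 0.96)    # upper bound suggested by the Land et al fit

# Fall in resolution when sensitivity declines from 89% to 64%
# at fixed specificity.
resolution_drop = logit(0.89) - logit(0.64)

print(f"{se_wills:.3f} {se_land:.3f} {resolution_drop:.2f}")
```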

We identified 2 further studies (Supplemental Digital Content 5, online only) of test performance in peptide-based assays and analyzed their resolution (Supplemental Digital Content 1, online only). One study11 used a discrepancy analysis to examine the same 3 peptide-based assays and an MIF test. Our analysis suggested an average resolution of 5.9. However, discrepancy analysis is recognized to exaggerate sensitivity at a given specificity.12,13 A second, manufacturer-sponsored study of an EIA peptide assay14 yielded an even higher estimate of resolution, 6.29 (95% CI, 4.84, 8.19). The explanation may be that true positives were identified using CT culture, whereas Wills et al used NAAT tests15 that are far more sensitive than culture and are likely to detect less severe infections with lower antibody response.

Our hypothesis that test sensitivity, and hence resolution, is higher in women with sequelae caused by CT than in women with past infection was based on independent literature, but was strongly supported by our analysis of the Land et al data set.21 This showed that it was highly unlikely that sensitivity in previous CT and CT-related TFI was the same and was consistent with the 2-point increase in resolution in CT-related TFI suggested by Akande's results.16 As noted in the Introduction, raised antibody levels in women suffering other complications of CT have been widely reported in the literature.

An important implication of our study is that most published estimates of CT antibody test performance lack face validity because published test performance depends wholly on the selection of samples: current versus previous infection, use of NAAT versus culture to define true positives, inclusion of women with an inflammatory response to CT, and methods to identify true negatives will all affect the estimates of antibody sensitivity and specificity that emerge. Although assays with superior performance are needed, improved methods for characterizing test performance are perhaps of still greater importance.

The major limitation of our study was its reliance on assumptions about parameter values, which, although plausible, and in some cases supported by literature, clearly require further study. Our estimate of the EF (0.45; 95% CI, 0.28–0.62) has wide credible limits, as it accounts for some of the model uncertainty. Even so, these may not reflect all sources of uncertainty, as we suspect that other assumptions and alternative models could be found for the same data.

Nevertheless, the study suggests that direct modeling of the distributions of titres in TFI or ectopic pregnancy cases and well-chosen controls could be given a greater role in etiologic studies of the sequelae of CT and also in monitoring attempts at prevention and control of CT. Johnson et al recently commented on how serology could be used to improve our ability to assess the public health impact of current efforts to reduce the population burden of genital CT, by monitoring changes in age-specific seroprevalence over time.7 Our work indicates that serology might also be used to obtain better estimates of the burden of chronic disease due to CT and improve our understanding of the natural history of infection, both important goals of CT research.29,30

References

1. Cates W, Wasserheit JN. Genital chlamydial infections: Epidemiology and reproductive sequelae. Am J Obstet Gynecol 1991; 164:1771–1781.
2. Mol BW, Dijkman B, Wertheim P, et al. The accuracy of serum chlamydial antibodies in the diagnosis of tubal pathology: A meta-analysis. Fertil Steril 1997; 67:1031–1037.
3. Persson K. The role of serology, antibiotic susceptibility testing and serovar determination in genital chlamydial infections. Best Pract Res Clin Obstet Gynaecol 2002; 16:801–814.
4. Kosseim M, Brunham RC. Fallopian tube obstruction as a sequela to Chlamydia trachomatis infection. European J Clin Microbiol 1986; 5:584–590.
5. Rothman K, Greenland S. Modern Epidemiology, 2nd ed. Philadelphia, PA: Lippincott, Williams and Wilkins, 1998.
6. Ward M. Do we need serodiagnosis? Available at: Accessed September 21, 2011.
7. Johnson AM, Horner P. A new role for Chlamydia trachomatis serology? Sex Transm Infect 2008; 84:79–80.
8. Persson K. The role of serology, antibiotic susceptibility testing and serovar determination in genital chlamydial infections. Best Pract Res Clin Obstet Gynaecol 2002; 16:801–814.
9. Sweeting MJ, De Angelis D, Hickman D, et al. Estimating HCV prevalence in England and Wales by synthesizing evidence from multiple data sources: assessing data conflict and model fit. Biostatistics 2008; 9:715–734.
10. Molano M, Meijer CJ, Weiderpass E, et al. The natural course of Chlamydia trachomatis infection in asymptomatic Colombian women: A 5-year follow-up study. J Infect Dis 2005; 191:907–916.
11. Morre SA, Munk C, Persson K, et al. Comparison of three commercially available peptide-based immunoglobulin G (IgG) and IgA assays to microimmunofluorescence assay for detection of Chlamydia trachomatis antibodies. J Clin Microbiol 2002; 40:584–587.
12. Hadgu A. Discrepant analysis: A biased and an unscientific method for estimating test sensitivity and specificity. J Clin Epidemiol 1999; 52:1231–1237.
13. Miller WC. Bias in discrepant analysis: When two wrongs don't make a right. J Clin Epidemiol 1998; 51:219–231.
14. Narvanen A, Puolakkainen M, Hao W, et al. Detection of antibodies to Chlamydia trachomatis with peptide-based species-specific enzyme immunoassay. Infect Dis Obstet Gynecol 1997; 5:349–354.
15. Wills GS, Horner PJ, Reynolds R, et al. Pgp3 antibody enzyme-linked immunosorbent assay, a sensitive and specific assay for seroepidemiological analysis of Chlamydia trachomatis infection. Clin Vaccine Immunol 2009; 16:835–843.
16. Akande VA, Hunt LP, Cahill DJ, et al. Tubal damage in infertile women: Prediction using chlamydia serology. Hum Reprod 2003; 18:1841–1847.
17. Conway D, Glazener CM, Caul EO, et al. Chlamydial serology in fertile and infertile women. Lancet 1984; 1:191–193.
18. Darville T, Hiltke T. Pathogenesis of genital tract disease due to Chlamydia trachomatis. J Infect Dis 2010; 201(suppl 2):S114–S125.
19. Taylor-Robinson D, Stacey CM, Jensen JS, et al. Further observations, mainly serological, on a cohort of women with or without pelvic inflammatory disease. Int J STD AIDS 2009; 20:712–718.
20. Kihlstrom E, Lindgren R, Ryden G. Antibodies to Chlamydia trachomatis in women with infertility, pelvic inflammatory disease and ectopic pregnancy. Eur J Obstet Gynecol Reprod Biol 1990; 35:199–204.
21. Land JA, Gijsen AP, Kessels AG, et al. Performance of five serological chlamydia antibody tests in subfertile women. Hum Reprod 2003; 18:2621–2627.
22. Green DM, Swets JA. Signal Detection Theory and Psychophysics. New York, NY: Wiley, 1966.
23. Hull MG, Glazener CM, Kelly NJ, et al. Population study of causes, treatment, and outcome of infertility. BMJ 1985; 291:1693–1697.
24. Claeskens G, Hjort N. Model Selection and Model Averaging. Cambridge University Press, 2011.
25. Bouyer J, Coste J, Shojaei T, et al. Risk factors for ectopic pregnancy: A comprehensive analysis based on a large case-control, population-based study in France. Am J Epidemiol 2003; 157:185–194.
26. Simms I, Stephenson JM. Pelvic inflammatory disease epidemiology: What do we know and what do we need to know? Sex Transm Infect 2000; 76:80–87.
27. Wallace LA, Scouler A, Hart G. What is the excess risk of infertility in women after genital chlamydia infection? A systematic review of the evidence. Sex Transm Infect 2008; 84:171–175.
28. Wills G, Reynolds G, Johnson A. Can Chlamydia trachomatis antibody be used to measure changes in cumulative age specific exposure over time in women? Chlamydial Infections: Proceedings of the Twelfth International Symposium on Human Chlamydial Infections, 2010:453–456.
29. Gottlieb SL, Martin DH, Xu FJ, et al. Summary: The natural history and immunobiology of Chlamydia trachomatis genital infection and implications for chlamydia control. J Infect Dis 2010; 201(suppl 2):S190–S204.
30. Haggerty C, Gottlieb S, Taylor B, et al. Risk of sequelae after Chlamydia trachomatis genital infection in women. J Infect Dis 2010; 201(suppl 2):S134–S155.

© Copyright 2012 American Sexually Transmitted Diseases Association