CHLAMYDIA TRACHOMATIS (CT) INFECTION, the most prevalent bacterial sexually transmitted infection in developed countries, can progress to pelvic inflammatory disease (PID) in women and epididymitis in men and may lead to sequelae such as ectopic pregnancy, infertility, and chronic pelvic pain.1 Active case finding and early treatment are crucial strategies to reduce transmission and the consequences of infection. Systematic screening of women has been shown to reduce the incidence of PID and ectopic pregnancy.2,3 Improved methods for detecting CT in urine allow for community-based testing in both sexes, including home-based testing4–9 and tailored community outreach testing.10
Universal screening is unlikely to be cost-effective in a population with relatively low CT prevalence. Selective screening, based on risk assessment, may improve cost-effectiveness and confronts fewer individuals with an unnecessary test. However, it may miss a substantial proportion of infections (low sensitivity).
We previously developed a prediction rule for CT infection, which performed satisfactorily at internal validation.11 The prediction rule was regarded as a promising tool for selective CT screening at a population level and to guide individuals in their choice of participation. Independent validation on other data is, however, essential before using the prediction rule in practice. When tested on data that were used for model development, the apparent performance may be excellent, but the performance in other populations may be considerably poorer.12 Selective CT screening criteria for both sexes showed poor performance when applied to another population in earlier studies13–16 and have not led to practical guidelines for selection of high-risk individuals.14,17
We aimed to assess the validity of the previously developed prediction rule in participants of a systematic chlamydia screening project in Amsterdam14,18 and in a community outreach CT screening project in Rotterdam.19
The CT pilot was a large population-based chlamydia screening project in 2002–2003 covering rural and urban areas in The Netherlands. Demographic, behavioral, clinical, and geographic risk factors in 6303 15- to 29-year-old sexually active women and men were measured by a self-administered questionnaire. These data were used to develop a prediction rule for the probability of CT infection among participants.11 The rule included as predictors age (15–19 years), area address density (urban), ethnicity (Surinamese/Antillean), education (low/intermediate), urogenital symptoms in the previous 4 weeks (women—[post]coital bleeding; men—frequent urination), the number of lifetime sexual partners, a new sexual partner in the last 2 months, and no condom use at last sexual contact. Predictors were assigned scores based on logistic regression coefficients (Table 1).
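The mechanics of applying such a questionnaire-based score can be sketched as follows. The point values below are hypothetical placeholders; the actual weights are the rounded logistic regression coefficients given in Table 1 of the original study.

```python
# Sketch of applying a questionnaire-based risk score.
# All point values are HYPOTHETICAL placeholders; the actual weights
# are the rounded logistic regression coefficients in Table 1.
POINTS = {
    "age_15_19": 2,                  # hypothetical weight
    "urban_address_density": 2,      # hypothetical weight
    "surinamese_antillean": 2,       # hypothetical weight
    "low_mid_education": 2,          # hypothetical weight
    "urogenital_symptoms": 1,        # hypothetical weight
    "many_lifetime_partners": 1,     # hypothetical weight
    "new_partner_last_2_months": 1,  # hypothetical weight
    "no_condom_last_contact": 1,     # hypothetical weight
}

def risk_score(answers):
    """Sum the points for each risk factor the participant reports."""
    return sum(p for factor, p in POINTS.items() if answers.get(factor))

# A participant reporting three of the risk factors:
answers = {"age_15_19": True, "low_mid_education": True,
           "no_condom_last_contact": True}
print(risk_score(answers))  # 2 + 2 + 1 = 5 under these placeholder weights
```

The sum score is then compared against a cutoff to decide whether screening is advised, as described under "Application of the Prediction Rule for Screening" below.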
A population-based screening project was organized by general practitioners in Amsterdam in 1996–1997 (“Amsterdam study”).14,18 Men and women aged 15 to 40 years were invited to fill in a questionnaire and collect urine at home. Samples were tested by ligase chain reaction (LCX). Participation was 51% among women and 33% among men. The overall prevalence of CT infection among the participating women and men was 2.8% and 2.4%, respectively. Selective screening criteria were developed using data from 75% of the sexually active participants. Determinants for CT infection in women were Surinamese/Antillean origin, being unmarried and not cohabiting, and a partner change in the last 2 months. For men, Surinamese/Antillean origin and painful micturition were predictive for chlamydial infection.14 For women aged 15 to 40 years, the discriminative ability in the development group had been calculated as an area under the receiver operating characteristic curve (AUC) of 0.67 (95% confidence interval [CI], 0.65–0.69), which decreased to 0.58 (95% CI, 0.54–0.61) at validation in a random part of the Amsterdam population. For men in this age category, the AUC in the development group had been 0.59 (95% CI, 0.55–0.60), which decreased to 0.53 (95% CI, 0.48–0.57) at validation.14
The questionnaire of the CT pilot had been adapted from the questionnaire of the Amsterdam study. Therefore, most variables were the same in both studies. Only one predictor variable was modified: in the CT pilot, participants were asked about condom use at their last sexual contact, whereas in the Amsterdam study the question was phrased: do you and your partner use a condom always, sometimes, or never? The answer “always use condom” was recoded as “condom use at last sexual contact—yes”; the remaining answers were coded as no condom use.
In a community outreach testing project in Rotterdam in 2004, youths aged 15 to 29 years and expected to be at high risk (low education, Surinamese/Antillean ethnicity) were offered urine testing for chlamydia during sexually transmitted disease prevention activities. Participants filled in a written questionnaire. In this project, urine samples were tested by polymerase chain reaction (PCR) Amplicor and by a test with higher sensitivity.19 For this validation study, the CT results of the PCR Amplicor were used because this was the test used in the CT pilot. The test rate was 28% (73 of 260) among sexually active men and 53% (99 of 187) among women. CT prevalence was 11% (19 of 172). For 152 participants, including all 19 CT-infected cases (13%), the score could be calculated (Table 1).
Performance of Models
We calculated the predicted probability for CT infection for participants aged 15 to 29 years according to the predictor score. Mean and median risk scores were calculated for each population. The performance of the models was assessed with respect to discrimination and calibration.20,21 The prediction rule’s ability to discriminate between participants with or without a chlamydial infection was quantified by using the AUC.22 A model with an AUC of 0.5 has no discriminative power, whereas an AUC of 1 reflects perfect discrimination. Ninety-five percent confidence intervals around these AUCs were calculated. Calibration is the ability of a model to produce unbiased estimates of the probability of outcome, e.g., if participants with certain characteristics are predicted to have a 10% risk for CT infection, the actually observed prevalence should also be 10%. Calibration was assessed graphically by plotting observed frequencies of chlamydial infection against predicted probabilities. Calibration was further tested with the Hosmer-Lemeshow goodness-of-fit test, which assesses agreement between predicted and observed risks. The ratio of the means of the predicted and actual CT prevalence was calculated.
Imputation of missing values based on correlations between predictors was performed as a secondary analysis.23,24 We compared the regression coefficients between the development and validation studies to explain differences in performance. To this end, logistic regression models were used for each validation study that included the log odds of the predicted outcome (also known as the linear predictor) as an offset variable and CT infection as the dependent variable.25 The predictors were added to this model one at a time to test the significance of the deviation of the predictive effect in the validation study from the development study.25
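The offset approach can be sketched in a few lines: the development model's linear predictor enters as a fixed offset, and a coefficient is then estimated for one predictor at a time; a coefficient near zero means that predictor's effect in the validation data does not deviate from the development model. The bare-bones gradient-ascent fit below stands in for standard logistic regression software, which would also supply the significance test.

```python
import math

def fit_offset_logit(x, y, offset, iters=500, lr=0.1):
    """Fit logit(p) = a + b*x + offset by plain gradient ascent.
    The offset (the development model's linear predictor) is held fixed;
    b measures how the predictor's effect in the validation data deviates
    from the development model (b = 0 means no deviation)."""
    a = b = 0.0
    n = len(y)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi, oi in zip(x, y, offset):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi + oi)))
            ga += yi - p
            gb += (yi - p) * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Sanity check: with an uninformative predictor (x all zero) and zero
# offset, the intercept should approach the logit of the observed
# prevalence, logit(0.75) = ln(3), and b should stay at 0.
a, b = fit_offset_logit([0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0])
```

In practice the same model would be fitted with maximum-likelihood logistic regression routines that accept an offset term, which also yield standard errors for the deviation test.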
Characteristics of Validation Populations
From the Amsterdam study, data of 1788 sexually active participants aged 15 to 29 years were available and the score could be calculated for 1413. Among these, 52 were CT-infected (3.7%). Owing to missing data, the score could not be calculated for 21%, among whom the CT prevalence was 3.2% (12 of 375; P = 0.65).
In the Amsterdam study, 8.4% of the population was under 20 years compared with 23% in the CT pilot and 58% in the Rotterdam study (Table 1). The frequency of Surinamese/Antillean ethnicity and intermediate/low education was highest in the Rotterdam study population and lowest in the CT pilot study. Recent partner change and condom use at last sexual contact were most frequent in the Rotterdam study and comparable in the two population-based studies.
The prediction rule had an AUC of 0.79 (95% CI, 0.76–0.84) in the CT pilot study.11 Validation showed a lower AUC in both the Amsterdam study (0.66; 95% CI, 0.58–0.74) and the Rotterdam study (0.68; 95% CI, 0.58–0.79). The mean risk scores of both external studies were higher than in the CT pilot. The standard deviations were smaller, indicating less spread in predictions (Table 2). This reflects the greater homogeneity in risk factors; for example, all participants in the validation studies were living in a very highly urbanized area (Table 1).
Figure 1A shows the calibration plot for the CT pilot, in which the prediction rule was developed. The calibration was good, especially for CT prevalence under 10% (Hosmer-Lemeshow goodness-of-fit test P = 0.51). The predictions for the Amsterdam study were systematically too high, with a ratio between mean predicted and actual prevalence of 1.27. The P value of 0.02 for the Hosmer-Lemeshow goodness-of-fit test also indicated poor calibration (Fig. 1B). Calibration for the Rotterdam study was acceptable (P = 0.20). The predicted prevalence was generally lower than the actual prevalence; for example, a predicted prevalence of 5% according to the score was actually around 10%. The ratio between mean of predicted and actual prevalence was 0.71 (Fig. 1C).
The effects of the predictors were generally similar across the three studies. However, low/intermediate education had a significantly lower effect in the Amsterdam population compared with the CT pilot (P = 0.02). Remarkably, having a new partner in the previous 2 months seemed to have a protective effect in the Rotterdam study (P = 0.04). Repeated analyses with imputation of missing values in the Amsterdam population led to similar results (data not shown).
Application of the Prediction Rule for Screening
The prediction rule might be used for selective screening of individuals with relatively high scores (Table 3). The first row gives the scenario of screening all participants (sensitivity 100%). Screening all sexually active participants in the CT pilot with a sum score ≥7 would reduce the fraction to be screened to 45%. However, 13% of the cases would then be missed (sensitivity 87%). The expected prevalence in the screened group would be 4.5% in contrast to 2.3% on average. Using the same cutoff score in the Amsterdam study would detect 94% of the cases by screening 77% of the population with an expected prevalence of 4.5%. In the high-risk population of the Rotterdam study, 94% would then be advised to be screened, no cases would be missed, and the expected prevalence would be 13%. Figure 2 depicts the fraction of detected CT cases (sensitivity) in relation to the fraction to be screened for various cutoff points of the sum score.
To reach a sensitivity of 93% in the CT pilot, 62% of the population would have to be screened and the expected prevalence would be 3.5%. To reach the same sensitivity in the Amsterdam study, 77% of the population would have to be screened and a higher prevalence would be found (4.5%). In the Rotterdam study, no cases would be missed when screening 75% of the population and the expected prevalence would be 16.5%.
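The trade-off behind Table 3 and Figure 2 reduces, for each cutoff, to three quantities: the fraction of the population selected for screening, the sensitivity (fraction of infections detected), and the expected prevalence among screenees. A minimal sketch, using made-up scores and infection labels rather than the study data:

```python
def screening_tradeoff(scores, labels, cutoff):
    """For a given score cutoff, return (fraction screened, sensitivity,
    expected CT prevalence among those screened)."""
    screened = [y for s, y in zip(scores, labels) if s >= cutoff]
    frac = len(screened) / len(labels)          # fraction selected
    detected = sum(screened)                    # infected among screened
    sens = detected / sum(labels)               # fraction of cases found
    prev = detected / len(screened) if screened else 0.0
    return frac, sens, prev

# Made-up scores and infection labels for six participants:
scores = [1, 2, 7, 8, 9, 3]
labels = [0, 0, 1, 1, 0, 0]
frac, sens, prev = screening_tradeoff(scores, labels, cutoff=7)
print(frac, sens, round(prev, 2))  # 0.5 1.0 0.67
```

Sweeping the cutoff over all observed score values traces out the sensitivity-versus-fraction-screened curve shown in Figure 2.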
This study showed a reasonable discriminative ability of a chlamydia prediction rule in a population-based CT screening study from Amsterdam (n = 1413, AUC 0.66) and in a small study among high-risk groups in Rotterdam (n = 152, AUC 0.68). The observed CT prevalence was lower than predicted in the Amsterdam study but higher in the Rotterdam study. When comparing the population-based studies, a higher fraction had to be screened in the Amsterdam population compared with the CT pilot to reach the same sensitivity. The Rotterdam study selected high-risk youths, and efficiency of screening would hence be higher than in the Amsterdam study.
Prediction rules for CT infection have been developed before, but we used more advanced statistical methods in a relatively large population and report here on a thorough external validation. Development and validation of the score were done in Dutch populations. Predictions of the score can probably not be used directly in other countries without further validation. Most predictors included in our prediction rule are, however, well known from previous research.13,16,26–31 Because all are readily available from questionnaire data or interviews, practical application is possible.
The Amsterdam screening study was carried out in the general population, similar to the CT pilot. Because the questionnaires used in the two validation studies resembled that of the CT pilot, the predictor score could be calculated for both studies, a prerequisite for external validation. Despite the similar questionnaires, we had a considerable number of missing values in the Amsterdam study, but this did not appear to affect the validation results. Because the percentage of missing scores in the CT pilot was low, there is no indication that participants would be unwilling to provide the information needed for the decision to be screened.
Although direct comparison is not possible because of different age ranges, it seems that the performance of the predictor score was somewhat better than that of sex-specific models that were previously developed from the Amsterdam study.14 This may be explained by the inclusion of a different and more extensive set of predictors in our prediction rule.
In a reevaluation of screening criteria among women in the U.S. Pacific Northwest, a benchmark for selective screening was stated as 90% sensitivity by screening 60% of the population.29 This benchmark was reached in our development population (52% to be screened), almost in the Rotterdam study (62%), but not in the Amsterdam study (72%, Fig. 2).
External validation of prognostic models is essential to assess generalizability and for fair comparisons of alternative models. Models often perform less well at external validation,12,20 and this can have several reasons. The predictor score was developed in a multicenter study with a rather large sample size and advanced statistical approaches; e.g., bootstrap validation techniques were used.21 As a rule of thumb, the number of predictors should be less than one tenth of the number of events.23 In the CT pilot, the ratio of predictors to CT cases was 8/144 = 1/18. Therefore, overfitting was unlikely, and generalizability and calibration of predictions in new screenees were expected to be good.23,32,33
In the Amsterdam study, the discriminative ability was reasonable but lower than we had expected. Use of models in other populations requires similar prognostic relationships in these populations. We assessed differences in effects of variables between the CT pilot and the validation studies, but these comparisons were limited by the small size of the latter studies.
Although the CT pilot and the Amsterdam study were similar in design, the Amsterdam population was less heterogeneous, especially because all participants were living in a very highly urbanized area (hence the higher mean score). Furthermore, low/intermediate education carries 2 score points in the prediction rule, but this predictor had no predictive effect in the Amsterdam participants. Hence, the score—and the predicted risk of CT infection—were too high.
The Rotterdam study was targeted at high-risk youths within a highly urbanized area and consequently this population was rather homogenous. This may partially explain the relatively low AUC of 0.68. Another reason may be that the study was too small to perform a reliable external evaluation, as is apparent from the wide confidence intervals of the predicted prevalences.34,35
Low performance of a predictive model may be the result of missing determinants of infection that are unequally spread across the respective populations. We included well-known risk factors in our score that are easily measured by a questionnaire. We did not, however, inquire about concurrent partnerships, and we have no data on the age or ethnicity of the partner. Partner characteristics are obviously a component of the sexual network in which persons participate. This is exemplified by the fact that some of the young infected women were with their first partner and that those partners were members of high-risk ethnic groups.19 Those missing network determinants may form part of the explanation for the suboptimal performance of the score in the Rotterdam study. This indicates that the prediction rule has to be validated and adapted further. For example, when using the predictor score questions in a CT screening program, partner characteristics should preferably be included and performance evaluated.
An individual may mainly be interested in his or her own risk and the need to get tested. Agreement of the predicted risk with the actual risk is then important (calibration). When participants from the Rotterdam study with a certain score and a corresponding predicted prevalence are advised to get tested, the actual prevalence would be even higher. This does not diminish the advantages of the prediction rule. When screening persons with a score ≥7 in the Amsterdam study, this corresponds with a predicted prevalence of 4.5%. The actual prevalence would, however, be only 3%, which is a critical point in the use of the score.
A score of ≥7 as a cutoff point would detect 87% of the cases in the CT pilot by screening 45%; the benchmark of 90% sensitivity by screening 60% of the population would thus be approximately met.29 In the Amsterdam study, this cutoff would imply testing 77% to find 94% of the cases. This implies a reduction of persons to be screened of 23% compared with screening the whole population, and we consider this reduction sufficient to support use of the prediction rule. Finally, in the Rotterdam study, no cases would be missed, but one would have to screen 94% of these high-risk persons. With an expected prevalence above 13% in the screened population, there is no reason to oppose this. The Rotterdam study was designed to approach high-risk youth; consequently, screening a large fraction was inherent to its design. We therefore believe that the score could be used to motivate individuals for testing despite the statistically less-than-optimal results.
In conclusion, these findings support the use of the prediction rule as a tool for selective CT screening. However, when a high sensitivity is required, only a limited fraction of participants can be excluded from screening.
1. Stamm W. Chlamydia trachomatis. In: Sexually Transmitted Diseases. New York: McGraw-Hill, 1999.
2. Egger M, Low N, Smith GD, Lindblom B, Herrmann B. Screening for chlamydial infections and the risk of ectopic pregnancy in a county in Sweden: Ecological analysis. BMJ 1998; 316:1776–1780.
3. Scholes D, Stergachis A, Heidrich FE, Andrilla H, Holmes KK, Stamm WE. Prevention of pelvic inflammatory disease by screening for cervical chlamydial infection. N Engl J Med 1996; 334:1362–1366.
4. van Bergen JEAM, Götz HM, Richardus JH, Hoebe CJ, Broer J, Coenen AJ. Prevalence of urogenital Chlamydia trachomatis increases significantly with level of urbanisation and suggests targeted screening approaches: Results from the first national population based study in The Netherlands. Sex Transm Infect 2005; 81:17–23.
5. Bloomfield PJ, Kent C, Campbell D, Hanbrook L, Klausner JD. Community-based chlamydia and gonorrhea screening through the United States mail, San Francisco. Sex Transm Dis 2002; 29:294–397.
6. Andersen B, Olesen F, Moller JK, Ostergaard L. Population-based strategies for outreach screening of urogenital Chlamydia trachomatis infections: A randomized, controlled trial. J Infect Dis 2002; 185:252–258.
7. Turner CF, Rogers SM, Miller HG, et al. Untreated gonococcal and chlamydial infection in a probability sample of adults. JAMA 2002; 287:726–733.
8. Macleod J, Salisbury C, Low N, et al. Coverage and uptake of systematic postal screening for genital Chlamydia trachomatis and prevalence of infection in the United Kingdom general population: Cross sectional study. BMJ 2005; 330:940.
9. Low N, McCarthy A, Macleod J, et al. The chlamydia screening studies: Rationale and design. Sex Transm Infect 2004; 80:342–348.
10. Ford CA, Viadro CI, Miller WC. Testing for chlamydial and gonorrheal infections outside of clinic settings: A summary of the literature. Sex Transm Dis 2004; 31:38–51.
11. Götz HM, van Bergen JE, Veldhuijzen IK, Broer J, Hoebe CJ, Richardus JH. A prediction rule for selective screening of Chlamydia trachomatis infection. Sex Transm Infect 2005; 81:24–30.
12. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med 2000; 19:453–473.
13. van Valkengoed IG, Boeke AJ, Morre SA, et al. Disappointing performance of literature-derived selective screening criteria for asymptomatic Chlamydia trachomatis infection in an inner-city population. Sex Transm Dis 2000; 27:504–507.
14. van Valkengoed IG, Morre SA, van den Brule AJ, et al. Low diagnostic accuracy of selective screening criteria for asymptomatic Chlamydia trachomatis infections in the general population. Sex Transm Infect 2000; 76:375–380.
15. Verhoeven V, Avonts D, Van Royen P, Weyler J, Wang X, Stalpaert M. Performance of a screening algorithm for chlamydial infection in 2 samples of patients in general practice. Scand J Infect Dis 2004; 36:873–875.
16. Marrazzo JM, Fine D, Celum CL, DeLisle S, Handsfield HH. Selective screening for chlamydial infection in women: A comparison of three sets of criteria. Fam Plann Perspect 1997; 29:158–162.
17. Andersen B, van Valkengoed I, Olesen F, Moller JK, Ostergaard L. Value of self-reportable screening criteria to identify asymptomatic individuals in the general population for urogenital Chlamydia trachomatis infection screening. Clin Infect Dis 2003; 36:837–844.
18. van Valkengoed IG, Boeke AJ, van den Brule AJ, et al. [Systematic home screening for Chlamydia trachomatis infections of asymptomatic men and women in family practice by means of mail-in urine samples.] Ned Tijdschr Geneeskd 1999; 143:672–676.
19. Götz HM, Veldhuijzen IK, Ossewaarde JM, de Zwart O, Richardus JH. Outreach based chlamydia screening in multi-ethnic urban youth: A pilot combining STD health education and testing in Rotterdam (Netherlands). Sex Transm Infect. In press.
20. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999; 130:515–524.
21. Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD. Prognostic modelling with logistic regression analysis: A comparison of selection and estimation methods in small data sets. Stat Med 2000; 19:1059–1079.
22. Hosmer D, Lemeshow S. Applied Logistic Regression. New York: John Wiley, 1999.
23. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15:361–387.
24. Clark TG, Altman DG. Developing a prognostic model in the presence of missing data: An ovarian cancer case study. J Clin Epidemiol 2003; 56:28–37.
25. Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: A study on sample size and shrinkage. Stat Med 2004; 23:2567–2586.
26. Fenton KA, Mercer CH, McManus S, et al. Ethnic variations in sexual behaviour in Great Britain and risk of sexually transmitted infections: A probability survey. Lancet 2005; 365:1246–1255.
27. Low N, Sterne JA, Barlow D. Inequalities in rates of gonorrhoea and chlamydia between black ethnic groups in south east London: Cross sectional study. Sex Transm Infect 2001; 77:15–20.
28. LaMontagne DS, Fenton KA, Randall S, Anderson S, Carter P. Establishing the National Chlamydia Screening Programme in England: Results from the first full year of screening. Sex Transm Infect 2004; 80:335–341.
29. LaMontagne DS, Patrick LE, Fine DN, Marrazzo JM. Re-evaluating selective screening criteria for chlamydial infection among women in the US Pacific Northwest. Sex Transm Dis 2004; 31:283–289.
30. Paukku M, Kilpikari R, Puolakkainen M, Oksanen H, Apter D, Paavonen J. Criteria for selective screening for Chlamydia trachomatis. Sex Transm Dis 2003; 30:120–123.
31. Verhoeven V, Avonts D, Meheus A, et al. Chlamydial infection: An accurate model for opportunistic screening in general practice. Sex Transm Infect 2003; 79:313–317.
32. Van Houwelingen JC, Le Cessie S. Predictive value of statistical models. Stat Med 1990; 9:1303–1325.
33. Harrell F. Regression coefficients and scoring rules. J Clin Epidemiol 1996; 49:819.
34. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 2005; 58:475–483.
35. Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KG. Internal and external validation of predictive models: A simulation study of bias and precision in small samples. J Clin Epidemiol 2003; 56:441–447.