Original Study

Deriving and Validating A Risk Estimation Tool for Screening Asymptomatic Chlamydia and Gonorrhea

Falasinnu, Titilola MHS*; Gilbert, Mark MD, MSc; Gustafson, Paul PhD; Shoveller, Jean PhD§

doi: 10.1097/OLQ.0000000000000205



Sexual health care is suffering from a budgetary crisis as health care costs escalate in almost every high-income country, and the recent economic crisis exacerbates the financial challenges already facing publicly funded health care programs.1 Sexual health clinics in several jurisdictions are experiencing budget cuts, hiring freezes, and, in some cases, closures; surviving clinics have had to limit operating hours, cut some services, and streamline operations.1 In this context, improving sexual health care delivery is arguably of paramount interest. Moreover, inadequate access to sexual health care has potentially detrimental effects on both individuals and the community.2 Thus, it is imperative to adopt systems that optimize the delivery of comprehensive and high-quality sexually transmitted infection (STI) testing services while minimizing public health budget demands.3 For example, several jurisdictions have adopted alternative service delivery models, such as Internet-based testing and triaging services in which sexual health clients have limited interaction with clinicians.4,5 In British Columbia (BC), we are developing Get Checked Online (GCO), an Internet-based STI testing service intended to supplement existing face-to-face clinic-based sexual health services, with the goals of increasing testing uptake and frequency and easing demand on clinic-based testing.4,6

One important feature of efficient STI control programs, especially in novel health service delivery models such as GCO, is ensuring that those at increased risk for STIs have access to screening services, because identifying and treating infections in this group can effectively terminate onward transmission and therefore prevent new cases of disease. However, much is still unknown about how best to maximize access for those at highest risk, particularly in contexts where STI clinics are overburdened with symptomatic clients.2 Identifying high-risk asymptomatic clients may also confer public health benefits, as undiagnosed and untreated infections can frequently progress to complications such as pelvic inflammatory disease and infertility (in women) and epididymitis (in men).7 To maximize case finding in asymptomatic visits, selective screening (which is based on risk assessment and entails screening only individuals who meet prespecified criteria) may be a prudent approach because it minimizes the costs associated with testing low-risk individuals.8

The current body of knowledge indicates that risk prediction rules that capture a continuous risk spectrum are excellent tools for targeted screening.9 A recent review critically appraising prediction rules used for sexual health service provision found that these have been explored in a variety of contexts including emergency departments, population-based settings and STI clinics.10 However, there are currently no published prediction rules for screening asymptomatic chlamydia and/or gonorrhea. The current understanding indicates that increasing testing access for high-risk asymptomatic individuals may result in long-term health and health economic impacts that surpass those associated with clinically observable infections.7

Here, we examine the performance of a selective screening strategy (derived from a clinical prediction rule) in identifying persons at increased risk for Chlamydia trachomatis or gonococcal infections. Specifically, this study aimed to derive a risk scoring tool for screening for asymptomatic chlamydia and/or gonorrhea infection among patients seen at STI clinics between 2000 and 2006 (derivation population) and test the generalizability of the algorithm in a more recent period among patients seen between 2007 and 2012 (temporal validation population).


Study Population

We used electronic medical records from asymptomatic patients tested for chlamydia and/or gonorrhea between 2000 and 2012 at 2 STI clinics in Vancouver, BC. All statistical analyses were performed using SAS version 9.3. We have previously described the study setting, population, and variables examined.6 Briefly, the data were derived from the STI Information System (STI IS), a database that houses risk assessment information and laboratory results of patient visits at publicly funded STI clinics in BC. Data from each new client consultation between 2000 and 2012 among women and heterosexual men were included. This analysis was limited to asymptomatic clinic visits by clients who were not sexual contacts of known STI cases, as ascertained by the attending clinician during risk assessment. Repeat visits within 30 days of a previous clinical visit were also excluded to avoid including clients receiving confirmatory diagnoses. Chlamydia and/or gonorrhea infection was measured as a composite outcome because most laboratories use multiplex assays that test for both infections simultaneously11 and because the relevant clinical decision is whether to offer this test or not. At these clinics, chlamydia, gonorrhea, syphilis, and HIV tests are generally offered to all sexually active clients at each visit.

For this analysis, we extracted a range of demographic and behavioral information from patient visits, including age, sex, race/ethnicity, number of sexual partners in the previous 6 months, condom use, injection drug use (IDU), sex with partners recruited online, sex with IDU, sex with commercial sex workers (CSWs), previous diagnosis with chlamydia, and previous diagnosis with gonorrhea. Missing data among predictors were imputed 5 times using the Sequential Regression Imputation Method, and the resulting estimates were pooled using Rubin’s rules.12–17
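As an illustration of the pooling step, the following minimal Python sketch applies Rubin’s rules to one regression coefficient estimated in each of 5 imputed data sets. The analyses in this study were run in SAS; the function name `pool_rubin` and all numeric inputs here are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = mean(estimates)              # average of the m point estimates
    within = mean(variances)              # average within-imputation variance
    between = variance(estimates)         # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical coefficient estimates and variances from 5 imputations.
pooled_beta, total_var = pool_rubin(
    [0.50, 0.55, 0.45, 0.52, 0.48],
    [0.04, 0.04, 0.04, 0.04, 0.04],
)
```

The total variance exceeds the average within-imputation variance because it also carries the uncertainty introduced by the missing values.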

It is important to acknowledge that the scope of the current study was limited to the examination of chlamydia and/or gonorrhea outcomes among asymptomatic women and heterosexual men for several reasons. First, this article focused on this population for pragmatic reasons because sample size restrictions prohibited the examination of other outcomes such as HIV and syphilis, or other populations of interest such as men who have sex with men. Second, the targeted population accessing GCO will be limited to those who are asymptomatic, as clients presenting with symptoms will automatically be referred to an STI clinic.

Derivation of Risk Estimation Tool

The outcome measured was diagnosis with chlamydia and/or gonorrhea infection. We used χ2 tests to analyze categorical variables and Student t tests to analyze continuous variables. Univariate logistic regression was used in the derivation population to identify potential STI risk factors. To simplify risk score generation and facilitate application in clinical and population-based settings, we categorized continuous variables (e.g., age and number of sexual partners). We tested for interaction between sex and other risk factors. Predictors found to be significant in the univariate analyses were entered into a multivariable logistic regression model using backward elimination; predictors that remained in the model had P values less than 0.20. To be conservative, the final regression model included only variables with P < 0.05 in at least one of the imputed data sets.18 The risk factors in the final model were used to construct the equation used for the clinical prediction rule. To aid use in screening decision making, we derived simplified risk scores by multiplying the regression coefficients (β values) by 5 and rounding them to the nearest integers. Sum scores for each visit were then derived by adding up the risk scores. These sum scores are a direct reflection of the probability of infection.19
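The scoring step described above can be sketched in Python as follows. The coefficients below are hypothetical placeholders chosen for illustration; they are not the estimates from the final model, and the choice of reference categories and the names `to_points` and `sum_score` are likewise assumptions.

```python
import math

# Hypothetical regression coefficients (log odds ratios) for illustration only.
# They are NOT the estimates reported for the final model.
HYPOTHETICAL_BETAS = {
    ("age", "14-19"): 1.42,
    ("age", "20-24"): 1.18,
    ("age", "25-29"): 0.71,
    ("age", "30-39"): 0.33,
    ("age", ">=40"): 0.0,            # assumed reference category
    ("race", "white"): 0.0,          # assumed reference category
    ("race", "nonwhite"): 0.52,
    ("partners", "0"): -0.40,
    ("partners", "1-2"): 0.0,        # assumed reference category
    ("partners", ">=3"): 0.45,
    ("prior_chlamydia", "no"): 0.0,
    ("prior_chlamydia", "yes"): 0.93,
    ("prior_gonorrhea", "no"): 0.0,
    ("prior_gonorrhea", "yes"): 1.31,
}

def to_points(betas, multiplier=5):
    """Scale each coefficient and round to the nearest integer (halves round up)."""
    return {key: math.floor(beta * multiplier + 0.5) for key, beta in betas.items()}

def sum_score(visit, points):
    """Add up the points for the category recorded for each predictor."""
    return sum(points[(variable, level)] for variable, level in visit.items())

POINTS = to_points(HYPOTHETICAL_BETAS)

# A hypothetical clinic visit.
example_visit = {
    "age": "20-24",
    "race": "nonwhite",
    "partners": ">=3",
    "prior_chlamydia": "no",
    "prior_gonorrhea": "no",
}
example_score = sum_score(example_visit, POINTS)
```

Because each point value is a scaled and rounded coefficient, the sum score approximates 5 times the linear predictor (excluding the intercept), which is why higher sum scores correspond to higher predicted probabilities of infection.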

Performance Measures

We estimated the model’s ability to discriminate between participants with and without infection using the area under the receiver operating characteristic curve (AUC). An AUC value close to 100% shows that the model has excellent discriminative ability, whereas a value close to 50% indicates no discriminative value.20,21 Because performance estimated in the same data used to fit a model tends to be optimistic, we performed 10-fold cross-validation to estimate how the model will generalize to an independent population and to correct for this optimism bias.22 We assessed calibration performance by calculating the Hosmer-Lemeshow goodness-of-fit statistic, which measures whether the predicted probability of infection corresponds with the observed probability. A well-calibrated model gives a corresponding P value greater than 0.05.23 We also studied the calibration of the simplified risk scores by visually examining the prevalence of chlamydia and/or gonorrhea infection in groups of the risk scores.24 Finally, we examined the sensitivity (i.e., proportion of all cases identified) and the fraction of patients who would need to be screened at different cutoffs of the risk scores. The benchmark set for a well-performing tool is one that identifies more than 90% of cases while screening 60% or less of the population.25
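To make these measures concrete, the following Python sketch computes the AUC as the probability that a randomly chosen case has a higher score than a randomly chosen noncase, and the sensitivity and fraction screened at a given cutoff. The scores and outcomes are a small hypothetical data set, not study data, and the function names are illustrative.

```python
def auc(scores, labels):
    """AUC as the share of case/noncase pairs in which the case scores higher.

    Ties count as half a win. labels: 1 = infected, 0 = not infected.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))

def screening_performance(scores, labels, cutoff):
    """Return (sensitivity, fraction screened) when screening scores >= cutoff."""
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("need at least one case")
    flagged = [s >= cutoff for s in scores]
    detected = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    return detected / total_cases, sum(flagged) / len(scores)

# Hypothetical sum scores and infection status for 8 visits.
toy_scores = [0, 2, 3, 6, 7, 9, 12, 15]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]

toy_auc = auc(toy_scores, toy_labels)
toy_sens, toy_frac = screening_performance(toy_scores, toy_labels, cutoff=6)
```

Raising the cutoff lowers the fraction screened but can also lower sensitivity, which is the trade-off the benchmark of more than 90% sensitivity at 60% or less screened is meant to constrain.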


Derivation Population

Figure 1 is a flowchart showing the selection of clinic visits whose data were used in this study. The chlamydia and/or gonorrhea infection prevalence was 1.8% in the derivation population (n = 10,437). Table 1 shows the baseline distribution of candidate predictors. The most common demographic characteristics among patient visits during the study period were male sex (67%), age between 30 and 39 years (31%), and white race (74%). Individuals who reported consistent condom use during sexual contact comprised approximately 27% of clinic visits. Sexual contact with a CSW was documented in 13% of patient visits (Table 1).

Figure 1. Study population selection.
Table 1. Population Characteristics of Derivation and Temporal Validation Populations*

Univariate predictors of chlamydia and/or gonorrhea infection are shown in Table 2. The following predictors were not significantly associated with infection and were therefore not included in the final logistic regression model: sex, condom use, sex with partners recruited online, IDU, sex with IDU, and sex with CSW. We found no significant differences by sex in the associations between the risk factors and the outcome. Table 3 shows the results of the final multivariable regression model used for developing the prediction rule. The model included age in years (categorized as 14–19, 20–24, 25–29, 30–39, ≥40), race/ethnicity (white or nonwhite), number of sexual partners (0, 1–2, ≥3), previous chlamydia diagnosis (yes or no), and previous gonorrhea diagnosis (yes or no).

Table 2. Chlamydia and/or Gonorrhea Prevalence and Unadjusted ORs (Derivation Data Set)*
Table 3. Prediction Rule for Quantifying the Probability of Chlamydia and/or Gonorrhea Infection

Figure 2 shows the receiver operating characteristic curves for the chlamydia and/or gonorrhea risk estimation model in the derivation and temporal validation populations. The model demonstrated good discrimination in the derivation population (AUC = 0.75; 95% confidence interval [CI], 0.72–0.80). Internal validation indicates an upper limit of the expected performance in new settings; the 10-fold cross-validation showed no evidence of overfitting (AUC = 0.74; 95% CI, 0.70–0.77). The model demonstrated strong calibration in the derivation population, indicating good fit; the Hosmer-Lemeshow χ2 statistic was 3.4 (8 df, P = 0.91). The coefficients yielded risk scores, with a minimum sum score of −2 and a maximum sum score of 26. To visualize the calibration of the prediction rule, the total sample was divided into 6 groups, as shown in Figure 3, which illustrates the observed proportion of chlamydia and/or gonorrhea infection as a function of the sum score derived from the final model. Higher sum scores were associated with higher prevalence, further supporting the good calibration indicated by the Hosmer-Lemeshow statistic.

Figure 2. Receiver operating characteristic curves for derivation and temporal validation populations.
Figure 3. Prevalence of chlamydia and/or gonorrhea within risk score categories.

The simplified risk scores can be applied for selective screening decision making. Table 4 shows the screening performance estimates at different cutoff levels of the sum scores. To identify all cases (i.e., 100% sensitivity), approximately 94% of the population would need to be screened at a sum score cutoff point of at least 1. However, by raising the cutoff point of the risk score to at least 6, only 68% of the population would need to be screened to identify 91% of the cases, which is close to the benchmark of screening 60% or less of the population while identifying more than 90% of cases.

Table 4. Sensitivity and Specificity of Cutoff Scores

Temporal Validation Population

The validation sample consisted of 14,956 clinic visits, of which 2.2% were diagnosed with chlamydia and/or gonorrhea infection. There were no notable differences between the derivation and validation populations; however, the temporal validation population had lower prevalence of IDU, previous chlamydia diagnosis, and previous gonorrhea diagnosis (Table 1). The model demonstrated acceptable discrimination in the temporal validation population (AUC = 0.64; 95% CI, 0.61–0.67; Fig. 2). The model also showed good calibration upon validation (Hosmer-Lemeshow χ2 = 8.8, 8 df, P = 0.36). When categorized into the same 6 risk categories as the derivation population, chlamydia and/or gonorrhea infection prevalence ranged from 0.1% in the lowest-risk category to 16.1% in the highest-risk category (Fig. 3). When the simplified risk scores were considered in the temporal validation population, choosing the risk cutoff point of at least 6 would identify 83% of cases while screening 68% of the population (Table 4).
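The reported Hosmer-Lemeshow P values can be checked from the χ2 statistics and degrees of freedom alone. The Python sketch below uses the closed-form upper-tail probability of the χ2 distribution, which holds only for even degrees of freedom; the function name is illustrative, and this is a verification aid rather than the procedure used in the original SAS analysis.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2n the tail equals exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p_derivation = chi2_sf_even_df(3.4, 8)   # derivation population
p_validation = chi2_sf_even_df(8.8, 8)   # temporal validation population
```

Both values round to the P values reported in the text (0.91 and 0.36), each above the 0.05 threshold used here to indicate adequate calibration.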


We derived and validated a risk scoring tool for assessing the risk of chlamydia and/or gonorrhea infection among asymptomatic women and heterosexual men accessing STI clinics in Vancouver, BC, using predictors that are relatively easy to assess in a clinical encounter. The waning in discriminatory performance of the model could be due to the difference in case-mix between the 2 periods.26 Although the later time frame had a slightly higher prevalence of infection, individuals comprising this population reported lower proportions of the risk factors included in the final model. As a result, the discrimination between cases and noncases in the more homogeneous temporal validation population was more difficult.26 In addition, there is also the possibility that missed risk factors could have impacted the discriminatory performance of the model in the temporal validation population. Future studies should consider reestimating the logistic regression model to identify whether the inclusion of more risk factors will improve the discriminatory performance of the model.27,28

The aim of selective screening is to increase sensitivity (percentage of cases detected) and to increase efficiency (decrease the percentage of the population to be screened). In the derivation population, if screening were advised for people with a score of at least 6, 91% of the cases would be detected by screening only 68% of the population, which is close to the benchmark of more than 90% sensitivity while screening 60% or less of the population. The 32% reduction in the number of individuals who need to be screened using the ≥6 risk score cutoff, compared with screening the whole population, shows that the efficiency of screening in population-based programs may be improved by targeting screening in this way.23 Age less than 25 years is a screening criterion used in the United States and Canada.29,30 Using this criterion in the derivation population would require screening 21% of the population while identifying only 40% of cases, a performance that falls short of the screening benchmark even after increasing the age cutoff to less than 30 years, indicating that the prediction rule could be useful in decision making in this setting.

The findings of this analysis were compared with a recent study that involved the derivation of a risk scoring tool for chlamydia infection among sexual health clinic attendees in Sydney, Australia.9 The models (which comprised demographics, sexual behavior, and clinical symptoms) showed modest discrimination; the AUC was 0.71 and 0.72 in heterosexual males and females, respectively. The screening efficiency benchmark was not reached in the populations studied. A risk score cut-point of at least 15 yielded a sensitivity of 90% and fraction screened of 87% in heterosexual females, whereas a cut-point of at least 25 yielded a sensitivity of 89% and fraction screened of 87% among heterosexual males. The AUC of the “Vancouver” risk estimation was slightly higher compared with the AUCs of the aforementioned study.9 The increase in this performance metric was also reflected in the higher screening efficiency in the Vancouver risk scoring tool compared with the “Sydney” tool. This finding bolsters confidence in the predictive strength of the Vancouver algorithm, especially because the algorithm excludes symptoms, which have been shown to be significantly associated with infection.31

The algorithm also has potential to inform screening decisions, especially in low STI prevalence settings. For example, to aid the scaling up of GCO, BC’s Internet-based STI testing program, the algorithm could be adapted into a self-selection tool for filtering GCO participants based on risk profile. Only participants with a sufficiently high risk score would be recommended to receive testing through GCO.9 The prediction rule could also facilitate decision making in traditional clinical encounters, where the algorithm could be used to display an alert on the computer screen to prompt clinicians to offer specific STI tests to those at increased risk of infection. The prediction rule could also be used to inform ongoing clinical recommendations related to selective screening of STI clients and to standardize STI testing at the clinics in BC, potentially enabling targeted testing, thereby reducing the unnecessary testing of those without the infection and saving costs. The algorithm could be helpful in prioritizing or triaging patients. In sexual health clinics, triage services require patients to fill out risk assessment questionnaires in the waiting room before receiving services.2,5,32 The risk estimation algorithm could be adapted into a decision-making tool that prioritizes patients by risk scores. This may enable clinicians to determine effectively whether to triage those with low risk scores to receive “express” services (e.g., providing self-collected specimens) or to refer others to more comprehensive services (e.g., complete physical examinations).8 This process may help decrease wait times, improve clinical workflow, and reduce unnecessary clinician face time.

Our study has some limitations. The data from which the model was derived came from STI clinics, and this could limit generalizability to other settings such as reproductive health clinics or general practice settings. The generalizability of the model in additional STI clinics outside Vancouver, BC, will be the subject of a forthcoming paper. One assumption the algorithm makes is that of a fixed epidemic; however, this assumption would need to be reevaluated over time to reflect the evolution of patient risk profiles, a possibility that could be facilitated by the adoption of electronic medical records. Also, although we examined chlamydia and/or gonorrhea infection as a composite outcome, we conducted a subanalysis and found little difference in the performance of the prediction rule for detecting gonorrheal infection only (AUC = 0.73) and chlamydia infection only (AUC = 0.75). This finding gave us more confidence in combining both outcomes.

In conclusion, we derived a pragmatic risk scoring tool from a model that included a diverse patient population. As funding available for sexual health services decreases (or, in some cases, remains stagnant) and STI rates increase, public health programs are in need of novel strategies to maximize service efficiency. Future research is needed to determine whether the adoption of risk estimation tools such as the one developed here will result in economic savings and long-term impact on resource allocation (including health human services and clinic operations).


1. Rietmeijer CA, Mettenbrink C. Why we should save our STD clinics. Sex Transm Dis 2010; 37: 591.
2. Fairley CK, Williams H, Lee DM, et al. A plea for more research on access to sexual health services. Int J STD AIDS 2007; 18: 75–76.
3. Golden MR, Kerndt PR. Improving clinical operations: Can we and should we save our STD clinics? Sex Transm Dis 2010; 37: 264–265.
4. Hottes TS, Farrell J, Bondyra M, et al. Internet-based HIV and sexually transmitted infection testing in British Columbia, Canada: Opinions and expectations of prospective clients. J Med Internet Res 2012; 14: e41.
5. Shamos SJ, Mettenbrink CJ, Subiadur JA, et al. Evaluation of a testing-only “express” visit option to enhance efficiency in a busy STI clinic. Sex Transm Dis 2008; 35: 336–340.
6. Falasinnu T, Gustafson P, Gilbert M, et al. Risk prediction in sexual health contexts: Protocol. JMIR Res Protoc 2013; 2: e57.
7. Geisler WM, Chow JM, Schachter J, et al. Pelvic examination findings and Chlamydia trachomatis infection in asymptomatic young women screened with a nucleic acid amplification test. Sex Transm Dis 2007; 34: 335–338.
8. van den Broek IV, Brouwers EE, Gotz HM, et al. Systematic selection of screening participants by risk score in a chlamydia screening programme is feasible and effective. Sex Transm Infect 2012; 88: 205–211.
9. Wand H, Guy R, Donovan B, et al. Developing and validating a risk scoring tool for chlamydia infection among sexual health clinic attendees in Australia: A simple algorithm to identify those at high risk of chlamydia infection. BMJ Open 2011; 1: e000005.
10. Falasinnu T, Gustafson P, Hottes TS, et al. A critical appraisal of risk models for predicting sexually transmitted infections. Sex Transm Dis 2014; 41: 321–330.
11. Mahony JB, Luinstra KE, Tyndall M, et al. Multiplex PCR for detection of Chlamydia trachomatis and Neisseria gonorrhoeae in genitourinary specimens. J Clin Microbiol 1995; 33: 3049–3053.
12. Heymans MW, van Buuren S, Knol DL, et al. Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Med Res Methodol 2007; 7: 33.
13. Janssen KJ, Donders AR, Harrell FE Jr, et al. Missing covariate data in medical research: To impute is better than to ignore. J Clin Epidemiol 2010; 63: 721–727.
14. Janssen KJ, Vergouwe Y, Donders AR, et al. Dealing with missing predictor values when applying clinical prediction models. Clin Chem 2009; 55: 994–1001.
15. Vergouw D, Heymans MW, Peat GM, et al. The search for stable prognostic models in multiple imputed data sets. BMC Med Res Methodol 2010; 10: 81.
16. Survey Research Center, Institute for Social Research, University of Michigan. IVEware: Imputation and Variance Estimation software. Available at: Updated 2013. Accessed August 7, 2013.
17. He Y, Raghunathan T. On the performance of sequential regression multiple imputation methods with non normal error distributions. Commun Stat Simul Comput 2009; 38: 856–883.
18. Vergouwe Y, Royston P, Moons KG, et al. Development and validation of a prediction model with missing predictor data: A practical approach. J Clin Epidemiol 2010; 63: 205–214.
19. Gotz HM, van Bergen JE, Veldhuijzen IK, et al. A prediction rule for selective screening of Chlamydia trachomatis infection. Sex Transm Infect 2005; 81: 24–30.
20. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999; 130: 515–524.
21. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer, 2009.
22. Vickers AJ, Cronin AM. Traditional statistical methods for evaluating prediction models are uninformative as to clinical value: Towards a decision analytic framework. Semin Oncol 2010; 37: 31–38.
23. Gotz HM, Veldhuijzen IK, Habbema JD, et al. Prediction of Chlamydia trachomatis infection: Application of a scoring rule to other populations. Sex Transm Dis 2006; 33: 374–380.
24. Haukoos JS, Lyons MS, Lindsell CJ, et al. Derivation and validation of the Denver human immunodeficiency virus (HIV) risk score for targeted HIV screening. Am J Epidemiol 2012; 175: 838–846.
25. La Montagne DS, Patrick LE, Fine DN, et al. Region X Infertility Prevention Project. Re-evaluating selective screening criteria for chlamydial infection among women in the U S Pacific Northwest. Sex Transm Dis 2004; 31: 283–289.
26. Toll DB, Janssen KJ, Vergouwe Y, et al. Validation, updating and impact of clinical prediction rules: A review. J Clin Epidemiol 2008; 61: 1085–1094.
27. Moons KG, Kengne AP, Grobbee DE, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012; 98: 691–698.
28. Rosser BR, Miner MH, Bockting WO, et al. HIV risk and the Internet: Results of the Men’s INTernet Sex (MINTS) study. Aids Behav 2009; 13: 746–756.
29. Meyers D, Wolff T, Gregory K, et al. USPSTF recommendations for STI screening. Am Fam Physician 2008; 77: 819–824.
30. Public Health Agency of Canada. Canadian Guidelines On Sexually Transmitted Infections—Updated January 2010. Available at: Updated 2011. Accessed July 22, 2013.
31. Xu F, Stoner BP, Taylor SN, et al. “Testing-only” visits: An assessment of missed diagnoses in clients attending sexually transmitted disease clinics. Sex Transm Dis 2013; 40: 64–69.
32. Martin L, Knight V, Ryder N, et al. Client feedback and satisfaction with an express sexually transmissible infection screening service at an inner-city sexual health center. Sex Transm Dis 2013; 40: 70–74.
© Copyright 2014 American Sexually Transmitted Diseases Association