Background: Prediction rules have been proposed as alternatives to screening recommendations and have potential applications in sexual health decision making. To our knowledge, there has been no review undertaken providing a critical appraisal of existing prediction rules in sexual health contexts. This review aims to identify and characterize prediction rules developed and validated for sexually transmitted infection (STI) screening, describe the methodological issues essential to the suitability of derived models for clinical or public health application, and synthesize the literature on the performance of these models.
Methods: We searched MEDLINE (2003–2012) to identify studies that reported on models predicting STIs. We explored the methodological quality of the studies based on a 16-item quality assessment checklist. We also evaluated the studies based on data extracted on model discrimination, calibration, sensitivity, and testing efficiency.
Results: We identified 16 publications reporting on STI prediction rules. The most poorly addressed quality items were missing values, calibration measures, and variable definition. Overall, the performance of risk models as measured by discrimination (area under the receiver operating characteristic curve range, 0.64–0.88) and calibration was found to be generally good or satisfactory. Eight studies attained or were close to attaining the performance benchmark of testing less than 60% of the target population to achieve 90% sensitivity. The 2 risk models that were externally validated displayed adequate discrimination in new settings.
Conclusions: Although we identified several well-performing STI risk prediction rules, few have been validated. Future developments in the use of prediction rules should address their clinical consequence, comparative usefulness, external validity, and implementation impact.
A review identified several well-performing sexually transmitted infection risk prediction rules; however, few of these rules have been validated.
From the *School of Population and Public Health, University of British Columbia, Vancouver, Canada; †Department of Statistics, University of British Columbia, Vancouver, Canada; and ‡British Columbia Centre for Disease Control, Vancouver, Canada
Conflict of interest disclosure and sources of funding: Titilola Falasinnu is supported by the Canadian Institutes of Health Research Doctoral Research Award. Paul Gustafson is supported by a grant from the Natural Sciences and Engineering Research Council of Canada. All of the remaining authors have disclosed that they have no financial relationships with or interests in any commercial companies pertaining to this manuscript.
Correspondence: Titilola Falasinnu, MHS, School of Population and Public Health, University of British Columbia, 2206 East Mall, Vancouver, BC, Canada V6T 1Z3. E-mail address: firstname.lastname@example.org.
Received for publication February 14, 2013, and accepted February 18, 2014.
Publicly funded sexual health clinics are tasked with providing free or low-cost sexually transmitted infection (STI) testing and treatment services in many jurisdictions. However, the recent economic downturn and ensuing budgetary limitations facing public health infrastructure (even in many high-income countries) have led to STI clinic closures and/or scale-back of operations in some jurisdictions.1 At the same time, increased clinic patient volumes contribute to scenarios where clinicians must provide more services with diminished resources.1,2 In many high-income countries, STI clinics are currently operating at maximum capacity amidst rising STI rates, highlighting the urgent need for the adoption of innovative service delivery options that provide high-quality and comprehensive STI services while limiting the burden on fiscal resources.3
Internet-based screening programs have been proposed as potentially cost-reducing alternative service delivery models in recent years.4 These new models are facilitated in part by the increasing adoption of highly sensitive and specific STI testing technologies, including nucleic acid amplification tests (NAATs) conducted on urine specimens and rapid testing, which make the provision of testing services more convenient and accessible in these alternative settings where patients have limited interaction with clinicians.5 However, strategies for identifying those at increased risk for STIs are urgently needed in alternative service delivery models. Current screening recommendations issued by public health and professional organizations have deficiencies (e.g., oversimplification and lack of generalizability) that limit their adoption for individualized decision making to the specific patient, and there is growing support for more nuanced risk assessment considerations that involve more than the body of evidence concerning screening recommendations alone.6
Risk prediction approaches that reflect a continuous risk spectrum have been widely adopted in chronic disease decision making (e.g., the Framingham risk score) and have been proposed as alternatives to screening recommendations in various contexts.7 Clinical prediction rules (CPRs) are tools that provide estimates of absolute risk based on the combination of several patient characteristics, thus allowing for more nuanced and precise decision making than screening recommendations when applied to individual patients.8 Prediction rules have potential applications in sexual health decision making. They may help prioritize STI testing resources and help clinicians and public health administrators make crucial decisions about where to focus STI screening efforts on the individual and population level (e.g., recommending specific STI tests, as well as frequency of testing). Specifically, CPRs can be used in targeted screening programs, public education initiatives, and risk communication to patients to encourage STI testing or behavior modification.5,9
To become routinely incorporated into sexual health service delivery, CPRs must demonstrate good performance and generalizability. To our knowledge, there has been no review undertaken providing a critical appraisal of the methodological quality and performance measures of CPRs in sexual health contexts. Specifically, this review aims to (1) identify and characterize prediction rules developed and validated for STI screening, (2) describe and critically appraise the methodological issues essential to the suitability of derived models for clinical or public health application, and (3) synthesize the literature on the performance of these models.
We conducted a search of MEDLINE to identify potentially relevant studies published between 2003 and 2012, an era characterized by the expansion of highly sensitive tests (e.g., NAATs) and convenient specimen collection (e.g., urine) in an effort to ensure comparability in specimen collection and STI outcome assessment. Because of the lack of MeSH terms denoting CPRs, we used the following terms: a combination of “chlamydia,” “gonorrhea,” “HIV,” “syphilis,” “STI,” or “STD” and “screening.” We focused on these 4 STI outcomes because they are of particular interest to public health diagnosis and treatment. This review was also limited to studies published in English-language journals with populations derived from North America, Western Europe, and Australia—regions with comparable STI prevalence and social determinants of sexual health. We did not search the gray literature because we wanted to include only studies that are published in journals read by sexual health clinicians and decision makers.
We assessed the inclusion of articles through a 3-step process. First, we evaluated the title and abstract of each article identified through the MEDLINE search for relevance. Second, we manually reviewed the complete manuscript of articles identified as relevant through the title and abstract review. Eligible publications had to report a scoring or assessment tool for risk prediction or stratification. Finally, we reviewed the references of identified articles to find additional articles that may have been missed in the initial electronic search. We only included publications that provided measures of association between variables and STI outcomes in the final multivariable prediction model. Reporting of at least 1 empirical performance metric of the prediction model (e.g., discriminative and/or calibration measures) was preferable but not crucial for inclusion.
Assessment of Methodological Quality
For each article concerned with CPR derivation and validation, we extracted study characteristics such as STI outcome, objectives, study setting, predictors, and the presentation format. Next, we limited the assessment of methodological quality to only studies that derived prediction models. Two reviewers (T.F. and J.S.) performed data extraction and quality assessment independently. Table 1 shows the 16-item quality assessment checklist derived from assessment tools previously applied in this area.10,11 The presence or absence of each item was recorded as a score of 1 or 0, respectively; the maximum total score was 16. The reviewers resolved scoring discrepancies through deliberation and arrived at consensus.
Assessment of CPR Performance Measures
To evaluate the discriminative performance of both derived and validated models, we extracted metrics on the area under the receiver operating characteristic curve (AUC) or c statistic, which indicates the ability of CPRs to discriminate between patients with or without STI outcomes.5 For model calibration, we assessed whether the publication reported on the difference between the observed and predicted rates of the STI outcome, as well as the corresponding test statistic or P value.9 Calibration metrics assess the ability of a CPR to accurately predict the level of observed risk.9 To define the performance of the prediction models, we extracted data on their sensitivity and efficiency. Sensitivity was defined as the percentage of cases detected, and efficiency was the percentage of patients that would have been tested based on the predictive criteria. As initially defined by La Montagne and colleagues,12 thresholds of 60% efficiency and 90% sensitivity were used as ideal benchmarks for CPR performance.13 Performance was considered acceptable if more than 90% of cases were detected while testing less than 60% of patients.12,13
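The sensitivity and efficiency definitions above can be illustrated with a short sketch. It assumes a rule that tests everyone whose risk score is at or above a cutoff; the function names and the ten-patient data set are hypothetical and are not drawn from any reviewed study.

```python
from typing import Sequence, Tuple


def sensitivity_and_efficiency(
    scores: Sequence[float], infected: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, efficiency) when everyone scoring >= cutoff is tested.

    Sensitivity: share of infected patients who would be tested (cases detected).
    Efficiency:  share of all patients who would be tested.
    """
    n_cases = sum(infected)
    if n_cases == 0:
        raise ValueError("at least one infected patient is required")
    tested = [s >= cutoff for s in scores]
    detected = sum(1 for t, y in zip(tested, infected) if t and y)
    return detected / n_cases, sum(tested) / len(scores)


def meets_benchmark(sensitivity: float, efficiency: float) -> bool:
    """Benchmark used in this review: >90% of cases detected, <60% of patients tested."""
    return sensitivity > 0.90 and efficiency < 0.60


# Hypothetical data: ten patients, three of them infected.
scores = [0, 1, 1, 2, 2, 3, 4, 5, 6, 7]
infected = [False, False, False, False, False, False, True, False, True, True]

sens, eff = sensitivity_and_efficiency(scores, infected, cutoff=4)
print(sens, eff, meets_benchmark(sens, eff))  # 1.0 0.4 True
```

In this toy example, testing the four patients who score 4 or more finds all three infections, so the rule meets the benchmark; testing everyone (cutoff 0) would reach the same sensitivity at 100% of the population tested and would not.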
We identified 216 potentially relevant articles after screening the title and abstract of the 841 articles initially identified from the MEDLINE search. Of these, 78 full-text articles examining predictors of STIs were evaluated for inclusion; however, only 16 studies evaluated derived and/or validated prediction rules in sexual health contexts.5,7,9,12–24
Characteristics of Studies Included
Table 2 summarizes data from studies that derived and validated STI prediction rules. Twelve studies evaluated chlamydia outcomes,5,7,12–15,18,20–24 3 studies evaluated HIV outcomes,9,16,19 and 4 studies evaluated gonorrhea outcomes.14,17,18,22 Three of the included studies were population-based studies.5,15,24 Half of the studies were conducted in STI (n = 5)7,9,12,13,16 and family planning (n = 3)12,17,20 clinics. There was a significant amount of variation in the risk factors included in the prediction models; however, there were some predictors (e.g., age, clinical symptoms, and number of sexual partners) that were common to all CPRs.
All studies except for one developed or validated CPRs using cross-sectional data. Trick and colleagues22 developed a CPR for screening chlamydia and gonorrhea infection in a detention facility in the United States using case-control data. Logistic regression was used to derive all the CPRs in this review. We identified 2 distinct CPR presentation formats: point-based scoring systems (n = 6)5,7,9,14,21,24 and predictions by predictor combinations (n = 10).12,13,15–20,22,23 Point-based scoring systems were derived from the β coefficients of predictors in multivariable regression models. Predictions made by predictor combinations were derived from unweighted checklists of variables.
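As an illustration of how a point-based score can be derived from regression coefficients, the sketch below uses one common convention: divide each β by the smallest absolute β and round to the nearest integer. This convention is an assumption made for illustration; the reviewed studies may have scaled their coefficients differently. The predictor names and coefficient values are hypothetical.

```python
from typing import Dict


def betas_to_points(betas: Dict[str, float]) -> Dict[str, int]:
    """Convert logistic regression coefficients to integer points.

    Assumed convention: scale by the smallest nonzero |beta|, then round.
    """
    nonzero = [abs(b) for b in betas.values() if b != 0]
    if not nonzero:
        raise ValueError("need at least one nonzero coefficient")
    base = min(nonzero)
    return {name: round(b / base) for name, b in betas.items()}


def risk_score(points: Dict[str, int], patient: Dict[str, bool]) -> int:
    """Sum the points of every predictor that is present for this patient."""
    return sum(p for name, p in points.items() if patient.get(name, False))


# Hypothetical coefficients from a multivariable logistic model.
betas = {"age_under_25": 1.0, "urogenital_symptoms": 0.5, "new_partner_recent": 1.6}
points = betas_to_points(betas)
print(points)  # {'age_under_25': 2, 'urogenital_symptoms': 1, 'new_partner_recent': 3}

patient = {"age_under_25": True, "new_partner_recent": True}
print(risk_score(points, patient))  # 5
```

Rounding trades a small loss of precision for a score that can be added up by hand, which is the main practical appeal of this presentation format.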
Evaluation of Methodological Quality
Table 3 summarizes the methodological quality of the CPR derivation studies included in this review. Studies met between 9 and 15 quality items. All studies had adequate study design, study sample description, statistical methods, model selection methods, and ease of use. The most poorly addressed CPR quality items were missing values (n = 3),9,13,16 calibration measures (n = 5),5,7,9,19,23 and variable definition (n = 8).9,13–16,19–21 The highest-quality studies were 2 studies that developed CPRs for HIV screening, with each meeting 15 of 16 quality items.12,19 In general, studies that developed scoring systems met more quality items (median, 13; range, 12–15) than studies whose CPRs were based on predictor combinations (median, 11.5; range, 9–15). Studies published in the past 5 years met more quality items (median, 13; range, 12–15) than older studies (median, 11; range, 9–15).
Assessment of Rule Performance
Table 4 shows the performance of derived STI prediction rules. Sexually transmitted infection prevalence ranged from 0.02% (for acute HIV)19 to 29.2% (for chlamydia/gonorrhea).18 Six studies were not internally validated (i.e., apparent validation).12,13,17,20,22,23 Five prediction rules were internally validated using split-sample validation,15,16,18,19 3 studies used bootstrapping validation,5,14,21 and 1 study used cross-validation.9 Eleven studies reported AUC ranging from 0.64 to 0.88, indicating modest-to-good discriminatory performance.5,7,9,14–16,19,21–24 Three studies reported adequate calibration statistics in 1 of 2 ways: (1) the Hosmer-Lemeshow goodness-of-fit test, with low P values indicating poor goodness of fit, or (2) graphically comparing predicted STI prevalence with observed STI prevalence, drawing a linear regression line through the points, and calculating its R² and slope.5,7,9
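The two kinds of performance measure reported here can be computed as in the following sketch. The AUC is calculated as the probability that a randomly chosen infected patient scores higher than a randomly chosen uninfected patient (ties count one half), and the graphical calibration check is reduced to the slope and R² of a least-squares line through predicted versus observed prevalence per risk group. Function names and inputs are hypothetical, and the reviewed studies may have used different software or grouping schemes.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], infected: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, y in zip(scores, infected) if y]
    controls = [s for s, y in zip(scores, infected) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(cases) * len(controls))


def calibration_line(
    predicted: Sequence[float], observed: Sequence[float]
) -> Tuple[float, float]:
    """Slope and R^2 of observed prevalence regressed on predicted prevalence.

    Each element is one risk group (e.g., a score stratum). Perfect
    calibration gives a slope of 1 and an R^2 of 1.
    """
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need two or more paired groups")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    if sxx == 0 or syy == 0:
        raise ValueError("predicted and observed values must vary across groups")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)
```

A slope well below 1 would suggest that predictions are too extreme (high risks overestimated, low risks underestimated), which is one reason calibration is worth reporting alongside the AUC.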
We compared the efficiency of CPRs by examining the proportion of the population tested when the model’s sensitivity is greater than 90% (Table 4). Efficiencies at this sensitivity cutoff varied between 42% and 100%. Eight studies attained or were close to attaining the performance benchmark of testing less than 60% of the target population or subpopulations.5,12,13,15–17,19,23 We found that good discrimination was correlated with higher efficiencies. The most efficient study developed a strategy for the selective testing of women for chlamydia in general practice in Belgium.23 The authors identified the following testing algorithm: test women younger than 35 years with more than 1 partner in the past year, or test women with any 2 of the following characteristics: age 18 to 27 years, frequent postcoital bleeding, no contraception, or partner(s) with urinary complaints.23 This algorithm detected 92% of infections, and only 38% of the population was tested. The AUC was 0.88, indicating excellent discrimination accuracy.23
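The testing algorithm quoted above is an example of a predictor-combination rule, and it can be written as a short decision function. The sketch below is an interpretation of the wording in this review, not the original authors' implementation: it assumes that "any 2" means at least 2 criteria and that the 18 to 27 age band is inclusive. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    partners_past_year: int
    frequent_postcoital_bleeding: bool
    uses_contraception: bool
    partner_urinary_complaints: bool


def should_test(p: Patient) -> bool:
    """Selective chlamydia testing rule as described in the text (assumptions noted above)."""
    # Criterion A: younger than 35 years with more than 1 partner in the past year.
    if p.age < 35 and p.partners_past_year > 1:
        return True
    # Criterion B: at least 2 of the 4 listed characteristics.
    characteristics = [
        18 <= p.age <= 27,
        p.frequent_postcoital_bleeding,
        not p.uses_contraception,
        p.partner_urinary_complaints,
    ]
    return sum(characteristics) >= 2


print(should_test(Patient(30, 2, False, True, False)))  # True  (criterion A)
print(should_test(Patient(22, 1, False, False, False)))  # True  (age band + no contraception)
print(should_test(Patient(40, 3, False, True, False)))  # False
```

Unlike a weighted score, every characteristic in criterion B counts equally, which is what distinguishes this unweighted checklist format from the point-based systems.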
Validation of STI Prediction Rules
Table 5 shows the results of the external validation of STI prediction rules. Two prediction rules were externally validated. Gotz and colleagues5 developed a predictive rule for screening chlamydia in the general population in the Netherlands, and the following predictors comprised the final model: age, level of residential urbanization, ethnicity, urogenital symptoms, lifetime number of sexual partners, new partner in previous 2 months, and no condom at last sexual encounter. The Dutch chlamydia risk score was validated in 2 additional settings: (1) a population-based study in Amsterdam and (2) an outreach screening project among high-risk youth in Rotterdam.24 Also, the risk score developed by Haukoos and colleagues9 helped identify patients with increased probability of diagnosed HIV infection in an STI clinic in Denver, Colorado, and was validated in an emergency department in Cincinnati, Ohio. The Denver HIV risk score comprised the following predictors: age, sex, race/ethnicity, sexual practices, injection drug use, and past HIV testing.9
Overall, the AUC in validation studies (0.66–0.75) was lower than that of derivation populations. The difference in the AUC ranged from −0.13 to −0.10 (Table 5), indicating poorer discrimination in validation populations.9,24 Calibration and efficiency measures were also worse in the validation populations.9,24
This review identified 16 publications involving the development and validation of 15 CPRs for STI testing services. Using a framework for evaluating the quality of the CPRs, we scored the studies based on the presence or absence of 16 items commonly deemed to be of importance for high-quality prediction rules. We evaluated the discrimination and calibration performance of the CPRs identified. We also assessed the rules based on their ability to test less than 60% of the population while detecting more than 90% of infections. Here, we contextualize the key findings of this review.
The considerable variability in the presentation format makes comparing the existing CPRs for STI testing challenging. In this review, 10 studies made predictions about infection using predictor combinations, while the rest issued scoring systems. All CPRs derived from scoring systems included a discussion of a clinical scenario in which decision making may be influenced by the predicted risk, provided a decision aid to stratify patients by risk, and also explicitly detailed a risk threshold for testing based on risk scores. We recommend that future CPR derivation studies adopt scoring systems based on several considerations. First, scoring systems mirror the probability of infection, thus allowing for personalized predictions at the individual level. Second, although the scoring systems are generally used to identify those at increased risk, they may also be used to identify those at low risk, thus limiting the number of tests performed and reducing the likelihood of false-positive tests in scenarios such as low-prevalence settings where the yield of new diagnoses is low.9 Third, scoring systems may have the additional advantage of informing clinicians and patients about personalized risk, allowing both to share in STI testing decision making.9
The quality of the methodological approaches used to derive and validate CPRs determines their performance and future applicability. Although multiple areas of strength were noted, a number of methodological weaknesses were common in the reviewed CPR studies. These were inadequate handling of missing values, absence of calibration measures, and inadequate variable definition. There is considerable evidence that CPRs that fall short of these quality criteria are likely biased.25 First, missing values in clinical data rarely occur completely at random.26 Commonly missing values are often correlated with predictor and outcome measures. Most studies included in this review excluded participants with missing values, which may have led to not only loss of statistical power but also selection bias of subjects.26 Second, calibration measures are essential in model validation studies to ascertain whether the predicted probabilities equal observed probability of the outcome in consideration.26 For example, consider a scenario where an individual presenting for testing in an online setting or triaging at a clinic may be interested in knowing their STI risk. The AUC, reported twice as often as calibration measures in studies included in this review and a measure of discriminatory accuracy, may not be optimal in stratifying individuals into risk categories.27 In this situation, the agreement between predicted risk and actual risk is paramount, and the absence of calibration metrics makes it impossible to determine if the CPR gives an accurate assessment of risk. Third, without a clear description of how CPR variables were assessed, it is difficult for clinicians to reproduce these predictors in risk assessment, thus affecting the real-world application of the CPR.25
Clinical usefulness of CPRs in this review was measured using sensitivity and efficiency. The ideal scenario was to identify the most cases while testing the fewest people.12 There is a natural tension between finding the most cases and limiting the number of people unnecessarily tested. In scoring systems, this relationship is closely tied to the identification of the most efficient cutoff level and depends on costs and priorities. The cutoff level also depends on the context; for example, in systematic population-based screening programs where the screening of high-risk individuals is the priority, missed infections are unavoidable and the 90% sensitivity and 60% efficiency benchmark may be appropriate. A different scenario arises in individuals seeking STI testing in sexual health clinics, where the sensitivity of the screening algorithm has to be higher to avoid missing infections in individuals. Indeed, demonstrating the cost-effectiveness of CPRs is highly relevant in the current era, in which HIV testing is being expanded to settings (e.g., general practice) whose patients would be expected to be at lower HIV risk than those currently offered testing in high-risk settings (e.g., STI clinics).9
Evidence showing performance of a CPR in new populations is an important consideration before recommending its widespread adoption. We identified only 2 CPRs that have been validated in different populations.9,24 External validation (e.g., using populations from a different period or a different geographical setting) is essential to provide good estimates of these rules’ performances in new settings.9,24 We suggest that more validation studies of existing CPRs are needed, ideally conducted by independent investigators, to ensure their generalizability.28 Also, decision makers may consider incorporating or updating features of existing rules instead of developing new models in their own setting.28 This approach minimizes unnecessary derivation of new rules in addition to providing estimates of the performance of existing CPRs in different settings.28
After successful validation in multiple settings, the next step is to assess how CPRs impact on current clinical and population-based sexual health practice after they are implemented.29 At present, only one implementation study of STI prediction rules has been conducted.30 Future impact analyses should evaluate the effects of the implementation of the CPRs on overall client volume, number of tests ordered, costs (and other economic concerns), trends in STI prevalence, wait times, and number of clients turned away using before-after comparisons and experimental and well-designed observational study designs.31 Furthermore, detailed understandings of the implementation context of proposed CPRs (e.g., health human resources, “fit” with current clinical practice, and patient factors) and clear examination of features of clinical realities that might pose potential barriers (e.g., current workflow and availability of electronic health records) are necessary before these innovations are recommended in clinical practice. For example, future research studies could examine (1) clinicians’ understanding and willingness and (2) patients’ trust, preference, participation, and acceptance in adopting CPRs.30,31
This study has several limitations. The overall findings may have been affected by the inconsistent quality of the included publications and the small number of studies. The electronic MEDLINE search may not have identified all CPRs used for STI testing. However, the search process was conducted on several occasions to guarantee that all studies were identified. Also, a review of the bibliographies of all identified CPR publications revealed no additional articles.11 The equal weights assigned to the items on the methodological quality checklist ignored the possibility that some items are more important than others and also failed to acknowledge the subjectivity of the items; however, 2 reviewers reached consensus on the presence or absence of each quality item for each study.11
Clinical prediction rules provide more information about an individual patient than even complex recommendations can accommodate. Accurate CPRs have the capability to limit resource use but are infrequently developed and uncommonly used in sexual health services. Validation and translational efforts are required to convert existing CPRs into powerful decision aids that can improve their uptake in sexual health contexts. Future developments in the use of CPRs in sexual health practice should address their clinical consequence and comparative usefulness, external validity, and implementation impact.
1. Rietmeijer CA, Mettenbrink C. Why we should save our STD clinics. Sex Transm Dis 2010; 37: 591.
2. Fairley CK, Vodstrcil LA, Read T. The importance of striving for greater efficiency. Sex Health 2011; 8: 3–4.
3. Masaro CL, Johnson J, Chabot C, et al. STI service delivery in British Columbia, Canada; Providers’ views of their services to youth. BMC Health Serv Res 2012; 12:240.
4. Greenland KE, Op de Coul EL, van Bergen JE, et al. Acceptability of the Internet-based chlamydia screening implementation in the Netherlands and insights into nonresponse. Sex Transm Dis 2011; 38: 467–474.
5. Gotz HM, van Bergen JE, Veldhuijzen IK, et al. A prediction rule for selective screening of Chlamydia trachomatis infection. Sex Transm Infect 2005; 81: 24–30.
6. Owens DK. Improving practice guidelines with patient-specific recommendations. Ann Intern Med 2011; 154: 638–639.
7. Wand H, Guy R, Donovan B, et al. Developing and validating a risk scoring tool for chlamydia infection among sexual health clinic attendees in Australia: A simple algorithm to identify those at high risk of chlamydia infection. BMJ Open 2011; 1: e000005.
8. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009.
9. Haukoos JS, Lyons MS, Lindsell CJ, et al. Derivation and validation of the Denver human immunodeficiency virus (HIV) risk score for targeted HIV screening. Am J Epidemiol 2012; 175: 838–846.
10. Ownsworth T, McKenna K. Investigation of factors related to employment outcome following traumatic brain injury: A critical review and conceptual model. Disabil Rehabil 2004; 26: 765–783.
11. Maguire JL, Kulik DM, Laupacis A, et al. Clinical prediction rules for children: A systematic review. Pediatrics 2011; 128: e666–77.
12. La Montagne DS, Patrick LE, Fine DN, Marrazzo JM, Region X Infertility Prevention Project. Re-evaluating selective screening criteria for chlamydial infection among women in the US Pacific Northwest. Sex Transm Dis 2004; 31: 283–289.
13. Hocking J, Fairley CK. Do the characteristics of sexual health centre clients predict chlamydia infection sufficiently strongly to allow selective screening? Sex Health 2005; 2: 185–192.
14. Al-Tayyib AA, Miller WC, Rogers SM, et al. Evaluation of risk score algorithms for detection of chlamydial and gonococcal infections in an emergency department setting. Acad Emerg Med 2008; 15: 126–135.
15. Andersen B, van Valkengoed I, Olesen F, et al. Value of self-reportable screening criteria to identify asymptomatic individuals in the general population for urogenital Chlamydia trachomatis infection screening. Clin Infect Dis 2003; 36: 837–844.
17. Manhart LE, Marrazzo JM, Fine DN, et al. Selective testing criteria for gonorrhea among young women screened for chlamydial infection: Contribution of race and geographic prevalence. J Infect Dis 2007; 196: 731–737.
18. Merchant RC, DePalo DM, Liu T, et al. Developing a system to predict laboratory-confirmed chlamydial and/or gonococcal urethritis in adult male emergency department patients. Postgrad Med 2010; 122: 52–60.
19. Miller WC, Leone PA, McCoy S, et al. Targeted testing for acute HIV infection in North Carolina. AIDS 2009; 23: 835–843.
20. Paukku M, Kilpikari R, Puolakkainen M, et al. Criteria for selective screening for Chlamydia trachomatis. Sex Transm Dis 2003; 30: 120–123.
22. Trick WE, Kee R, Murphy-Swallow D, et al. Detection of chlamydial and gonococcal urethral infection during jail intake: Development of a screening algorithm. Sex Transm Dis 2006; 33: 599–603.
23. Verhoeven V, Avonts D, Meheus A, et al. Chlamydial infection: An accurate model for opportunistic screening in general practice. Sex Transm Infect 2003; 79: 313–317.
24. Gotz HM, Veldhuijzen IK, Habbema JD, et al. Prediction of Chlamydia trachomatis infection: Application of a scoring rule to other populations. Sex Transm Dis 2006; 33: 374–380.
25. Kulik DM, Uleryk EM, Maguire JL. Does this child have appendicitis? A systematic review of clinical prediction rules for children with acute abdominal pain. J Clin Epidemiol 2013; 66: 95–104.
26. Bouwmeester W, Zuithoff NP, Mallett S, et al. Reporting and methods in clinical prediction research: A systematic review. PLoS Med 2012; 9: 1–12.
27. Cook NR. Statistical evaluation of prognostic versus diagnostic models: Beyond the ROC curve. Clin Chem 2008; 54: 17–23.
28. Echouffo-Tcheugui JB, Kengne AP. Risk models to predict chronic kidney disease and its progression: A systematic review. PLoS Med 2012; 9: e1001344.
29. Childs JD, Cleland JA. Development and application of clinical prediction rules to improve decision making in physical therapist practice. Phys Ther 2006; 86: 122–131.
30. van den Broek IV, Brouwers EE, Gotz HM, et al. Systematic selection of screening participants by risk score in a chlamydia screening programme is feasible and effective. Sex Transm Infect 2012; 88: 205–211.
31. Shamos SJ, Mettenbrink CJ, Subiadur JA, et al. Evaluation of a testing-only “express” visit option to enhance efficiency in a busy STI clinic. Sex Transm Dis 2008; 35: 336–340.
© Copyright 2014 American Sexually Transmitted Diseases Association