Control of sexually transmitted infections (STIs) is a public health challenge and covers infections such as chlamydia (Chlamydia trachomatis), gonorrhea, syphilis, and HIV, each with its own epidemiology and risk groups. Well-performing clinical prediction rules (CPRs) can potentially support STI control by prioritizing testing.1 We welcome the effort of Falasinnu et al.2 to provide a critical appraisal of existing CPRs in sexual health contexts. They identified 16 studies reporting on CPRs for STIs and gave a broad overview of the methodological quality of the studies (Table 3)2 and of the performance of the CPRs (Table 4).2 Here, we discuss and prioritize the performance measures and quality items that were used, with the aim of enabling identification of valid CPRs for specific STIs.
Successful external validation, including assessment of calibration (agreement between predicted probabilities and observed outcome frequencies) and discrimination (ability to distinguish between individuals with and without the outcome), is considered the most important proof of generalizability of a CPR.1,3 External validation assesses whether a CPR works in individuals other than those from whose data it was derived, in contrast with internal validation, where a CPR’s performance is assessed in the same individuals as those from whose data it was derived. In our view, external validation should be part of the assessment of both the methodological quality of the studies in which CPRs are derived and the performance of the derived CPRs. The study by Haukoos et al.4 is a good example, in which a CPR for HIV risk was derived from patients in a sexually transmitted disease clinic and subsequently externally validated in an emergency department among the general population. When external validation is lacking, CPR performance within the development data (internal validation) and other methodological quality criteria become more important. As for CPR performance assessment, discrimination may be the most important measure of a CPR’s internal validity. Calibration is usually good in the same individuals whose data the CPR was derived from and is particularly meaningful to assess in external validation data.
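The distinction between discrimination and calibration can be made concrete with a small sketch. The risk scores and outcomes below are hypothetical and not taken from any of the cited studies; the AUC is computed as the probability that a random positive receives a higher predicted risk than a random negative, and calibration-in-the-large compares mean predicted risk with observed outcome frequency.

```python
# Minimal sketch (hypothetical data): assessing discrimination (AUC) and
# calibration-in-the-large of an existing CPR on an external validation set.

def auc(risks, outcomes):
    """Probability that a random positive gets a higher risk than a random
    negative (ties count as 0.5); equals the area under the ROC curve."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(risks, outcomes):
    """Mean predicted risk vs. observed outcome frequency; similar values
    indicate good overall calibration."""
    return sum(risks) / len(risks), sum(outcomes) / len(outcomes)

# External validation data (invented): predicted risks from a CPR derived
# elsewhere, and the observed infection status (1 = infected).
risks    = [0.82, 0.74, 0.55, 0.40, 0.30, 0.22, 0.15, 0.10]
outcomes = [1,    1,    0,    1,    0,    0,    0,    0]

print(f"AUC = {auc(risks, outcomes):.2f}")
mean_pred, obs_freq = calibration_in_the_large(risks, outcomes)
print(f"mean predicted = {mean_pred:.2f}, observed frequency = {obs_freq:.2f}")
```

Note that a model can discriminate well while being poorly calibrated (e.g., systematically overestimating risk), which is exactly why both measures are reported at external validation.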
Because the discrimination of a CPR within the individuals whose data were used to derive it may be an overly optimistic reflection of the discrimination in other individuals, it should be assessed with either cross-validation or bootstrap techniques, especially when the sample size (or, more precisely, the number of positives) is small.5 This may be exemplified by the study of Verhoeven et al.,6 who predicted chlamydia infection in general practice with combinations of predictors and found excellent discrimination with an area under the receiver operating characteristic curve of 0.88: that is, an 88% probability that a randomly chosen positive individual is assigned a higher risk than a randomly chosen negative individual. The study sample size was, however, quite small (n = 774; 39 positives), and no internal validation techniques were used. The area under the receiver operating characteristic curve at internal or external validation will likely be substantially smaller. Falasinnu et al. define a CPR to perform well in terms of efficiency and sensitivity if at least 90% of the infections are detected while testing at most 60% of the patients. This benchmark is taken from a study assessing selective screening criteria for C. trachomatis in an opportunistic screening program in the US Pacific Northwest.7 Assessing the quality of CPRs for various STIs against this single benchmark is debatable. From a decision-making viewpoint, the required sensitivity depends on the burden of missing an infection, which may differ between HIV and chlamydia infections and which needs to be balanced against the burden of unnecessary diagnostic testing. Furthermore, an alternative efficiency benchmark may be based on costs: a prediction rule that requires testing 65% of the patients to detect 90% of the infections may well be cost-effective.
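The optimism that bootstrap techniques are meant to remove can be illustrated with a deliberately overfitted toy "model": selecting the best of several pure-noise candidate predictors, as a stand-in for data-driven variable selection. The correction follows Harrell's bootstrap scheme;5 all data and the model below are simulated for illustration only.

```python
# Illustrative sketch of bootstrap optimism correction for the AUC.
# The "model" overfits by construction: from 10 noise predictors, pick the
# one with the highest apparent AUC. Simulated data, not from cited studies.
import random

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_predictor(X, y):
    """Model 'derivation': select the column with the highest apparent AUC."""
    return max(range(len(X[0])), key=lambda j: auc([row[j] for row in X], y))

random.seed(1)
n, k = 60, 10
X = [[random.random() for _ in range(k)] for _ in range(n)]  # noise predictors
y = [random.randint(0, 1) for _ in range(n)]                 # random outcome

j = best_predictor(X, y)
apparent = auc([row[j] for row in X], y)                     # optimistic

optimism, B = 0.0, 200
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]            # bootstrap resample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    jb = best_predictor(Xb, yb)                              # re-derive per resample
    auc_boot = auc([row[jb] for row in Xb], yb)              # apparent in resample
    auc_test = auc([row[jb] for row in X], y)                # tested on original data
    optimism += (auc_boot - auc_test) / B

corrected = apparent - optimism
print(f"apparent AUC = {apparent:.2f}, optimism-corrected AUC = {corrected:.2f}")
```

Crucially, the selection step is repeated inside every bootstrap resample; redoing only the final fit, but not the selection, would understate the optimism.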
Falasinnu et al. assessed the methodological quality with 16 items (Tables 1 and 3),2 similar to other methodological review studies of CPRs.8,9 We agree with their discussion of the limitation that equal weights assigned to items on the quality checklist ignore the possibility that some items are more important than others. A description of study design and study sample will generally be given in peer-reviewed publications and does not add substantially to the quality of a CPR. Variable definitions and details of methods of assessment for predictors and outcomes should certainly be described, but some gradation in scoring may be useful. Objective predictors like sex, age, area of living, and country of birth can be derived from registries, whereas questionnaire data like education, marital status, sexual behavior data, history of STI, and symptoms are subjective and potentially subject to bias and misclassification. Although ethnicity or race is commonly reported as a predictor, it may be controversial to assess. Reporting of missing values is certainly necessary, as is a description of how missing values were dealt with; however, this is less relevant when the percentage of missing values is small. We do agree that multivariable statistical methods should be used to examine the associations between potential predictors and STI outcomes. Yet assessing the selection of predictors is a vital first step: are they clinically meaningful and easy to use? This, however, requires a clear definition of “clinically meaningful.” In addition, we recommend always giving a structured presentation of the number of times a predictor is used among the studies (e.g., a table representing the presence of each predictor among the CPR studies in Kulik et al.10) and, ideally, a measure of the strength of each predictor (e.g., a table with odds ratios for each predictor in Leushuis et al.11).
In summary, we encourage studies to identify which CPRs or predictors perform well and meet the necessary quality standards, including internal and external validation, which can often be done relatively easily on existing datasets. Successful external validation of CPRs, where necessary with updating of predictors, should be followed by an impact analysis that shows whether clinical practice changes with beneficial consequences.12 The clinical impact of applying a CPR with a chosen risk threshold (cutoff value) is determined by the cost of missing infections versus the benefit of less unnecessary testing, and by the financial costs.
Clinical impact is ideally analyzed with a randomized controlled trial, but other types of impact comparison can also be informative: Haukoos et al.13 validated an HIV risk score in data from an emergency department and applied this risk score in a prospective, before-after design in another emergency department, comparing the targeted screening with untargeted screening for HIV in earlier time periods. In our own studies, we externally validated our chlamydia prediction rule in 2 different populations.14,15 The prediction rule was subsequently implemented in a large-scale screening program, where selection was attained by a score derived from a short questionnaire; it proved highly effective in yielding high positivity rates in an area with lower chlamydia prevalence.16 An intermediate form of impact analysis assesses how many STIs one would miss when using a CPR with different cutoff values. This is exemplified by algorithms developed for detection of chlamydial and gonococcal infections in an emergency department setting.17 Likewise, the clinical impact of using CPRs with or without controversial predictors such as race can be compared.18
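Such an intermediate impact analysis amounts to a simple cutoff sweep: for each candidate threshold, tabulate the testing burden (fraction of patients tested) against the fraction of infections detected. The sketch below uses invented scores and outcomes, not data from any of the cited studies.

```python
# Hypothetical cutoff sweep: testing burden versus infections detected
# for a risk score at different cutoff values. All data are invented.

def cutoff_table(scores, outcomes, cutoffs):
    n, n_pos = len(scores), sum(outcomes)
    rows = []
    for c in cutoffs:
        tested = [y for s, y in zip(scores, outcomes) if s >= c]
        frac_tested = len(tested) / n    # testing burden at this cutoff
        sens = sum(tested) / n_pos       # infections detected / all infections
        rows.append((c, frac_tested, sens))
    return rows

scores   = [9, 8, 7, 7, 6, 5, 4, 4, 3, 2, 2, 1]   # risk score per patient
outcomes = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]   # 1 = infected

for c, frac, sens in cutoff_table(scores, outcomes, cutoffs=[2, 4, 6, 8]):
    print(f"cutoff {c}: test {frac:.0%} of patients, "
          f"detect {sens:.0%} of infections")
```

Reading such a table directly shows the trade-off discussed above: lowering the cutoff raises sensitivity at the cost of testing more patients, and the acceptable balance depends on the burden of a missed infection for the STI in question.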
Although only a few are available, some promising CPRs exist for different STIs, and we do think using CPRs is the way forward for targeted STI screening. Apart from their use in systematic chlamydia screening, chlamydia prediction rules may guide STI clinicians in deciding whom to offer opportunistic chlamydia testing, more selectively than using age group alone. Other applications of prediction rules may be (Internet) triaging systems in STI clinics, both for targeting which STI to test for (e.g., chlamydia only, or also other STIs) and for targeting HIV testing in low-prevalence populations, as well as for individual risk assessment through scoring questionnaires, which may encourage test uptake by increasing risk awareness.
1. Steyerberg EW. Clinical prediction models: A practical approach to development, validation, and updating. New York: Springer, 2009.
2. Falasinnu T, Gustafson P, Hottes TS, et al. A critical appraisal of risk models for predicting sexually transmitted infections. Sex Transm Dis 2014: 321–331.
3. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med 2000; 19: 453–73.
4. Haukoos JS, Lyons MS, Lindsell CJ, et al. Derivation and validation of the Denver Human Immunodeficiency Virus (HIV) risk score for targeted HIV screening. Am J Epidemiol 2012; 175: 838–46.
5. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15: 361–87.
6. Verhoeven V, Avonts D, Meheus A, et al. Chlamydial infection: an accurate model for opportunistic screening in general practice. Sex Transm Infect 2003; 79: 313–7.
7. La Montagne DS, Patrick LE, Fine DN, et al. Re-evaluating selective screening criteria for chlamydial infection among women in the US Pacific Northwest. Sex Transm Dis 2004; 31: 283–9.
8. Maguire JL, Kulik DM, Laupacis A, et al. Clinical prediction rules for children: A systematic review. Pediatrics 2011; 128: e666–e677.
9. Bouwmeester W, Zuithoff NP, Mallett S, et al. Reporting and methods in clinical prediction research: A systematic review. PLoS Med 2012; 9: 1–12.
10. Kulik DM, Uleryk EM, Maguire JL. Does this child have appendicitis? A systematic review of clinical prediction rules for children with acute abdominal pain. J Clin Epidemiol 2013; 66: 95–104.
11. Leushuis E, van der Steeg JW, Steures P, et al. Prediction models in reproductive medicine: A critical appraisal. Hum Reprod Update 2009; 15: 537–52.
12. Steyerberg EW, Moons KG, van der Windt DA, et al. Prognosis Research Strategy (PROGRESS) 3: Prognostic model research. PLoS Med 2013; 10: e1001381.
13. Haukoos JS, Hopkins E, Bender B, et al. Comparison of enhanced targeted rapid HIV screening using the Denver HIV risk score to nontargeted rapid HIV screening in the emergency department. Ann Emerg Med 2013; 61: 353–61.
14. Gotz HM, van Bergen JE, Veldhuijzen IK, et al. A prediction rule for selective screening of Chlamydia trachomatis infection. Sex Transm Infect 2005; 81: 24–30.
15. Gotz HM, Veldhuijzen IK, Habbema JD, et al. Prediction of Chlamydia trachomatis infection: Application of a scoring rule to other populations. Sex Transm Dis 2006; 33: 374–80.
16. van den Broek IV, Brouwers EE, Gotz HM, et al. Systematic selection of screening participants by risk score in a chlamydia screening programme is feasible and effective. Sex Transm Infect 2012; 88: 205–11.
17. Al-Tayyib AA, Miller WC, Rogers SM, et al. Evaluation of risk score algorithms for detection of chlamydial and gonococcal infections in an emergency department setting. Acad Emerg Med 2008; 15: 126–35.
18. Stein CR, Kaufman JS, Ford CA, et al. Screening young adults for prevalent chlamydial infection in community settings. Ann Epidemiol 2008; 18: 560–71.