Original Studies

Identification of US Counties at Elevated Risk for Congenital Syphilis Using Predictive Modeling and a Risk Scoring System

Cuffe, Kendra M. MPH; Kang, Joseph D.Y. PhD; Dorji, Tandin MS; Bowen, Virginia B. PhD; Leichliter, Jami S. PhD; Torrone, Elizabeth PhD; Bernstein, Kyle T. PhD

Sexually Transmitted Diseases: May 2020 - Volume 47 - Issue 5 - p 290-295
doi: 10.1097/OLQ.0000000000001142


In the May 2020 issue of Sexually Transmitted Diseases, the article by Cuffe et al. had errors in the Beta coefficients listed in Table 2 (p.293). The corrected table is presented below.

Table 2. Simple and adjusted regression models for any congenital syphilis case at the county level in the United States, 2014-2015 (n = 3,141 US counties)


Note: Table contains beta coefficients (β), unadjusted odds ratios (OR), adjusted odds ratios (AOR), and 95% confidence intervals (95% CI) for congenital syphilis cases by county-level factors. MSM is an abbreviation for men who have sex with men. P&S syphilis is an abbreviation for primary and secondary syphilis. Children in poverty was removed because of multicollinearity. Area under the curve (AUC) for the multivariable logistic regression model using 2014-2015 congenital syphilis rates is 90.5%, and the Bayesian information criterion statistic is 1231.4.

1 Children in single-parent households, MSM P&S syphilis case count, and population size were categorized into tertiles for purposes of creating risk scores.

* Weighted risk scores (W) were calculated by multiplying adjusted beta coefficients (β) by 2. Reference groups were assigned a value of zero.

Sexually Transmitted Diseases. 48(3):e51, March 2021.


Congenital syphilis (CS) is one of the most serious consequences of syphilitic infection. It occurs when a mother is infected during pregnancy with the causative agent of syphilis, Treponema pallidum, and then transmits it to her unborn child.1 Untreated early syphilis in pregnant women can lead to fetal infection in up to 80% of cases and may result in stillbirth or death in up to 40% of cases.2 Neurodevelopmental impairments, including cognitive delays, vision loss, and seizure disorder, are common long-term consequences among infants born to mothers with reactive syphilis serology during pregnancy.3 Nonetheless, CS is a preventable outcome.4 Treatment has been proven highly effective in curing syphilitic infection in pregnant women and ultimately preventing CS in newborns.4 However, despite the availability of effective treatment, CS persists.1

After initial declines from 2008 to 2012, reported CS cases increased 177% from 2012 to 2017, from 8.4 to 23.3 cases per 100,000 live births.1 Increases in reported CS cases coincided with a 156% increase in reported primary and secondary (P&S) syphilis among women from 2012 to 2017,1 which was unsurprising given that increases in CS cases often parallel increases in syphilis among women.5 In addition, CS cases have been highly geographically concentrated in that only 5% of all counties in the United States reported at least one case of CS in 2015.1,6 The persistence of CS has been attributed to inadequate prenatal care (e.g., syphilis screening and treatment) among pregnant women and increasing syphilis rates.7–13 Specifically, CS often represents a failure within the health care system to diagnose and treat a pregnant woman infected with syphilis.11,12 Several studies have found strong associations between CS and inadequate prenatal care and have concluded that providing adequate screening, treatment, and risk counseling to pregnant women during prenatal care visits is an important intervention.11,12

Although there is a breadth of research that has focused on the influence of prenatal care and syphilis on CS, there is a paucity of research assessing the associations between CS and county-level socioeconomic (e.g., uninsured rate) and health factors (e.g., P&S syphilis rate). Although the syphilis rate among women of reproductive age is likely the strongest predictor of CS at the county level, it is possible that other county-level factors predict CS. It is likely that these county-level factors work in tandem with individual and health system-level challenges in accessing prenatal care as well as with increasing female syphilis rates.

Given that CS is not geographically widespread, identification of county-level characteristics associated with the presence of CS in affected areas could help identify and target areas for enhanced CS prevention efforts, maximizing limited resources. We examined the association between CS and several county-level socioeconomic and health demographics. In addition, we developed risk scores to predict areas at elevated risk for CS where enhanced CS prevention efforts should be targeted.


To identify counties at elevated risk for CS in the future, we developed a predictive model using 2014–2015 reported CS case data, then used this model to develop an index of county-level risk scores. Model building and risk score development had 4 distinct steps: (1) selecting and refining predictor variables; (2) developing and fitting a predictive model to identify county-level factors associated with reporting a CS case in 2014 to 2015; (3) creating an index of sensitivities and specificities to select a risk score cutoff; and (4) performing projective predictions using the predictive model and examining how well the model predicted future CS cases by comparing these with actual cases found in 2016 to 2017. All analyses were performed using SAS software (Cary, NC).

Measures and Data Sources

This analysis includes all 3141 US counties and county equivalents such as parishes and boroughs. The outcome of interest was defined as a county having at least one CS case reported in 2014 to 2015 to the National Notifiable Diseases Surveillance System, which receives case reports of nationally notifiable sexually transmitted diseases from all states.14 Given the rarity of CS, we combined county CS data across a 2-year period to increase sample size and improve our ability to detect true associations.

Initial predictor variables of interest included factors that have been associated with syphilis in previous analyses (population proportions of non-Hispanic black persons, Hispanic persons, uninsured persons, persons living in urban areas, children in single-parent households, and children living in poverty).1,6,15–17 We also included other county-level factors such as population size, the presence or absence of a metropolitan area, violent crime rate (number of reported crimes per 100,000 persons), 2015 P&S syphilis rates (cases per 100,000) among female individuals and men who have sex with men (MSM), and income inequality as measured by the 80:20 ratio. The 80:20 ratio refers to the ratio of household income at the 80th percentile relative to household income at the 20th percentile as measured per county.18
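As an illustration of the 80:20 ratio described above, the following minimal sketch computes the ratio from a list of household incomes. The function name `ratio_80_20` and the income values are hypothetical, and the percentile interpolation rule is an assumption; the County Health Rankings files derive this measure from survey estimates rather than from raw household records.

```python
from statistics import quantiles

def ratio_80_20(household_incomes):
    """Return household income at the 80th percentile divided by the 20th percentile."""
    # quantiles(..., n=5) returns the 20th, 40th, 60th, and 80th percentile cut points.
    cuts = quantiles(household_incomes, n=5, method="inclusive")
    p20, p80 = cuts[0], cuts[3]
    return p80 / p20

# Hypothetical household incomes (US dollars) for a single county.
incomes = [18_000, 24_000, 31_000, 40_000, 52_000, 61_000,
           75_000, 90_000, 120_000, 160_000, 210_000]

print(f"80:20 ratio: {ratio_80_20(incomes):.2f}")
```

A larger ratio indicates a wider gap between higher-income and lower-income households in the county.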

County-level socioeconomic and health factor data were obtained from the County Health Rankings and Roadmaps 2015 analytic files, a compilation of data from multiple sources, including the US Census, the Federal Bureau of Investigation's Uniform Crime Report program, and the US Department of Agriculture.13,19 We defined counties with metropolitan or metro areas as those whose rural-urban commuting area codes were between 1 and 3.19 In addition, we defined proportions of persons living in urban areas using the Census Bureau's definition: any population, housing, or territory outside urban areas. County-level rates of P&S syphilis for female individuals and P&S syphilis case-counts among MSM were obtained from the National Notifiable Diseases Surveillance System.

Selecting and Refining Predictor Variables and Predictive Model Building

We anticipated that some of the 12 county-level variables initially selected for inclusion in the model might be measuring the same underlying concept. As such, we assessed all variables for multicollinearity using Pearson correlation coefficients. Factors with correlation coefficients ≤−0.50 or ≥0.50, suggesting moderate to strong correlation, were flagged for further evaluation.20 Income inequality, population proportions of Hispanic, non-Hispanic black, uninsured, children in poverty, and children in single-parent households had correlation coefficients ≥0.50, which suggested collinearity concerns (data not shown). Specifically, the population proportion of children in poverty was moderately to highly correlated with 4 other variables: income inequality, population proportion non-Hispanic black, population proportion uninsured, and proportion of children in single-parent households.
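The pairwise screening step described above can be sketched as follows. This is only an illustration in Python (the analysis itself was performed in SAS); the helper names and the county values are hypothetical, and only the ±0.50 flagging threshold comes from the text.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def flag_correlated_pairs(columns, threshold=0.50):
    """Return variable pairs whose correlation is <= -threshold or >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical values for six counties (not data from the study).
county_data = {
    "children_in_poverty": [12.0, 18.0, 25.0, 31.0, 40.0, 22.0],
    "single_parent_households": [20.0, 27.0, 33.0, 41.0, 52.0, 30.0],
    "violent_crime_rate": [300.0, 120.0, 450.0, 200.0, 310.0, 90.0],
}

for name_a, name_b, r in flag_correlated_pairs(county_data):
    print(f"{name_a} vs {name_b}: r = {r}")
```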

We further assessed and corrected for multicollinearity by calculating variance inflation factors (VIFs) and removing variables with VIFs greater than 5.0 from the analysis.21 Both population proportions of children in poverty and children in single-parent households had VIFs greater than 5.0. Given similarities seen between these 2 variables during the calculation of Pearson coefficients, children in poverty was removed from logistic regression analyses. Upon removal, children in single-parent households attained a VIF of less than 5.0. Following multicollinearity checks, 11 county-level factors of interest remained for possible inclusion in the predictive model. We then stratified the remaining 11 variables by the outcome of interest (presence or absence of CS at the county level in 2014 to 2015) and used likelihood ratio and Wilcoxon rank sum tests to assess differences between groups.
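To make the VIF check concrete, the sketch below uses the fact that the VIFs of a set of predictors are the diagonal entries of the inverse of their correlation matrix. The helper names and county values are hypothetical; only the 5.0 threshold and the idea of dropping one of two overlapping variables come from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF for each predictor: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical values for six counties (not data from the study).
county_data = {
    "children_in_poverty": [12.0, 18.0, 25.0, 31.0, 40.0, 22.0],
    "single_parent_households": [20.0, 27.0, 33.0, 41.0, 52.0, 30.0],
    "violent_crime_rate": [300.0, 120.0, 450.0, 200.0, 310.0, 90.0],
}

vifs = variance_inflation_factors(county_data)
# Drop one of the two overlapping variables and recompute, as described in the text.
reduced = {k: v for k, v in county_data.items() if k != "children_in_poverty"}
vifs_after_removal = variance_inflation_factors(reduced)
```

In this toy data, the two child-related variables inflate each other's VIF, and removing one brings the remaining VIFs below the 5.0 threshold.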

In preparation for model building, we examined the distribution of each continuous variable for skewness and determined that some variables may be improved by transformation. To decide which of the 10 continuous variables required transformation, we fit 2 models for each variable: one model with the CS outcome of interest regressed against the continuous county-level predictor variable and one model with the CS outcome of interest regressed against a dichotomized version of each county-level predictor variable split at its median value. For each variable, these 2 models were compared with one another, and the model with the lower Bayesian information criterion statistic was selected. Those factors that performed better with a median-split transformation included female P&S syphilis rate, MSM P&S syphilis rate, violent crime rate, and population proportions of non-Hispanic black and Hispanic. These 5 variables were retained as dichotomous for future model building.
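The comparison described above can be sketched for a single predictor as follows. This is an illustrative Python version, not the authors' SAS code: the Newton-Raphson fitting routine, the function names, and the ten hypothetical counties are all assumptions, and BIC is computed as k·ln(n) − 2·ln(L) with k = 2 parameters (intercept and slope).

```python
from math import exp, log
from statistics import median

def max_log_likelihood(x, y, iterations=50):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return the maximized log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix X'WX.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
    return sum(yi * log(pi) + (1 - yi) * log(1.0 - pi) for yi, pi in zip(y, p))

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * log(n_obs) - 2.0 * log_likelihood

# Hypothetical county-level rates and CS outcomes (1 = at least one CS case).
rate = [0.0, 0.0, 1.2, 2.5, 3.1, 4.8, 6.0, 7.5, 9.3, 12.0]
any_cs = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

# Dichotomize the predictor at its median value.
cut = median(rate)
above_median = [1.0 if r > cut else 0.0 for r in rate]

bic_continuous = bic(max_log_likelihood(rate, any_cs), 2, len(rate))
bic_split = bic(max_log_likelihood(above_median, any_cs), 2, len(rate))
preferred = "median split" if bic_split < bic_continuous else "continuous"
print(f"continuous: {bic_continuous:.2f}, median split: {bic_split:.2f}, preferred: {preferred}")
```

Because both candidate models have the same number of parameters, the comparison reduces to which form of the predictor yields the higher likelihood.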

We then used stepwise variable selection to build a multivariable model and confirmed goodness of fit using receiver operating characteristic curve and area under the curve (AUC) statistics. In total, 7 variables were retained in the final model.

Creating Risk Scores to Identify Counties at Risk for CS

After the development of an initial model, we sought to use this model to create an index of risk scores that could identify counties at risk of reporting CS in the future. We used methods similar to Stein et al.22 to develop risk scores. First, we created a predictor score for each variable in the model by transforming each variable's β coefficient: first multiplying it by 2 and then rounding it to the nearest integer.22 The variables that were previously included in the model as continuous variables were recategorized into tertiles to aid in risk score development. We then input county-level data and summed county-specific predictor scores to produce an overall risk score for each county. A list of risk scores and estimated risk was generated for all US counties (Supplemental Table 1).
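The scoring arithmetic described above can be sketched as follows. The β values here are hypothetical placeholders, not the coefficients reported in Table 2, and the factor names are illustrative; only the rule (multiply β by 2, round to the nearest integer, assign zero to reference groups, then sum) comes from the text.

```python
from math import floor

def predictor_score(beta):
    """Weighted score for one factor level: beta x 2, rounded to the nearest integer."""
    # floor(x + 0.5) rounds halves up; Python's built-in round() rounds halves to even.
    return floor(beta * 2 + 0.5)

# Hypothetical adjusted beta coefficients (assumed for illustration only).
betas = {
    "income_inequality_above_median": 0.48,
    "metro_area": 0.55,
    "female_ps_syphilis_above_median": 1.10,
    "msm_ps_syphilis_above_median": 0.90,
    "urban_population_above_cutoff": 0.70,
    "non_hispanic_black_above_median": 0.45,
    "uninsured_above_median": 0.30,
}
scores = {name: predictor_score(beta) for name, beta in betas.items()}

def county_risk_score(flags):
    """Sum predictor scores for the factors present; reference levels contribute zero."""
    return sum(scores[name] for name, present in flags.items() if present)

# A hypothetical county with three of the seven factors present.
example_county = {name: False for name in betas}
example_county.update({
    "income_inequality_above_median": True,
    "metro_area": True,
    "female_ps_syphilis_above_median": True,
})
print(county_risk_score(example_county))
```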

We determined a priori that we wanted to identify a risk score cutoff that balanced 2 important characteristics: the ability to identify at-risk counties with a sensitivity and specificity closest to 85% and the ability to identify the lowest number of counties as being at elevated risk for CS. To identify an ideal risk score cutoff, we then calculated the sensitivity, specificity, and number of counties identified as being at elevated risk for CS at each cutoff.
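A minimal sketch of the cutoff index described above: for each candidate cutoff, counties scoring at or above the cutoff are treated as elevated risk, and sensitivity, specificity, and the number of counties flagged are tabulated. The function name and the ten hypothetical counties are assumptions for illustration.

```python
def cutoff_table(risk_scores, has_cs_case, cutoffs):
    """Return sensitivity, specificity, and number of counties flagged at each cutoff."""
    rows = []
    for cutoff in cutoffs:
        flagged = [score >= cutoff for score in risk_scores]
        tp = sum(1 for f, c in zip(flagged, has_cs_case) if f and c)
        fn = sum(1 for f, c in zip(flagged, has_cs_case) if not f and c)
        tn = sum(1 for f, c in zip(flagged, has_cs_case) if not f and not c)
        fp = sum(1 for f, c in zip(flagged, has_cs_case) if f and not c)
        rows.append({
            "cutoff": cutoff,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "counties_flagged": tp + fp,
        })
    return rows

# Hypothetical risk scores and observed CS status (1 = at least one CS case).
scores = [9, 8, 7, 6, 6, 5, 4, 3, 2, 1]
cases = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

table = cutoff_table(scores, cases, cutoffs=range(1, 11))
for row in table:
    print(row)
```

Raising the cutoff flags fewer counties and increases specificity at the cost of sensitivity, which is the trade-off the a priori criteria are meant to balance.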

Using the Predictive Model to Predict Future CS Cases

To determine how well the predictive model predicted future CS cases, we used the model fit to 2014 to 2015 CS cases to predict which counties would be at elevated risk for CS and depicted these projections on a US map. We then compared these predictions with actual CS cases that occurred in 2016 to 2017. We estimated predictive performance using the AUC, for which a value of ≥80% was considered excellent.
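For readers unfamiliar with the AUC, the sketch below computes it as the proportion of (county with CS, county without CS) pairs in which the county with CS received the higher score, counting ties as one half. Scoring counties by their integer risk score rather than by a model-predicted probability is an assumption made for illustration, as are the function name and the data.

```python
def auc(scores, has_case):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    case_scores = [s for s, c in zip(scores, has_case) if c]
    other_scores = [s for s, c in zip(scores, has_case) if not c]
    wins = 0.0
    for case_score in case_scores:
        for other_score in other_scores:
            if case_score > other_score:
                wins += 1.0
            elif case_score == other_score:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(case_scores) * len(other_scores))

# Hypothetical risk scores from the earlier period and CS status in the later period.
scores = [9, 8, 7, 6, 6, 5, 4, 3, 2, 1]
later_cases = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

value = auc(scores, later_cases)
is_excellent = value >= 0.80  # threshold stated in the text
print(f"AUC = {value:.2f}, excellent: {is_excellent}")
```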


Among all US counties (n = 3141), a total of 254 (8.1%) counties reported at least one CS case in 2014 to 2015 (Table 1); one CS case was reported with an unknown county. Using Wilcoxon rank sum to compare counties without CS with counties with CS, counties with CS had a higher mean of the following: income inequality (4.8 vs. 4.5, P < 0.001); proportions of non-Hispanic black (17.5% vs. 8.1%, P < 0.001), Hispanic (15.8% vs. 8.2%, P < 0.001), uninsured (22.8% vs. 21.2%, P < 0.001), and children in single-parent households (37.0% vs. 31.7%, P < 0.001); and violent crime per 100,000 persons (446.1 vs. 230.5, P < 0.001). Children in poverty (25.2% vs. 24.4%, P = 0.269) was the only factor that was not statistically significant.

Table 1. Socioeconomic Factors by the Presence of Any Congenital Syphilis Case at the County Level, 2014 to 2015 (n = 3141 US Counties)

Rates of P&S syphilis for female individuals and case-counts for MSM differed significantly between groups. Compared with counties without CS, counties with CS had higher mean P&S syphilis rates for female individuals (66.1 vs. 11.3 cases per 100,000, P < 0.001) and higher MSM P&S syphilis rates (61.9 vs. 1.9, P < 0.001); more often had a metro area (84.2% vs. 66.9%, P < 0.001); more often had populations that were >44,961 people (86.6% vs. 29.4%, P < 0.001); and more often had population proportions of those living in urban areas that were >24.1% (96.4% vs. 64.4%, P < 0.001).

Predictive Model

Preliminary simple logistic regression models revealed that several county-level factors were associated with having a reported CS case. Significant factors included above-median income inequality, above-median female and MSM P&S syphilis rates, and population proportions of non-Hispanic black, Hispanic, and uninsured persons. In addition, a proportion of children in single-parent households between 27.6% and 62.9%, a population size greater than 15,023 people, a proportion of the population living in urban areas >24.1%, a violent crime rate higher than 139.6 per 100,000, and the presence of a metropolitan area were associated with CS (Table 2). Upon further refining of the model, we found that, after controlling for other factors, the following were positively associated with a county having a CS case: income inequality, metro area, female and MSM P&S syphilis rates, urban population, non-Hispanic black, and uninsured. The predictive model resulted in an AUC value of 90.3%.

Table 2. Simple and Adjusted Regression Models for Any Congenital Syphilis Case at the County Level in the United States, 2014 to 2015 (n = 3141 US Counties)

Predictor Scores, Weighted Risk Scores, and Risk Score Cutoffs

Possible predictor scores for county-level factors ranged from 0 to 2. Weighted risk scores for counties ranged from 0 to 10 (Table 3). The highest possible risk score of 10 represented a county that had income inequality above the median; a population size of ≥44,961; population proportions of Hispanic, non-Hispanic black, and uninsured above the median; female and MSM P&S syphilis rates above the median; >24.1% of people living in urban areas; and a metro area. The risk score cutoff closest to our a priori qualities was ≥6 (sensitivity of 88.1% and specificity of 74.0%). Using this cutoff, the number of counties that were identified as having an elevated risk for CS was 973 (31.0%). A map of counties identified as having an elevated risk for CS using the risk score cutoff of ≥6 is presented in the supplemental figure, in which we found clustering of counties with elevated risk among metro areas (Supplemental Figure 1). Interestingly, several counties that were identified as having a low risk for CS were near counties that had both a high risk for CS and an actual CS case.

Table 3. Sensitivities and Specificities for Risk Score Cutoffs for Congenital Syphilis Cases at the County Level, 2014 to 2015 (n = 3141 US Counties)

Projected Prediction of CS Cases Based on More Recent Data

We used the predictive model built with 2014–2015 CS case data to predict counties at elevated risk for CS in the future (Table 4). Using a risk score cutoff of ≥6, we correctly identified 83.9% of counties that had CS cases as having an elevated risk and 75.4% of counties with no CS cases as low risk in 2016 to 2017 (AUC value, 89.5%). A map of actual and predicted CS cases at the county level based on a risk score cutoff of ≥6 is displayed in Figure 1.

Table 4. Estimated Risk for Congenital Syphilis Cases at the County Level, 2016 to 2017 (n = 3141 US Counties)

Figure 1: Map of actual and predicted county congenital syphilis cases based on a risk score cutoff of ≥6, 2016 to 2017.


Using a data-driven approach to investigate county-level factors associated with CS in the United States, we have developed a robust model that predicts with high-performance characteristics which counties were likely to have future reported CS cases. This predictive model performed well as validated by 2016–2017 CS data. As a result, we proffer actionable data to aid health departments and prevention programs in developing and targeting prevention efforts with a more comprehensive understanding.

Our predictive model identified several county-level factors that were positively associated with reported CS cases at the county level: population proportions of Hispanics, non-Hispanic blacks, or uninsured above the median; income inequality and female and MSM P&S syphilis rates above the median; having an urban population proportion of greater than 24.1%; and having a metro area. The identification of these factors that are associated with county CS cases in the United States could help target prevention efforts in counties that currently have no reported CS cases but that may be at elevated risk for future CS cases. Importantly, our predictive model performed well when validated using more recent CS data, suggesting that our risk scoring system could be used by local health departments as a tool for assessing their counties' risk for CS. Counties identified as being at elevated risk for CS could focus syphilis prevention and control efforts on women of reproductive age to mitigate CS morbidity and mortality (Supplemental Table). These efforts could include the following: (1) prioritizing all reported reactive syphilis serologic tests among women of reproductive age for health department investigation; (2) improving timely ascertainment of pregnancy status among reported cases of syphilis in women of reproductive age and improving appropriate care referrals for pregnant women; (3) ensuring women in areas of high female syphilis morbidity are screened for syphilis during the third trimester; and (4) expanding outreach and health education to the at-risk community as well as to providers that see women of reproductive age in potentially impacted counties.
It is important to note that the lack of access to quality prenatal care services and timely syphilis screening during pregnancy likely are strong predictors of CS; at the time of this analysis, there were no county data available for assessing prenatal care access and for quantifying syphilis screening during pregnancy.

Our analysis is subject to several limitations. First, CS is rare, and in most counties in the United States, there were zero reported cases. This may have reduced our ability to identify statistically significant county-level factors; to minimize this, we combined multiple years of CS data. Second, counties with one CS case were considered equivalent to those with multiple cases across the analytic period, which may have masked some factors associated with having multiple CS cases. Third, because of limitations in data availability, several county-level factors used in this analysis reflect different time periods within the 2009 to 2015 range. Fourth, access to quality prenatal care services and timely syphilis screening during pregnancy are likely strong predictors of CS; at the time of this analysis, there were no county data available for assessing prenatal care access, or for quantifying syphilis screening during pregnancy. Fifth, given our goal was to predict and not necessarily explain CS, our predictive model does not take into consideration possible effect modifiers or confounding factors. Sixth, ecological studies are accurate when the unit of analysis is homogenous. Our unit of analysis in this study, US county, may have a lower level of homogeneity as opposed to other units such as US Census tracts; however, these tracts are often too small, and some measures were not available at the county level. Lastly, and importantly, our analysis was ecological and identified county-level factors associated with CS, which should not be interpreted as modifiable risk factors for CS.

Although CS is preventable through timely screening and adequate treatment, the United States has seen significant increases in CS cases, which is cause for concern. Findings from this study provide more information on how county-level factors, working in tandem with individual-level behaviors, may increase the risk for CS. Health departments and other prevention programs may consider enhancing overall prevention efforts, especially in areas identified as having an elevated risk for CS.


1. Department of Health and Human Services. Centers for Disease Control and Prevention. Sexually Transmitted Diseases Surveillance. Atlanta, GA, 2015:2016.
2. Ingraham NR Jr. The value of penicillin alone in the prevention and treatment of congenital syphilis. Acta Derm Venereol Suppl (Stockh) 1950; 31(Suppl. 24):60–87.
3. Verghese VP, Hendson L, Singh A, et al. Early childhood neurodevelopmental outcomes in infants exposed to infectious syphilis in utero. Pediatr Infect Dis J 2018; 37:576–579.
4. Alexander JM, Sheffield JS, Sanchez PJ, et al. Efficacy of treatment for syphilis in pregnancy. Obstet Gynecol 1999; 93:5–8.
5. Peterman TA, Su J, Bernstein KT, et al. Syphilis in the United States: On the rise? Expert Rev Anti Infect Ther 2015; 13:161–168.
6. Bernstein KT, Grey JA, Bolan G, et al. Developing a topology of syphilis in the United States. Sex Transm Dis 2018; 45(9S Suppl 1):S1–S6.
7. Mascola L, Pelosi R, Blount JH, et al. Congenital syphilis. Why is it still occurring? JAMA 1984; 252:1719–1722.
8. Desenclos JC, Scaggs M, Wroten JE. Characteristics of mothers of live infants with congenital syphilis in Florida, 1987–1989. Am J Epidemiol 1992; 136:657–661.
9. Mobley JA, McKeown RE, Jackson KL, et al. Risk factors for congenital syphilis in infants of women with syphilis in South Carolina. Am J Public Health 1998; 88:597–602.
10. Risser WL, Hwang LY. Congenital syphilis in Harris County, Texas, USA, 1990–92: Incidence, causes and risk factors. Int J STD AIDS 1997; 8:95–101.
11. Warner L, Rochat RW, Fichtner RR, et al. Missed opportunities for congenital syphilis prevention in an urban southeastern hospital. Sex Transm Dis 2001; 28:92–98.
12. Kidd S, Bowen VB, Torrone EA, et al. Use of national syphilis surveillance data to develop a congenital syphilis prevention cascade and estimate the number of potential congenital syphilis cases averted. Sex Transm Dis 2018; 45(9S Suppl 1):S23–S28.
13. Robert Wood Johnson Foundation. 2015 County Health Rankings & Roadmaps. Available at: Accessed February 15, 2017.
14. Centers for Disease Control and Prevention. In: National Notifiable Diseases Surveillance System (NNDS). Atlanta, GA: US Department of Health and Human Services, CDC, 2017.
15. Raphael D. Introduction to the Social Determinants of health. Social Determinants of Health: Canadian Perspectives. Toronto, Canada: Canadian Scholars' Press, 2004.
16. Sharpe TT, Voute C, Rose MA, et al. Social determinants of HIV/AIDS and sexually transmitted diseases among black women: Implications for health equity. J Womens Health (Larchmt) 2012; 21:249–254.
17. Bailey ZD, Krieger N, Agenor M, et al. Structural racism and health inequities in the USA: Evidence and interventions. Lancet 2017; 389:1453–1463.
18. Glassman B. Income inequality metrics and economic well-being in U.S. metropolitan statistical areas. United States Census Bureau. Available at: Accessed February 8, 2019.
19. United States Departments of Agriculture. Rural-urban commuting area codes. Available at: Accessed February 17, 2017.
20. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences, 5th ed. Boston: Houghton Mifflin, 2003.
21. O'Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant 2007; 41:673–690.
22. Stein CR, Kaufman JS, Ford CA, et al. Screening young adults for prevalent chlamydial infection in community settings. Ann Epidemiol 2008; 18:560–571.


Copyright © 2020 American Sexually Transmitted Diseases Association. All rights reserved.