Researchers have suggested that social media and online search data might be used to monitor and predict syphilis and other sexually transmitted diseases. Because people at risk for syphilis might seek sexual health and risk-related information on the internet, we investigated associations between internet state-level search query data (e.g., Google Trends) and reported weekly syphilis cases.
We obtained weekly counts of reported primary and secondary syphilis for 50 states from 2012 to 2014 from the US Centers for Disease Control and Prevention. We collected weekly internet search query data regarding 25 risk-related keywords from 2012 to 2014 for 50 states using Google Trends. We joined 155 weeks of Google Trends data with 1-week lag to weekly syphilis data for a total of 7750 data points. Using the least absolute shrinkage and selection operator, we trained three linear mixed models on the first 10 weeks of each year. We validated models for 2012 and 2014 for the following 52 weeks and the 2014 model for the following 42 weeks.
The models, consisting of different sets of keyword predictors for each year, accurately predicted 144 weeks of primary and secondary syphilis counts for each state, with an overall average R2 of 0.9 and overall average root mean squared error of 4.9.
We used Google Trends search data from the prior week to predict cases of syphilis in the following weeks for each state. Further research could explore how search data could be integrated into public health monitoring systems.