Abstract
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
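The control-and-projection idea described above can be illustrated with a minimal sketch. It assumes that disease-rate estimates per location are already available as time series, and it swaps in plain Pearson correlation and an ordinary least-squares mapping as stand-ins; the abstract does not specify these details, so the function names (`select_controls`, `project_counterfactual`, `estimate_impact`), the linear mapping, and the toy numbers are all hypothetical.

```python
import numpy as np

def select_controls(target_pre, candidates_pre, k=1):
    """Return indices of the k candidate locations whose pre-intervention
    rates correlate most strongly with the target area (assumed criterion)."""
    corrs = np.array([np.corrcoef(target_pre, c)[0, 1] for c in candidates_pre])
    return np.argsort(corrs)[::-1][:k]

def project_counterfactual(target_pre, controls_pre, controls_post):
    """Fit target ~ controls (least squares with intercept) on the
    pre-intervention period, then project the target's rates for the
    intervention period from the control rates."""
    X_pre = np.column_stack([controls_pre.T, np.ones(controls_pre.shape[1])])
    w, *_ = np.linalg.lstsq(X_pre, target_pre, rcond=None)
    X_post = np.column_stack([controls_post.T, np.ones(controls_post.shape[1])])
    return X_post @ w

def estimate_impact(actual_post, projected_post):
    """Absolute and relative difference between observed and projected rates."""
    delta = float(np.mean(actual_post - projected_post))
    theta = delta / float(np.mean(projected_post))
    return delta, theta

# Toy, made-up rate estimates (rows = candidate control locations).
candidates_pre = np.array([
    [2.0, 4.0, 8.0, 6.0, 10.0, 12.0],
    [6.0, 1.0, 3.0, 5.0, 2.0, 4.0],
    [1.0, 2.5, 4.0, 3.5, 5.0, 6.5],
])
candidates_post = np.array([[8.0, 10.0], [3.0, 2.0], [4.5, 5.0]])
target_pre = np.array([4.0, 7.0, 13.0, 10.0, 16.0, 19.0])
target_post = np.array([10.4, 12.8])  # observed rates during the campaign

idx = select_controls(target_pre, candidates_pre, k=1)
projected = project_counterfactual(target_pre, candidates_pre[idx], candidates_post[idx])
delta, theta = estimate_impact(target_post, projected)
print(idx, projected, delta, theta)
```

A negative relative difference in this sketch would indicate observed rates below the projected no-campaign rates; uncertainty estimates, which a real assessment would need, are omitted here.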
| Original language | English |
|---|---|
| Pages (from-to) | 1434-1457 |
| Number of pages | 24 |
| Journal | Data Mining and Knowledge Discovery |
| Volume | 29 |
| Issue number | 5 |
| State | Published - 22 Sep 2015 |
| Externally published | Yes |
Keywords
- Gaussian Process
- Infectious diseases
- Intervention
- Search query logs
- Social media
- Supervised learning
- User-generated content
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computer Networks and Communications