
== <span style="color: #FFFFFF;">Understanding</span> ==
Epidemiology AI operates at multiple scales: individual risk prediction, community-level disease monitoring, and global outbreak response.

**Disease surveillance**: Traditional disease surveillance relies on reported cases, which makes it slow, incomplete, and biased toward severe cases. Digital surveillance adds proxy signals: Google search trends (flu-related queries track flu incidence), Twitter/Reddit posts (symptom language), pharmacy sales (OTC medication patterns), and HealthMap (NLP over news articles). These signals arrive in near real time and can lead official reports by days to weeks. COVID-19 demonstrated the power of wastewater surveillance: SARS-CoV-2 RNA levels in wastewater rose roughly 7–10 days before clinical case counts did.
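One way to check how far a proxy signal runs ahead of reported cases is a lagged correlation. The sketch below is illustrative only: the two series are synthetic (a bell-shaped outbreak curve and a copy shifted 7 days earlier), and the helper names `pearson` and `best_lead` are hypothetical, not part of any surveillance toolkit.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def best_lead(proxy, cases, max_lag=14):
    """Return (lag, r): the lag in days at which proxy[t] best matches cases[t + lag]."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Pair each proxy value with the case count `lag` days later.
        r = pearson(proxy[: len(proxy) - lag], cases[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

def bump(t):
    """Synthetic outbreak curve (assumption: smooth single wave peaking at day 40)."""
    return math.exp(-((t - 40) / 8) ** 2)

# Synthetic data: the proxy signal is the case curve shifted 7 days earlier.
cases = [100 * bump(t) for t in range(100)]
proxy = [bump(t + 7) for t in range(100)]

lag, r = best_lead(proxy, cases)
```

On real data the correlation peak is much flatter, and a high correlation in one season says nothing about whether the lead time will hold in the next.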

**Epidemic forecasting**: ML models trained on historical surveillance data, mobility data, climate data, and vaccination rates predict epidemic trajectories. The COVID-19 Forecast Hub aggregated predictions from 40+ teams, and its ensemble was more consistently accurate than the individual models that fed it. Key challenges: epidemic dynamics are non-stationary; behavioral change (lockdowns, masking) shifts transmission; and data quality degrades during surges.
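The ensemble idea can be sketched with a per-horizon median across teams. This is a minimal illustration, not the Forecast Hub's actual pipeline: the team names, forecasts, and observed values below are made up, and `median_ensemble` and `mae` are hypothetical helpers.

```python
from statistics import median

def median_ensemble(forecasts):
    """Combine point forecasts by taking the median across teams at each horizon.

    forecasts: dict mapping team name -> list of point forecasts, one per horizon.
    """
    return [median(values) for values in zip(*forecasts.values())]

def mae(pred, truth):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

# Made-up weekly case forecasts for horizons of 1 to 4 weeks ahead.
forecasts = {
    "team_a": [120, 150, 200, 260],
    "team_b": [100, 110, 125, 140],
    "team_c": [130, 170, 190, 210],
}
observed = [115, 140, 180, 215]

ensemble = median_ensemble(forecasts)
errors = {team: mae(pred, observed) for team, pred in forecasts.items()}
errors["ensemble"] = mae(ensemble, observed)
```

In this toy example the median happens to beat every individual team. That is not guaranteed in general; the practical appeal of a median ensemble is that one badly miscalibrated model cannot drag it far.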

**The Google Flu Trends lesson**: Google's 2009 paper estimated flu activity from search query data 1–2 weeks ahead of official CDC reports, with impressive early accuracy. By the 2012–13 season, it was overestimating peak flu levels by roughly 2×. The lesson: big-data correlations are fragile. Search behavior changes (media coverage during flu season drives flu-related searches even among healthy people), and a model trained in one regime can fail in another. This is the canonical cautionary tale for digital epidemiology.
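The failure mode can be reproduced in a toy setting. The sketch below fits a straight line from search volume to flu prevalence in one regime, then applies it after search behavior shifts. All numbers are synthetic, and the assumption that media coverage exactly doubles searches is chosen only to make the effect easy to see; this is not the Google Flu Trends model.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Training regime (synthetic): search volume is 10x the flu prevalence (%).
train_flu = [1.0, 2.0, 3.0, 4.0, 5.0]
train_searches = [10 * v for v in train_flu]
slope, intercept = fit_line(train_searches, train_flu)

# New regime (assumption): media coverage doubles searches at the same prevalence.
new_flu = [2.0, 3.0, 4.0]
new_searches = [2 * 10 * v for v in new_flu]

predicted = [slope * s + intercept for s in new_searches]
overestimate = [p / t for p, t in zip(predicted, new_flu)]
```

The fitted relationship is perfect on the training data, yet every prediction in the new regime is about twice the true value, because the model has no way to tell illness-driven searches from attention-driven ones.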

**EHR-based risk models**: ML models trained on large EHR databases can predict an individual patient's risk of flu complications, sepsis, hospital readmission, and chronic disease development. The challenge is methodological rigor: many COVID-19 risk models published in 2020 were flawed (data leakage, inadequate validation, biased training data), and several were shown to be of little use, or potentially harmful, in practice. The TRIPOD guidelines for reporting prediction models were widely violated.
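One common source of data leakage in EHR models is splitting at the encounter level, so that records from the same patient land in both the training and test sets. A minimal sketch of a patient-level split follows; the record layout (`patient_id`, `encounter`) and the function name `split_by_patient` are assumptions for illustration, not a standard schema or API.

```python
import random

def split_by_patient(records, test_fraction=0.25, seed=0):
    """Split encounter records so that no patient appears in both train and test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

# Synthetic data: 8 patients with 3 encounters each.
records = [{"patient_id": pid, "encounter": k} for pid in range(8) for k in range(3)]
train, test = split_by_patient(records)

train_patients = {r["patient_id"] for r in train}
test_patients = {r["patient_id"] for r in test}
```

A patient-level split addresses only one kind of leakage. Features recorded after the prediction time, or test data from the same time window as training data, need separate checks such as a temporal split.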
</div>

<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">