Confronting data sparsity to identify potential sources of Zika virus spillover infection among primates

Barbara Han
Subhabrata Majumdar
Flavio Calmon
Benjamin Glicksberg
Raya Horesh
Abhishek Kumar
Elisa von Marschall
Dennis Wei
Aleksandra Mojsilović
Kush Varshney
Published at Epidemics 2019

The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.