Integrating viral genome sequences and host receptor structure to predict the host range and tissue tropism of RNA and DNA viruses with machine learning

Dr Simon Babayan

We have recently shown that the genomes of single-stranded RNA viruses (e.g. ebolavirus, coronavirus, Zika virus) carry signals that point to the taxonomic identity of their reservoir hosts, whether they are transmitted by a vector and, if so, the taxonomic identity of that vector. We did so by using machine learning to analyse the genomic sequences of ~500 viruses (Babayan, Orton, Streicker, 2018 Science). Analogous models may be useful for predicting the host range of viruses more broadly (i.e., the other species that might be infected in the future), but they require additional sources of information to make realistic, actionable predictions. The ability of viruses to bind host receptors is a key prerequisite for infection and, as recently shown in the context of SARS-CoV-2’s spike protein, viruses can become ‘pre-adapted’ to the receptors of other related species while still circulating in their natural reservoirs (MacLean et al. 2020, bioRxiv). The expression of appropriate receptors across tissues may further predict tissue tropism and the degree of pathogenicity following infection. We therefore predict that combining the genomic signals that allow our models to accurately predict reservoir hosts with information on receptor binding could produce models that accurately predict the host range and pathogenicity of diverse viruses.

In this project, we will extend our approach to new virus groups, including double-stranded RNA viruses such as bluetongue virus and DNA viruses such as herpes simplex virus and varicella zoster virus. This will form the core of the rotation project.

We will next seek to incorporate structural features of both viruses and host receptors (e.g. the protein secondary structure of host receptors) into our models to enrich their predictive capacity and increase our ability to infer both the range of host taxa that viruses could infect and the potential effects of infection on the host organism. Finally, this project will also offer the opportunity to validate model predictions in vitro. The successful candidate will work under the supervision of Simon Babayan and Daniel Streicker, in close collaboration with the School of Biodiversity, One Health and Veterinary Medicine and with the MRC-University of Glasgow Centre for Virus Research.