The hosts of harmful viruses that infect higher eukaryotes have been well described, whereas the specific hosts of bacteria-infecting viruses are usually unknown. In this work package, VIROINF aims to investigate how viruses of Bacteria or Archaea (bacteriophages, or simply phages) are functionally linked to each other and to their multicellular hosts. This is crucial for understanding how phages affect the microbiome: they act as key regulators of bacterial population size and can be introduced into natural systems to influence which species are present. Evidence to date indicates that the phage-host relationship tends to be highly specific, with each phage targeting a limited number of bacterial hosts.
VIROINF will, for the first time, address this topic using a combined experimental and computational approach. ESR 5 develops methods to experimentally link phages to their hosts, and to screen and isolate phages that infect specific hosts at the single phage-particle level. Using these methods, ESR 5 and ESR 10 will generate a multi-layer database containing:
- phage genomes that infect selected hosts;
- viral-tagged metagenomes of phages infecting the same hosts in contaminated groundwater ecosystems;
- community co-occurrence of microbiomes and viromes from the same ecosystems.
These data will be complemented by text-mined literature data from BioRelate and by data from ESR 13. Combined, these sources will comprise tens of thousands of interactions. The dataset will be available for benchmarking bioinformatics approaches that infer host specificity (ESR 10).
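As a rough illustration of what benchmarking host prediction against such a dataset could look like, the sketch below scores predictions at several taxonomic ranks. The function name, the data layout and the toy phage names are hypothetical and are not part of any VIROINF tool; only the bacterial taxonomy is real.

```python
# Minimal sketch: accuracy of predicted hosts at several taxonomic ranks.
# All names and the data layout are assumptions made for illustration.

RANKS = ("species", "genus", "family")


def accuracy_by_rank(predicted, known, lineage, ranks=RANKS):
    """Return, per rank, the fraction of phages whose predicted host
    shares that rank's taxon with at least one known host.

    predicted: dict phage -> predicted host species
    known:     dict phage -> set of experimentally known host species
    lineage:   dict host species -> dict rank -> taxon name
    """
    results = {}
    for rank in ranks:
        hits = 0
        scored = 0
        for phage, true_hosts in known.items():
            if phage not in predicted:
                continue  # no prediction made for this phage, so skip it
            scored += 1
            predicted_taxon = lineage[predicted[phage]][rank]
            if any(lineage[host][rank] == predicted_taxon for host in true_hosts):
                hits += 1
        results[rank] = hits / scored if scored else 0.0
    return results


# Toy data (hypothetical phages, real bacterial taxonomy).
lineage = {
    "Staphylococcus aureus": {
        "species": "Staphylococcus aureus",
        "genus": "Staphylococcus",
        "family": "Staphylococcaceae",
    },
    "Staphylococcus epidermidis": {
        "species": "Staphylococcus epidermidis",
        "genus": "Staphylococcus",
        "family": "Staphylococcaceae",
    },
    "Escherichia coli": {
        "species": "Escherichia coli",
        "genus": "Escherichia",
        "family": "Enterobacteriaceae",
    },
}
known = {
    "phage_A": {"Staphylococcus aureus"},
    "phage_B": {"Escherichia coli"},
}
predicted = {
    "phage_A": "Staphylococcus epidermidis",  # wrong species, right genus
    "phage_B": "Escherichia coli",
}

scores = accuracy_by_rank(predicted, known, lineage)
print(scores)
```

Reporting accuracy per rank rather than as a single number makes it visible when a method is reliable at genus level but not at species level.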
ESR 10 will develop machine learning tools to link newly identified bacteriophages to their bacterial hosts using a range of signals, including CRISPR sequences and prophages in bacterial genomes (obtained by ESR 5), k-mer frequencies and other genome composition signals, and protein domains. The tools will be based on approaches such as random forests and neural networks (deep learning). Both are among the most successful machine learning approaches, owing to their ability to discover non-linear signals in data and their relative insensitivity to biases in training datasets. This robustness matters because known viruses represent only a minority of all viruses, and VIROINF aims to create a scalable tool that predicts a host at the optimal taxonomic level for a given virus. The knowledge of different levels of virus-host co-evolution generated in WP 2.1 will be applied here. The aim is to infer networks of probable virus-host relationships at species level. To study virus-host specificity at strain level, we will focus on the bacteriophages that infect Staphylococcus aureus. Phages are important mediators of bacterial virulence and hence affect human health. Understanding how bacterial gene content is influenced by this lateral gene transfer mechanism is important for understanding how viruses affect both the specific bacteria and the human host.
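To make the genome composition signal concrete, the following minimal sketch ranks candidate hosts by how close their k-mer frequency profile is to that of a phage. It is a deliberately simple distance-based stand-in, not the random forest or deep learning models described above, and the function names and toy sequences are assumptions made for illustration.

```python
# Minimal sketch: rank candidate hosts by k-mer profile similarity.
# Simple distance-based illustration; names and sequences are hypothetical.
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies in a fixed ACGT k-mer order.

    k-mers containing other characters (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_hosts(phage_seq, host_seqs, k=4):
    """Return (host name, distance) pairs, most similar composition first."""
    phage = kmer_profile(phage_seq, k)
    scored = [
        (name, euclidean(phage, kmer_profile(seq, k)))
        for name, seq in host_seqs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Toy sequences, far shorter than real genomes.
toy_phage = "ATATATATATATATAT"
toy_hosts = {
    "gc_rich": "GCGCGGCGCCGCGGCG",
    "at_rich": "ATATATTATATAATAT",
}

ranking = rank_hosts(toy_phage, toy_hosts, k=2)
print(ranking[0][0])
```

In a learned model, profiles like these would typically serve as one group of input features alongside CRISPR spacer matches and protein domain content, rather than being compared by a fixed distance.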
- ESR 5: Bacteriophage–host relationships prediction
- ESR 10: Computational prediction of virus-host interactions in the microbiome
- ESR 7: Generation of datasets for phage-bacterial interactions in complex communities and mutational spectra analysis following exposure to selective pressure
- ESR 11: Development of more quantitative viral metagenomics approaches to investigate the role of bacteriophages in the honey bee gut and their effects on health and development
- ESR 14: Computational methods for the analysis of metagenomic datasets to extract viral sequences within the context of commercial data-mining
- Benchmarking Bioinformatic Virus Identification Tools Using Real-World Metagenomic Data across Biomes. Journal article. In: bioRxiv, 2023.
- Prediction of virus-host association using protein language models and multiple instance learning. Journal article. In: bioRxiv, 2023.
- Replidec - Use naive Bayes classifier to identify virus lifecycle from metagenomics data. Journal article. In: bioRxiv, 2022.
- ITN -- VIROINF: Understanding (Harmful) Virus-Host Interactions by Linking Virology and Bioinformatics. Journal article. In: Viruses, vol. 13, no. 5, pp. 766, 2021.
- Computational prediction of virus-host interactions by pre-trained transformer and Multiple Instance Learning. Oral presentation at Computational Biology Conference 2022, Glasgow, UK, 01.09.2022.
- Replidec - Use naive bayes classifier to identify virus lifecycle from metagenomics data. Poster at VIRO retreat 2022, Munich, Germany, 01.08.2022.