WP 1.2 Host prediction

Hosts of harmful viruses infecting higher eukaryotes have been well described, whereas the specific hosts of viruses that infect bacteria are usually unknown. In this work package, VIROINF aims to investigate how viruses of Bacteria and Archaea (bacteriophages, or simply phages) are functionally linked to each other and to their bacterial and archaeal hosts. This is crucial for understanding how phages shape the microbiome: they act as key regulators of bacterial population size and can be introduced into natural systems to influence which species are present. Evidence to date indicates that this relationship tends to be highly specific, with phages targeting a limited number of bacterial hosts.

VIROINF will, for the first time, address this topic using a combined experimental and computational approach. ESR 5 develops methods to experimentally link phages to their hosts, and to screen for and isolate phages that infect specific hosts at the single-phage-particle level. Using these methods, ESR 5 and ESR 10 will generate a multi-layer database containing:

  • phage genomes that infect selected hosts;
  • viral-tagged metagenomes of phages infecting the same hosts in contaminated groundwater ecosystems;
  • community co-occurrence of microbiomes and viromes from the same ecosystems.

The database will be complemented by text-mined literature data from BioRelate and by data from ESR 13; combined, these sources will comprise tens of thousands of interactions. The resulting dataset will be available for benchmarking bioinformatic approaches for inferring host specificity (ESR 10).
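One way such a benchmark could score a host predictor is sketched below. It assumes each known interaction is stored as the host lineage from family down to species, and that a prediction may stop at a higher rank when the tool is unsure. For every rank it reports precision (correct among phages with a prediction at that rank) and recall (correct among all phages with a known host). The rank list, the function name and the toy lineages are illustrative assumptions, not the VIROINF benchmark format.

```python
RANKS = ("family", "genus", "species")  # assumed ranks, highest first


def score_by_rank(truth, predicted):
    """Return {rank: (precision, recall)} for host predictions.

    truth and predicted map a phage name to a host lineage tuple ordered as
    RANKS; a prediction may be shorter than RANKS (it stops at a higher rank).
    """
    scores = {}
    for depth, rank in enumerate(RANKS, start=1):
        n_predicted = n_correct = 0
        for phage, true_lineage in truth.items():
            guess = predicted.get(phage)
            if guess is None or len(guess) < depth:
                continue  # no prediction at this rank for this phage
            n_predicted += 1
            # Correct only if every rank down to this one matches.
            if guess[:depth] == true_lineage[:depth]:
                n_correct += 1
        precision = n_correct / n_predicted if n_predicted else 0.0
        recall = n_correct / len(truth) if truth else 0.0
        scores[rank] = (precision, recall)
    return scores


truth = {
    "phiA": ("Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus"),
    "phiB": ("Enterobacteriaceae", "Escherichia", "Escherichia coli"),
    "phiC": ("Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus"),
}
predicted = {
    "phiA": ("Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus"),
    "phiB": ("Enterobacteriaceae", "Escherichia"),  # stops at genus
    "phiC": ("Staphylococcaceae", "Staphylococcus", "Staphylococcus epidermidis"),
}
for rank, (precision, recall) in score_by_rank(truth, predicted).items():
    print(f"{rank}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy case family and genus are scored 1.00/1.00, while species-level precision is 0.50 and recall 0.33, because phiB is only resolved to genus and phiC is assigned the wrong species. Separating precision from recall per rank makes the trade-off of predicting at a coarser but more reliable level visible.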

ESR 10 will develop machine learning tools that link newly identified bacteriophages to their bacterial hosts. These tools will combine a range of signals, including CRISPR spacer sequences and prophages in bacterial genomes (obtained by ESR 5), k-mer frequencies and other genome composition signals, and protein domains, using approaches such as random forests and neural networks/deep learning. Both are emerging as among the most successful machine learning approaches, owing to their ability to discover non-linear signals in data and their relative insensitivity to biases in training datasets. Such approaches are needed because known viruses represent only a minority of all viruses; VIROINF therefore aims to create a scalable tool that predicts a host at the optimal taxonomic level for a given virus. Knowledge of the different levels of virus-host co-evolution generated in WP 2.1 will be applied here. The aim is to infer networks of probable virus-host relationships at species level.

To study virus-host specificity at strain level, we will focus on bacteriophages that infect Staphylococcus aureus. Phages are important mediators of bacterial virulence and hence affect human health. Understanding how phage-mediated lateral gene transfer shapes bacterial gene content is important for understanding how viruses affect both the bacteria they infect and the human host.
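Of the host-prediction signals mentioned above (CRISPR sequences, prophages, k-mer frequencies, protein domains), genome composition is the easiest to illustrate. The sketch below assumes plain nucleotide strings for a phage and its candidate hosts, compares tetranucleotide (k = 4) frequency profiles, and ranks hosts by Euclidean distance, on the premise that phages tend to resemble their hosts in sequence composition. It is a nearest-profile baseline, not a random forest or neural network; in a learning-based tool, profiles like these would be one set of input features. Published composition-based tools use more refined statistics, and all names and sequences here are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed ACGT order.

    Windows containing characters other than A/C/G/T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers) or 1  # avoid dividing by zero
    return [counts[kmer] / total for kmer in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_hosts(phage_seq, host_seqs, k=4):
    """Hypothetical helper: rank candidate hosts, most similar profile first."""
    phage_profile = kmer_profile(phage_seq, k)
    distances = {
        name: euclidean(phage_profile, kmer_profile(seq, k))
        for name, seq in host_seqs.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Toy sequences: an AT-rich and a GC-rich "host", and an AT-rich "phage".
hosts = {
    "host_at": "ATTATAATATTAAT" * 20,
    "host_gc": "GCGGCCGCGCCG" * 20,
}
phage = "ATTATAATATTAAT" * 5
for name, dist in rank_hosts(phage, hosts):
    print(name, round(dist, 3))  # host_at is ranked first
```

Real genomes would call for longer k-mers or background-corrected statistics, since raw frequencies are dominated by overall GC content.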

Journal Articles

Luo, Shiqi; Ru, Jinlong; Mirzaei, Mohammadali Khan; Xue, Jinling; Peng, Xue; Ralser, Anna; Hadi, Joshua Lemuel; Mejías-Luque, Raquel; Gerhard, Markus; Deng, Li

Helicobacter pylori infection alters gut virome by expanding temperate phages linked to increased risk of colorectal cancer

In: Gut, pp. gutjnl-2023-330362, 2023.

Luo, Shiqi; Ru, Jinlong; Mirzaei, Mohammadali Khan; Xue, Jinling; Peng, Xue; Ralser, Anna; Luque, Raquel Mejías; Gerhard, Markus; Deng, Li

Gut virome profiling identifies an association between temperate phages and colorectal cancer promoted by Helicobacter pylori infection

In: Gut Microbes, vol. 15, iss. 2, pp. 2257291, 2023.

Wu, Ling-Yi; Pappas, Nikolaos; Wijesekara, Yasas; Piedade, Gonçalo J.; Brussaard, Corina P. D.; Dutilh, Bas E.

Benchmarking Bioinformatic Virus Identification Tools Using Real-World Metagenomic Data across Biomes

In: bioRxiv, 2023.

Liu, Dan; Young, Francesca; Robertson, David L.; Yuan, Ke

Prediction of virus-host association using protein language models and multiple instance learning

In: bioRxiv, 2023.

Ru, Jinlong; Mirzaei, Mohammadali Khan; Xue, Jinling; Peng, Xue; Deng, Li

ViroProfiler: a containerized bioinformatics pipeline for viral metagenomic data analysis

In: Gut Microbes, vol. 15, iss. 1, pp. 2192522, 2023.

Peng, Xue; Ru, Jinlong; Mirzaei, Mohammadali Khan; Deng, Li

Replidec - Use naive Bayes classifier to identify virus lifecycle from metagenomics data

In: bioRxiv, 2022.

Goettsch, Winfried; Beerenwinkel, Niko; Deng, Li; Dölken, Lars; Dutilh, Bas E.; Erhard, Florian; Kaderali, Lars; von Kleist, Max; Marquet, Roland; Matthijnssens, Jelle; McCallin, Shawna; McMahon, Dino; Rattei, Thomas; van Rij, Ronald P.; Robertson, David L.; Schwemmle, Martin; Stern-Ginossar, Noam; Marz, Manja

ITN–VIROINF: Understanding (Harmful) Virus-Host Interactions by Linking Virology and Bioinformatics

In: Viruses, vol. 13, no. 5, pp. 766, 2021.

Presentations

Liu, Dan

Computational prediction of virus-host interactions by pre-trained transformer and Multiple Instance Learning

Oral presentation at Computational Biology Conference 2022, Glasgow, UK, 01.09.2022.

Peng, Xue

Replidec - Use naive Bayes classifier to identify virus lifecycle from metagenomics data

Poster at VIRO retreat 2022, Munich, Germany, 01.08.2022.