Despite large ongoing efforts to identify viruses in humans, animals and the environment, multiple challenges remain in the identification of novel viruses, both in the wet lab and on the bioinformatics side. In the wet lab, divergent viruses can now be identified en masse from an infected sample, but research groups use different protocols to prepare these samples for sequencing and to analyse them. All current approaches yield relative distributions and lack absolute quantitation, which is important clinically and beyond. Standardisation of sample preparation and methods is a prerequisite for meaningful and comparable work. On the bioinformatics side, the most frequently used similarity-based virus identification approaches are largely limited by the available databases, which are incomplete and often poorly annotated. VIROINF (together with the European Virus Bioinformatics Center) aims to set up a multi-layer database for phages and their hosts (WP 1.2).
VIROINF aims to quantify viruses using four approaches (ESR 11):
- spiking in known amounts of selected viruses into a sample;
- developing qPCR assays for a selection of abundant viruses in particular sample sets;
- counting virus particles using flow cytometric approaches in collaboration with Corina Brussaard from the Netherlands Institute for Sea Research; and
- NGS-based quantification developed in collaboration with BaseClear.
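The spike-in approach in the list above can be illustrated with a minimal sketch. It is not the project's protocol: it assumes that the number of reads mapped to a genome scales with genome copies times genome length, and it ignores extraction, amplification and GC biases. The function name `absolute_abundance`, its parameters and the example numbers are hypothetical.

```python
def absolute_abundance(read_counts, genome_lengths, spike_name, spike_copies):
    """Estimate genome copies per sample from a spike-in of known quantity.

    Assumption (illustrative only): mapped reads are proportional to
    genome copies multiplied by genome length.
    """
    # Reads per base: a length-normalised proxy for relative abundance.
    density = {
        name: read_counts[name] / genome_lengths[name] for name in read_counts
    }
    if density[spike_name] == 0:
        raise ValueError("no reads mapped to the spike-in; cannot calibrate")

    # Calibration factor: genome copies represented by one unit of read density.
    copies_per_density = spike_copies / density[spike_name]

    # Convert every non-spike genome from relative density to absolute copies.
    return {
        name: d * copies_per_density
        for name, d in density.items()
        if name != spike_name
    }


# Hypothetical example: 1e6 copies of a 5 kb spike-in virus added to the sample.
reads = {"spike": 1000, "phageA": 5000, "phageB": 200}
lengths = {"spike": 5000, "phageA": 50000, "phageB": 4000}
estimates = absolute_abundance(reads, lengths, "spike", 1e6)
```

Dividing by genome length before calibrating matters because a long genome collects more reads than a short one at the same copy number.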
The novel wet-lab approaches will be combined with newly developed bioinformatics tools that identify viruses in metagenomic datasets (ESR 12) using sequence-dependent and sequence-independent approaches. De novo approaches based on motif searches, k-mer frequencies and genome organisation will complement homology-based methods, allowing the detection of both known and unknown viruses. The result will be a comprehensive detection pipeline for all types of viruses, ranging from well-studied human pathogens to completely novel bacteriophages. A range of available tools for taxonomic classification will be combined into a rapid pipeline.
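As an illustration of the k-mer frequencies mentioned above, the following minimal sketch turns a contig into a fixed-length frequency vector that could serve as input to a classifier. It is an assumption-laden example, not the ESR 12 pipeline: the function name `kmer_profile`, the default `k=4` and the choice to skip windows containing ambiguous bases are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalised k-mer frequencies of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    Windows containing any other character (e.g. N) are ignored.
    """
    seq = seq.upper()
    # All possible k-mers over the unambiguous DNA alphabet, in fixed order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Normalise by the number of valid windows only.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical example: dinucleotide profile of a short sequence.
profile = kmer_profile("ACGTACGT", k=2)
```

Because the profile depends only on composition, it can be computed for sequences that have no detectable homologue in a reference database, which is why such features complement similarity searches.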
- ESR 11: Development of more quantitative viral metagenomics approaches to investigate the role of bacteriophages in the honey bee gut and their effects on health and development
- ESR 12: A machine learning pipeline to identify and characterise novel viruses in metagenome data
- ESR 5: Bacteriophage–host relationships prediction
- ESR 7: Generation of datasets for phage-bacterial interactions in complex communities and mutational spectra analysis following exposure to selective pressure
- ESR 13: Functional inferences from colinear crAssphage genomes
- ESR 14: Computational methods for the analysis of metagenomic datasets to extract viral sequences within the context of commercial data-mining