
Dan Liu
My name is Dan Liu and I come from China. I studied software engineering for my bachelor’s degree in Wuhan, and during my master’s degree in computer science I joined a computational biology group at CCNU, where I started to focus on bioinformatics and machine learning. My research interests mainly concern developing computational algorithms to solve biological problems, and my master’s dissertation used logistic matrix factorization with neighborhood regularization on a heterogeneous network to predict virus-host interactions. I am very excited to join the CVR to continue my studies of virus-host interactions.
During my Ph.D., I will study virology and work with Prof. David Robertson to characterize virus-host interactions at the species level. Our aim is to develop computational approaches to predict virus-host interactions in microbiomes. In addition, I will quantify the co-phylogenetic signal to test the preferential host-switching model in bacteria and to determine the extent of co-speciation versus host-switching.
In my free time, I am into sports like yoga and badminton, and I also like hiking, traveling and photography.

Field:
Bioinformatics
Host institution:
University of Glasgow (UG), United Kingdom
Local supervisor:
Prof. Dr. David Robertson (UG, Centre for Virus Research)
Local co-supervisor:
Prof. Dr. José R Penadés (UG)
Project partner:
Xue Peng (ESR 5)
Work packages:
WP 1.2 Host prediction
WP 1.3 Virus-host interactions


Project description
Virus-host interactions are highly specific because viruses (including bacteriophages, or phages) have co-evolved with the hosts they infect and depend on for replication. As a consequence, the phylogenetic relatedness and shared genomic properties of host species are strong predictors of host infectivity. However, knowledge of virus-host relationships remains sparse, and the hosts of many phages are unknown, particularly for metagenomically assembled data. In this project, existing resources and genomic signals will be leveraged to infer missing (unobserved) but probable hosts, e.g., where infection has not been observed because of host immunity. Machine learning models will be trained on these data to predict phage-bacteria interactions at the species and strain levels.
Specific objectives are:
- Construction of bipartite networks of observed virus-host species links based on databases such as Virus-Host DB and MVP, text-mined literature data from company partner BioRelate, and data from ESR 13. Combined, these sources will comprise tens of thousands of interactions, both observed phage-bacteria interactions and interactions inferred from prophages and CRISPR-Cas-associated viral sequences embedded in host genomes.
- Quantification of the co-phylogenetic signal using the software Jane to test the preferential host-switching model in bacteria and to determine the extent of co-speciation versus host-switching. This will give information on how stable phage-host interactions are at the species level.
- Application of combined SVM models to virus-host prediction using different representations of the viral genome: nucleotide sequence, amino acid sequence, amino acid properties and protein domains.
- Analysis of virus-host links at the strain level. This objective will use the bacterium Staphylococcus aureus (the area of speciality of José Penadés) as a model system to study strain-level phage-host interaction prediction in the context of CRISPR-Cas immunity and host pan-genomes.
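The first objective, merging links from several sources into one bipartite virus-host network, can be illustrated with a minimal Python sketch. It assumes each source has already been reduced to (virus, host, source) records; the example records, the source labels and the helper names `build_bipartite_network`, `hosts_of`, `viruses_of` and `biadjacency` are hypothetical and do not reflect the actual formats of Virus-Host DB, MVP or the BioRelate data.

```python
from collections import defaultdict

# Hypothetical (virus, host, source) records; names and sources are
# illustrative placeholders, not real database entries.
RECORDS = [
    ("phage_A", "Escherichia coli", "Virus-Host DB"),
    ("phage_A", "Escherichia coli", "text-mined"),
    ("phage_B", "Staphylococcus aureus", "CRISPR spacer"),
    ("phage_B", "Staphylococcus aureus", "prophage"),
    ("phage_C", "Escherichia coli", "MVP"),
]


def build_bipartite_network(records):
    """Merge records into edges keyed by (virus, host).

    Duplicate links reported by different sources collapse into one edge,
    and the set of supporting sources is kept as provenance.
    """
    edges = defaultdict(set)
    for virus, host, source in records:
        edges[(virus, host)].add(source)
    return dict(edges)


def hosts_of(edges, virus):
    """Return the sorted host species linked to one virus node."""
    return sorted(h for (v, h) in edges if v == virus)


def viruses_of(edges, host):
    """Return the sorted virus nodes linked to one host species."""
    return sorted(v for (v, h) in edges if h == host)


def biadjacency(edges):
    """Return (virus rows, host columns, 0/1 matrix) for the network."""
    viruses = sorted({v for v, _ in edges})
    hosts = sorted({h for _, h in edges})
    matrix = [[1 if (v, h) in edges else 0 for h in hosts] for v in viruses]
    return viruses, hosts, matrix


network = build_bipartite_network(RECORDS)
print(len(network))                              # 3 distinct virus-host edges
print(hosts_of(network, "phage_A"))              # ['Escherichia coli']
print(viruses_of(network, "Escherichia coli"))   # ['phage_A', 'phage_C']
print(biadjacency(network)[2])                   # [[1, 0], [0, 1], [1, 0]]
```

Keeping the supporting sources on each edge preserves provenance, so, for example, links inferred only from CRISPR spacers can be separated from experimentally observed ones. The 0/1 biadjacency matrix is a convenient input for matrix-based link prediction approaches such as the matrix factorization mentioned above, though the project's actual pipeline may differ.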
Journal Articles
Prediction of virus-host association using protein language models and multiple instance learning
In: bioRxiv, 2023.
Presentations
Computational prediction of virus-host interactions by pre-trained transformer and Multiple Instance Learning
Oral presentation at Computational Biology Conference 2022, Glasgow, UK, 01.09.2022.