Computational methods for functional metagenomics: from protein functions to multi-scale interactions
Overview
Multi-scale systems methods for characterising microbial community functions through protein function prediction. Downstream work produced DeepGOPlus and its successor models.
Period: 2022–2024
Funding
- KAUST Competitive Research Grant (Grant ID: URF/1/4675-01-01, PI): USD 247,500
Team
- Robert Hoehndorf — PI (KAUST (Professor of Computer Science))
- Takashi Gojobori — CoI (KAUST (CBRC))
- Maxat Kulmanov — PhD (alumnus), Postdoc (KAUST (Research Scientist))
- Rund Tawfiq — PhD (alumnus) (Sano Centre Krakow (Postdoctoral researcher))
- Daulet Toibazar — MSc (alumnus)
- Amal Alhelal — MSc (alumnus)
- Md Nurul Muttakin — MSc (alumnus)
- Shahad Qatan — MSc (alumnus)
- Kexin Niu — MSc (alumnus)
- Asaad Mohammedsaleh — MSc (alumnus)
Software
- DeepGOPlus — Deep learning model for Gene Ontology-based protein function prediction; CNN ensemble over protein sequences and homology. https://github.com/bio-ontology-research-group/deepgoplus
Publications acknowledging this project (16)
- (2025) Lattice-based $\mathcal{ALC}$ ontology embeddings with saturation
- (2024) Predicting protein functions using positive-unlabeled ranking with ontology-based priors
- (2024) Neuro-symbolic AI in Life Sciences
- (2023) DeepGOMeta: Functional Insights into Microbial Communities with Deep Learning-Based Protein Function Prediction
- (2022) Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition
- (2022) Context-based protein function prediction in bacterial genomes
- (2022) INDIGENA: inductive prediction of disease-gene associations using phenotype ontologies
- (2022) mOWL: revision document
- (2022) Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction
- (2018) Ontology Embedding: A Survey of Methods, Applications and Resources
- (2015) The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients
- (2015) The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients
- (2012) Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition
- (2012) Improving the classification of cardinality phenotypes using collections
- (2012) STARVar: Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes
- … and 1 more.
Topics: Applied Ontology, Microbial communities, Neuro-symbolic AI, Protein function