Development of Algorithms for Biotechnology and Biomedical Applications
Overview
This work package within the KAUST Computational Bioscience Research Center (CBRC) Center Competitive Funding (CCF) developed algorithms for metabolic modeling of organisms and their interactions with the environment. The motivation was that systems-level questions in biotechnology and biomedicine, such as discovering molecules involved in metabolic pathways with industrial applications, identifying mechanisms of host-microbe interaction in disease, and predicting the response of microbial communities to changing climate, require methods that move beyond single molecules or organisms to the network of interactions within and between organisms as a whole.
Three methodological strands were proposed. The first aimed at a precise characterization of the metabolic potential of organisms from their genomes, combining DeepGO-based protein function prediction with axioms from the Gene Ontology and answer-set programming to identify metabolic pathways in newly sequenced organisms. The second comprised neural-network models that incorporate background knowledge about metabolic and protein-protein interactions, applied first to single organisms and then to multi-organism communities, to characterize endophenotypes and prioritize candidate biological markers and pathways. The third comprised machine learning algorithms that apply systems-biology principles to communities of organisms or cells by combining qualitative interaction maps with quantitative laws learned from data. The application domains were Red Sea bacteria (for discovering metabolic pathways with industrial use) and cancer (for understanding the role of metabolic pathways).
The work package delivered DeepGO-Zero, an extension of DeepGO that uses neuro-symbolic zero-shot learning to predict protein functions, including functions for which no protein has yet been experimentally characterized. This addressed a key limitation of the original DeepGO when applied to rarely observed metabolic functions in newly sequenced organisms. DeepGO and DeepGO-Zero were released via a public web server (deepgo.bio2vec.net) and on GitHub. For protein-protein and metabolic interaction prediction, an ontology-based filter using DeepGO's cellular-component predictions was developed that reduces the number of candidate interactions roughly ten-fold, and a network-structure-aware evaluation framework was designed to assess genome-wide interaction predictions. For network-level modeling, the project developed DeepMOCCA, a graph-neural-network architecture that exploits both protein-protein and metabolic interactions; it was evaluated on human metabolic networks integrated from the Virtual Metabolic Human project, the STRING database, and multi-omics datasets, including sleep-deprivation metabolomics. To overcome the shortage of large multi-omics datasets for further development, the project also initiated Red Sea seawater sampling for untargeted metabolomics screening.
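The idea behind the cellular-component filter can be illustrated with a minimal sketch. It assumes that two proteins are only kept as a candidate interacting pair if they share at least one confidently predicted cellular component. The protein names, scores, threshold, and function names below are hypothetical and are not taken from DeepGO or from the project's code; the sketch also ignores the Gene Ontology hierarchy, which an ontology-based filter would presumably take into account.

```python
from itertools import combinations

# Hypothetical cellular-component predictions per protein:
# GO identifier -> made-up confidence score (not real DeepGO output).
predicted_cc = {
    "P1": {"GO:0005634": 0.91, "GO:0005737": 0.20},  # nucleus, cytoplasm
    "P2": {"GO:0005634": 0.75},                      # nucleus
    "P3": {"GO:0005739": 0.88},                      # mitochondrion
    "P4": {"GO:0005739": 0.64, "GO:0005634": 0.10},  # mitochondrion, nucleus
}


def confident_components(scores, threshold=0.5):
    """Return the GO classes predicted with a score at or above the threshold."""
    return {go_id for go_id, score in scores.items() if score >= threshold}


def filter_candidate_pairs(predictions, threshold=0.5):
    """Keep only protein pairs that share a confidently predicted component.

    Assumption: co-localization is a prerequisite for interaction, so pairs
    with no shared component are dropped before any interaction model runs.
    """
    components = {
        protein: confident_components(scores, threshold)
        for protein, scores in predictions.items()
    }
    kept = []
    for a, b in combinations(sorted(components), 2):
        if components[a] & components[b]:  # non-empty intersection
            kept.append((a, b))
    return kept


pairs = filter_candidate_pairs(predicted_cc)
total = len(predicted_cc) * (len(predicted_cc) - 1) // 2
print(f"kept {len(pairs)} of {total} candidate pairs: {pairs}")
```

In this toy example only two of the six possible pairs survive the filter. The reduction achieved on real data depends on the organism, the prediction threshold, and how the ontology hierarchy is used.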
Period: 2021–2023
Funding
- KAUST Center Competitive Fund, Grant ID: URF/1/1976-34-01 (PI), USD 360,000
Team
- Robert Hoehndorf, PI (Professor of Computer Science, KAUST)
- Sumyyah Toonsi, PhD (alumnus)
Publications acknowledging this project (12)
- (2024) Predicting protein functions using positive-unlabeled ranking with ontology-based priors Supplementary Material
- (2023) DeepGOMeta: Functional Insights into Microbial Communities with Deep Learning-Based Protein Function Prediction
- (2022) Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition
- (2022) mOWL: revision document
- (2022) Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction
- (2021) How much do model organism phenotypes contribute to the computational identification of human disease genes?
- (2018) Ontology Embedding: A Survey of Methods, Applications and Resources
- (2015) The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients
- (2015) The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients
- (2012) Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition
- (2012) Improving the classification of cardinality phenotypes using collections
- (2012) STARVar: Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes
Topics: Neuro-symbolic AI