Microbial communities

Microbial communities drive most of the biogeochemical and biomedical processes that sustain life, yet their functional repertoire remains poorly characterised: metagenomic samples are dominated by proteins with no close homolog in curated databases, and most prediction tools have been trained on eukaryotic sequences. Our work in this area lifts single-protein function prediction up to the level of whole microbial communities by combining ontology-aware deep learning with multi-scale systems analysis. The distinctive angle is to treat metagenomes not as bags of genes but as functional systems whose collective behaviour can be inferred, compared, and ultimately engineered.

The central methodological contribution is DeepGOMeta, introduced in "DeepGOMeta for functional insights into microbial communities using deep learning-based protein function prediction": a Gene Ontology-aware classifier trained on prokaryote-heavy data that assigns functions de novo to proteins, whether or not they have annotated homologs. DeepGOMeta enables direct comparison of community function across environments and patient cohorts, sidestepping the homology bottleneck that limits BLAST-based pipelines. Earlier work pursued the same agenda from the bioprospecting side: "In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters" mined the genomes of extremophile Bacillus paralicheniformis isolates from the Red Sea to identify biosynthetic gene clusters with antimicrobial potential, addressing the urgent need for novel antibiotics against multidrug-resistant pathogens. In parallel, "In silico screening for candidate chassis strains of free fatty acid-producing cyanobacteria" applied genome-scale screening to identify cyanobacterial chassis strains suitable for industrial-scale free fatty acid production, framing microbial cell factories as a computationally tractable design problem.

Underpinning these analyses is a long-standing investment in microbial knowledge integration. The portal described in "DESM: portal for microbial knowledge exploration systems" assembled dozens of text-mined, topic-specific knowledgebases covering microbial biotechnology, antimicrobial resistance, biofuel production, and bioremediation, providing structured access to evidence scattered across the literature. That infrastructure now feeds into more recent work that places genome-scale predictions in environmental context, supporting hypotheses about which organisms perform which functions in which habitats.

The applied programme around these methods spans desert and marine systems. The projects "Enabling desert revegetation by AI-tailored soil microbiome fortification" and "Enabling mangrove restoration by AI-tailored microbiome fortification" use functional metagenomics to design microbial inoculants that support plant establishment in degraded Saudi Arabian habitats, while "Metagenomics-based surface prospecting" and "Computational methods for functional metagenomics: from protein functions to multi-scale interactions" develop the predictive pipelines that link sequence to ecosystem-level function. Complementary clinical and industrial applications include "Sequencing and computational analysis of MRSA samples" and "Data integration and ontologies for microbial cell factories", which apply the same functional-prediction toolkit to nosocomial pathogen surveillance and to engineering microbes for sustainable bioproduction.

Projects

Publications (4)