Enabling desert revegetation by AI-tailored soil microbiome fortification

Overview

Roughly twelve million hectares of dryland are lost every year worldwide, and the megaprojects launched to reverse desertification, of which the Sahel's Great Green Wall is the canonical example, typically fail because most planted trees die once irrigation stops. The biological reason is that desert soils are nutrient-poor and saline, and the microbial communities that mediate nitrogen, phosphate, and mineral acquisition for plants are missing or unbalanced. This project is led by Heribert Hirt's plant-microbiome group, with our group and the Modelling and Simulation group of Gabriel Wittum and Arne Nägel (Frankfurt) as computational partners. Its aim is to identify what is missing from a given desert soil and to design tailored synthetic microbial communities that restore its fertility.

The empirical core is a biobank of more than 10,000 microbial strains isolated from over forty sites across the Arabian Peninsula — covering Actinobacteria, Firmicutes, Bacteroidetes, and Alpha-/Beta-/Gamma-Proteobacteria. The project's task is to characterise these strains and their soils functionally (not merely taxonomically), predict which biological functions are absent or under-represented at a given site, and recommend interventions that close those functional gaps. The BORG group is responsible for the AI methods that infer function from sequence and from community composition; the Frankfurt group builds the differential-equation models of microbiome dynamics and uses parameter identification to fit them to field data.
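As a rough illustration of what parameter identification means here, the sketch below fits the growth rate r and carrying capacity K of a single-population logistic model, dx/dt = r*x*(1 - x/K), to an abundance time series by grid search over the least-squares error. Everything in it is an assumption for illustration: the Frankfurt models are multi-species differential-equation systems fitted with their own numerical methods, the "observations" are synthetic values generated from known parameters rather than field data, and the function names are hypothetical.

```python
"""Toy parameter identification for a logistic growth ODE (illustrative only)."""
from itertools import product


def simulate_logistic(r, K, x0, t_end, dt=0.01):
    """Integrate dx/dt = r*x*(1 - x/K) with forward Euler; return x at times 0..t_end."""
    steps_per_unit = round(1 / dt)
    x = x0
    trajectory = [x0]
    for _ in range(t_end):
        for _ in range(steps_per_unit):
            x += dt * r * x * (1 - x / K)
        trajectory.append(x)
    return trajectory


def sse(observed, predicted):
    """Sum of squared errors between two equal-length series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def fit_logistic(observed, r_grid, K_grid):
    """Grid-search the (r, K) pair whose simulated trajectory best matches `observed`."""
    x0, t_end = observed[0], len(observed) - 1
    return min(
        product(r_grid, K_grid),
        key=lambda rk: sse(observed, simulate_logistic(rk[0], rk[1], x0, t_end)),
    )


if __name__ == "__main__":
    # Synthetic "observations" from known parameters (hypothetical, not field data).
    observed = simulate_logistic(0.8, 50.0, x0=1.0, t_end=12)
    r_grid = [round(0.1 * i, 1) for i in range(1, 16)]  # 0.1 .. 1.5
    K_grid = [10.0 * i for i in range(1, 11)]  # 10 .. 100
    print(fit_logistic(observed, r_grid, K_grid))  # -> (0.8, 50.0)
```

Because the synthetic series was generated with r = 0.8 and K = 50, the search recovers exactly those values. With noisy field data one would instead use a gradient-based or Gauss-Newton optimiser and report parameter uncertainty.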

Function prediction at metagenome scale

The methodological problem is that 16S amplicon sequencing identifies which taxa are present but says nothing about what they do, while whole-metagenome shotgun sequencing reveals function only if the recovered genes can be predicted and annotated accurately, which is hard for the highly novel sequences typical of environmental samples. Our line of work has progressively closed this gap. Context-based protein function prediction in bacterial genomes (2022) uses genomic neighbourhood (operon structure, gene order) as an additional signal to predict function in bacteria where sequence similarity alone fails, which is the dominant case for desert isolates. DeepGOMeta: functional insights into microbial communities with deep learning-based protein function prediction (2023) extends this to whole metagenomes: it predicts Gene Ontology functions across an assembled or unassembled microbial community and identifies functional differences between samples. This is the tool the project uses to compare fertile and infertile soil types.
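To make the idea of a community-level functional profile concrete, the sketch below takes per-protein function scores (the kind of output a protein-function predictor produces), propagates each score to all ancestor terms so that a parent is never scored lower than its children (the ontology's true-path rule), and reports the fraction of a sample's proteins annotated with each term above a threshold. The ontology fragment, the term names (plain labels, not real GO identifiers), the 0.5 threshold, and all function names are hypothetical; DeepGOMeta's own aggregation may differ.

```python
"""Aggregate per-protein function scores into a community-level profile (sketch)."""
from collections import Counter

# Hypothetical, heavily simplified ontology fragment: term -> parent terms.
PARENTS = {
    "nitrogen_fixation": ["nitrogen_cycle"],
    "nitrification": ["nitrogen_cycle"],
    "phosphate_solubilisation": ["phosphorus_metabolism"],
    "nitrogen_cycle": ["metabolic_process"],
    "phosphorus_metabolism": ["metabolic_process"],
    "metabolic_process": [],
}


def ancestors(term):
    """Return all ancestors of `term` (excluding itself) by walking parent links."""
    seen, stack = set(), list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def propagate(scores):
    """True-path rule: every ancestor scores at least as high as any of its descendants."""
    out = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term):
            out[anc] = max(out.get(anc, 0.0), score)
    return out


def community_profile(predictions, threshold=0.5):
    """Fraction of a sample's proteins annotated with each term after propagation."""
    counts = Counter()
    for scores in predictions.values():
        for term, score in propagate(scores).items():
            if score >= threshold:
                counts[term] += 1
    return {term: c / len(predictions) for term, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-protein scores for one soil sample.
    sample = {
        "prot1": {"nitrogen_fixation": 0.9},
        "prot2": {"nitrification": 0.7, "phosphate_solubilisation": 0.2},
        "prot3": {"phosphate_solubilisation": 0.6},
        "prot4": {"metabolic_process": 0.8},
    }
    for term, frac in sorted(community_profile(sample).items()):
        print(f"{term}: {frac:.2f}")
```

Computing the same profile for a fertile and an infertile sample gives two vectors over the same terms, which is the representation a between-sample comparison works on.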

Two further advances make the function predictions more reliable. Predicting protein functions using positive-unlabeled ranking with ontology-based priors (2024) reformulates protein-function prediction as a positive-unlabeled problem, in which proteins lacking an annotation are treated as unlabeled rather than as negatives, and adds ontology-derived priors. This materially improves recall for rare functions, which matters because the metabolic functions that distinguish fertile from infertile soil are usually not the high-frequency housekeeping ones. Large-scale knowledge integration for enhanced molecular property prediction (2022) and the survey Ontology embedding: a survey of methods, applications and resources (2018) provide the underlying machinery for embedding the Gene Ontology and other biological ontologies into prediction models in a way that preserves their logical structure.
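The positive-unlabeled framing can be illustrated with the non-negative PU (nnPU) risk estimator of Kiryo et al. (2017), a standard PU objective used here as a stand-in; it is not necessarily the ranking loss of the 2024 paper. Proteins known to carry a function are positives, all others are unlabeled, and a class prior corrects the negative-class risk for the positives hidden among the unlabeled ones. As a hypothetical stand-in for an ontology-based prior, the prior is estimated as the share of annotated proteins whose annotations, assumed already propagated up the ontology, include the term. Function names and data are illustrative.

```python
"""Non-negative PU risk for a single function term (illustrative sketch)."""
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)); label is +1 or -1."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def frequency_prior(term, annotations):
    """Hypothetical prior: share of proteins whose (propagated) term set includes `term`."""
    return sum(term in terms for terms in annotations.values()) / len(annotations)


def nnpu_risk(pos_scores, unl_scores, prior):
    """nnPU risk (Kiryo et al., 2017) for one term.

    pos_scores: model scores for proteins annotated with the term.
    unl_scores: scores for proteins without the annotation (hidden positives + negatives).
    prior: assumed fraction of all proteins that truly carry the function.
    """
    risk_pos = prior * mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Negative-class risk estimated from unlabeled data, minus the share
    # contributed by positives hidden among the unlabeled proteins.
    risk_neg = mean([sigmoid_loss(s, -1) for s in unl_scores]) - prior * mean(
        [sigmoid_loss(s, -1) for s in pos_scores]
    )
    # Clamping at zero stops a flexible model from driving the estimate negative.
    return risk_pos + max(0.0, risk_neg)


if __name__ == "__main__":
    # Hypothetical propagated annotations, used only to estimate the prior.
    annotations = {
        "p1": {"nitrogen_fixation", "nitrogen_cycle"},
        "p2": {"nitrogen_cycle"},
        "p3": {"phosphorus_metabolism"},
        "p4": {"phosphorus_metabolism"},
    }
    prior = frequency_prior("nitrogen_fixation", annotations)
    separating = nnpu_risk([3.0, 2.5], [-3.0, -2.0, -2.5, 2.0], prior)
    constant = nnpu_risk([0.0, 0.0], [0.0, 0.0, 0.0, 0.0], prior)
    print(f"prior={prior:.2f} separating={separating:.3f} constant={constant:.3f}")
```

A scorer that ranks annotated proteins above most unlabeled ones gets a lower risk than one that scores everything equally, even though one unlabeled protein (a plausible hidden positive) scores high.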

The integrated workflow is: sequence soil samples and biobank strains; predict functions, interactions, and pathways with our deep-learning stack; identify the functional units missing from infertile soils relative to fertile reference soils; identify which biobank strains realise the missing functions; and design synthetic microbial communities that fill those gaps. The Frankfurt group then simulates community dynamics under environmental forcing (precipitation, temperature, nutrients) to identify combinations that are stable enough to support plant growth without continuous human intervention.

Rund Tawfiq, then a PhD student in our group, led our part of the analysis. The downstream commercialisation route runs through GrowBioM, the NTGC spin-out that formulates the synthetic communities for deployment in the Saudi Greening, Red Sea Project, and Mecca Greening initiatives. This makes it one of the few KAUST AI projects with a direct, non-clinical, environmental impact channel.
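The gap-identification and strain-selection steps of this workflow can be sketched with two small functions: one flags functions that are common in a fertile reference soil but absent or strongly depleted in an infertile one, and one shortlists biobank strains with a greedy set-cover heuristic, repeatedly choosing the strain that fills the most remaining gaps. Both are simplifications chosen for illustration: the abundance and depletion thresholds, function labels, strain names, and numbers are hypothetical, and a real design would also weigh strain compatibility and community stability, which is what the Frankfurt simulations address.

```python
"""Toy functional gap analysis and greedy strain selection (illustrative only)."""


def missing_functions(fertile, infertile, min_abundance=0.05, ratio=0.2):
    """Functions common in the fertile soil but absent or strongly depleted in the infertile one.

    fertile, infertile: dicts mapping function label -> relative abundance.
    A function counts as missing if it reaches `min_abundance` in the fertile soil
    and its infertile abundance is below `ratio` times the fertile abundance.
    """
    return {
        func
        for func, abundance in fertile.items()
        if abundance >= min_abundance and infertile.get(func, 0.0) < ratio * abundance
    }


def select_strains(gaps, strain_functions):
    """Greedy set cover: repeatedly add the strain that fills the most remaining gaps.

    Returns (chosen strains in selection order, gaps that no strain can fill).
    """
    uncovered, chosen = set(gaps), []
    while uncovered and strain_functions:
        # Iterating over sorted names makes tie-breaking deterministic (alphabetical).
        best = max(sorted(strain_functions), key=lambda s: len(strain_functions[s] & uncovered))
        gain = strain_functions[best] & uncovered
        if not gain:  # nothing in the biobank realises the remaining functions
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


if __name__ == "__main__":
    # Hypothetical relative abundances of predicted functions in two soils.
    fertile = {"nitrogen_fixation": 0.12, "phosphate_solubilisation": 0.08,
               "siderophore_production": 0.06, "osmoprotectant_synthesis": 0.10,
               "chemotaxis": 0.02}
    infertile = {"nitrogen_fixation": 0.01, "phosphate_solubilisation": 0.07,
                 "osmoprotectant_synthesis": 0.015, "chemotaxis": 0.0}
    # Hypothetical biobank strains and the functions predicted for them.
    strains = {"strain_A": {"nitrogen_fixation", "osmoprotectant_synthesis"},
               "strain_B": {"siderophore_production"},
               "strain_C": {"nitrogen_fixation"},
               "strain_D": {"chemotaxis"}}
    gaps = missing_functions(fertile, infertile)
    print(sorted(gaps))
    print(select_strains(gaps, strains))
```

Here strain_A covers two of the three gaps and strain_B the third, so the selection returns ['strain_A', 'strain_B'] with nothing left uncovered. Greedy set cover is only guaranteed to come within a logarithmic factor of the smallest possible community, which is usually acceptable for shortlisting candidates before simulation.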

Period: 2023–2025

Funding

  • KAUST Near Term Grand Challenge — Grant ID: REI/1/5235-01-01 (CoI) — USD 150,000

Team

  • Heribert Hirt — PI (KAUST (Center for Desert Agriculture))
  • Robert Hoehndorf — CoI (KAUST (Professor of Computer Science))
  • Gabriel Wittum — CoI (Goethe University Frankfurt / KAUST)
  • Arne Nägel — CoI (Goethe University Frankfurt)
  • Rund Tawfiq — PhD (alumnus) (Sano Centre Krakow (Postdoctoral researcher))

Publications acknowledging this project (6)

  • (2024) Predicting protein functions using positive-unlabeled ranking with ontology-based priors
  • (2022) Context-based protein function prediction in bacterial genomes
  • (2022) INDIGENA: inductive prediction of disease–gene associations using phenotype ontologies
  • (2022) Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction
  • (2020) PAVS: A database of phenotype-associated variants in Saudi Arabia
  • (2018) Ontology Embedding: A Survey of Methods, Applications and Resources

Topics: Microbial communities, Neuro-symbolic AI, Protein function