Neuro-symbolic AI
Neuro-symbolic methods in bioinformatics aim to combine the deductive guarantees of symbolic knowledge with the inductive power of statistical learning. Our group develops methods that map entities described in formal ontologies into vector spaces while preserving the semantic relations expressed by their axioms, so that downstream models can use background knowledge directly in similarity search, link prediction, and classification. The distinctive angle at KAUST is a focus on description logics as the source of structure: we design embedding constructions for languages such as EL++ and ALC that come with mathematical guarantees about the logical theories they approximate, rather than treating ontologies as plain graphs.
Our early work in this area established that knowledge graphs derived from biomedical ontologies can serve as substrates for representation learning. In "Neuro-symbolic representation learning on biological knowledge graphs" we introduced feature-learning methods that operate over RDF and OWL data, and Onto2Vec showed that treating logical axioms as sentences over class names yields embeddings that capture both formal structure and annotation context. OPA2Vec extended this idea by combining axioms with annotation properties such as labels, synonyms, and natural-language definitions, while "Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings" demonstrated how such vectors can be queried alongside symbolic data through standard Semantic Web interfaces. The broader rationale for this programme is set out in "Data science and symbolic AI: Synergies, challenges and opportunities" and developed further in the recent overview "Neuro-Symbolic AI in Life Sciences".
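As a rough illustration of the axioms-as-sentences idea, the sketch below turns a few hand-written axioms into token lists of the kind a Word2Vec-style model could consume. The axiom strings, the class and property names, and the helpers `axiom_to_sentence` and `build_corpus` are assumptions made for illustration; this is not the Onto2Vec or OPA2Vec implementation.

```python
import re

# Hypothetical axioms in a Manchester-like syntax (illustrative names only).
AXIOMS = [
    "Apoptosis SubClassOf ProgrammedCellDeath",
    "Apoptosis SubClassOf (part_of some CellDeath)",
    "ProteinX hasFunction Apoptosis",
]


def axiom_to_sentence(axiom: str) -> list:
    """Split an axiom into tokens, discarding whitespace and parentheses."""
    return re.findall(r"[^\s()]+", axiom)


def build_corpus(axioms) -> list:
    """Treat every axiom as one 'sentence' (a list of tokens)."""
    return [axiom_to_sentence(a) for a in axioms]


corpus = build_corpus(AXIOMS)
for sentence in corpus:
    print(" ".join(sentence))
```

A corpus in this shape can then be passed to any skip-gram or CBOW trainer; class names that occur in similar axioms end up with similar vectors.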
Geometric embeddings for description logics
A core technical line of work is the construction of geometric models of description-logic theories. "EL Embeddings: Geometric construction of models for the Description Logic EL++" introduced a family of embeddings in which classes are represented as n-balls and axioms become geometric constraints, so that satisfying the constraints corresponds to constructing a model of the theory. More recent work, including "Enhancing Geometric Ontology Embeddings for EL++ with Negative Sampling and Deductive Closure Filtering" and "Lattice-Preserving ALC Ontology Embeddings", extends these ideas to more expressive logics and tightens the relationship between embedding geometry, logical entailment, and the lattice of concepts. "From Axioms over Graphs to Vectors, and Back Again: Evaluating the Properties of Graph-based Ontology Embeddings" systematically compares projection-based and axiom-aware approaches, and the survey "Ontology Embedding: A Survey of Methods, Applications and Resources" consolidates the now-substantial literature on ontology embeddings.
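To make the "axioms become geometric constraints" idea concrete, here is a minimal sketch of two ball-based loss terms in the spirit of EL Embeddings. A subsumption C ⊑ D is satisfied when the ball for C lies inside the ball for D, and a disjointness axiom is satisfied when the two balls do not overlap. The function names, the `margin` parameter, and the exact form of the terms are assumptions for illustration; the published losses include further normal forms and regularisation.

```python
import math


def subsumption_loss(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Hinge loss for C SubClassOf D: zero when ball C is contained in ball D."""
    dist = math.dist(center_c, center_d)
    return max(0.0, dist + radius_c - radius_d - margin)


def disjointness_loss(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Hinge loss for 'C and D are disjoint': zero when the balls do not overlap."""
    dist = math.dist(center_c, center_d)
    return max(0.0, radius_c + radius_d - dist + margin)


# Ball C (radius 1) sits inside ball D (radius 2): the constraint is satisfied.
print(subsumption_loss((0.0, 0.0), 1.0, (0.5, 0.0), 2.0))
# Ball C is far from ball D: the loss equals the amount by which C sticks out.
print(subsumption_loss((0.0, 0.0), 1.0, (3.0, 0.0), 1.0))
```

Driving every such term to zero yields ball positions and radii that can be read as a geometric model of the axioms they encode.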
Applications to function and variant prediction
Neuro-symbolic models translate naturally into biomedical prediction tasks where the label space itself is an ontology. "DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier" and "DeepGOPlus: improved protein function prediction from sequence" showed how Gene Ontology structure can be exploited inside neural classifiers, while "DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms" used EL embeddings to predict functions for classes that were never observed during training. "Predicting protein functions using positive-unlabeled ranking with ontology-based priors" addresses the partial-annotation problem with a learning framework that respects the open-world semantics of GO. In parallel, "DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier", "DeepPVP: phenotype-based prioritization of causative variants using deep learning", and "Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning" apply the same philosophy to phenotype prediction and variant prioritization, and "Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications" illustrates the approach for drug discovery.
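One simple way in which ontology structure can constrain a classifier's output is hierarchical consistency: a protein predicted to have a function should score at least as high for every more general function. The sketch below propagates the maximum score upwards through a toy hierarchy. The class names, the `parents` mapping, and both helper functions are assumptions for illustration; this is a generic post-processing step, not a description of how any specific DeepGO model is implemented.

```python
def ancestors(cls, parents):
    """Collect all (transitive) superclasses of cls from a child -> parents map."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def propagate_scores(scores, parents):
    """Raise each ancestor's score to at least the score of its descendants."""
    result = dict(scores)
    for cls, score in scores.items():
        for anc in ancestors(cls, parents):
            result[anc] = max(result.get(anc, 0.0), score)
    return result


# Toy hierarchy: apoptosis is-a programmed cell death is-a cell death.
parents = {
    "apoptosis": ["programmed cell death"],
    "programmed cell death": ["cell death"],
}
raw = {"apoptosis": 0.9, "programmed cell death": 0.4}
print(propagate_scores(raw, parents))
```

After propagation no class scores lower than any of its subclasses, so thresholding the scores always yields a set of annotations that is closed under the subclass hierarchy.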
Most of these methods are released through mOWL, a Python library for machine learning with biomedical ontologies, which unifies projection-, axiom-, and geometric-embedding methods behind a single API. The programme is actively supported through KAUST projects on sound, complete, and explainable machine learning with biomedical ontologies, on variant prioritization in complex disease, and on personalized cancer treatment prediction, as well as through participation in the KAUST Center of Excellence for Generative AI and the Smart Health Initiative.
Projects
- KAUST Smart Health Initiative 2024 round (SHI2024) (2025–ongoing)
- Personalized cancer treatment prediction (KCSH Pathway to Impact 2025) (2025–2026)
- KAUST Center of Excellence for Generative AI (Health and Wellness, BCB theme) (2024–ongoing)
- Towards sound, complete, and explainable machine learning with biomedical ontologies (CRG11) (2023–2026)
- Disease Models from Patient-derived Leukemic Cells in Biomimetic Peptide Scaffolds for Precision Medicine Applications (2023–2026)
- A public Saudi pangenome as reference for genomics in the Middle East (2024–2026)
- Enabling desert revegetation by AI-tailored soil microbiome fortification (2023–2025)
- Enabling mangrove restoration by AI-tailored microbiome fortification (2023–2023)
- Computational methods for functional metagenomics: from protein functions to multi-scale interactions (2022–2024)
- IBNSINA-QI: Integrating Biomedical Networks and Semantic Information for Neural network Analysis of Quantitative Information (2021–2023)
- Development of Algorithms for Biotechnology and Biomedical Applications (2021–2023)
- CompleX: Variant Prioritization in Complex Disease (2019–2021)
- Improving health of Saudi population (2019–2021)
- Improvement of genetic variant prioritization technology (2018–2019)
- Bio2Vec: Smart analytics infrastructure for the life sciences (2018–2020)
- Data integration and ontologies for microbial cell factories (2016–2018)
Software
- mOWL — Python library for machine learning with biomedical ontologies. Unifies projection-, axiom- and geometric-embedding methods (EL Embeddings, ELBE, BoxSquaredEL, OWL2Vec*, DL2Vec, OPA2Vec) behind one API, with first-class OWLAPI access and PyTorch integration.
- EL Embeddings — Reference implementation of geometric embeddings for the EL++ description logic, the predecessor of GeometrE and BoxSquaredEL. Preserves subsumption reasoning by mapping classes to convex regions.
- catE — Category-theoretic, lattice-preserving embedding of ALC description-logic ontologies that retains the consequence-closure semantics of the original theory.
- DELE — Deductive EL embeddings: enrich training data with the deductive closure of an ontology before learning, so embeddings recover entailment rather than only asserted axioms.
- OPA2Vec — Combines ontology axioms with associated annotation properties (labels, synonyms, definitions) into a single corpus, then trains Word2Vec to produce semantically rich vectors for ontology classes.
- Onto2Vec — Representation learning for ontologies and their annotations by treating logical axioms as natural-language sentences; predecessor of OPA2Vec.
- DL2Vec — Encodes description-logic axioms as a directed graph and learns embeddings via random walks; widely used for downstream gene-disease and protein-function prediction.
- Walking RDF and OWL — Original feature-learning method over RDF graphs and OWL ontologies via biased random walks; the seed implementation for many later embedding methods including OWL2Vec*.
- EL2Box — Box-shaped geometric embeddings for EL++ that strengthen the topological guarantees of EL Embeddings.
- Interpretable Learning — Generates interpretable symbolic rules from learned representations over biomedical knowledge bases.
- DeepGOPlus — CNN-ensemble protein-function predictor that augments sequence-based scoring with k-nearest-neighbour homology and GO axioms; one of the strongest CAFA-evaluated open models.
- DeepGO — Original sequence-based, ontology-aware deep classifier for predicting Gene Ontology functional annotations; basis of the entire DeepGO family of tools.
- DeepGO2 — Next-generation DeepGO model with transformer protein embeddings and improved hierarchical multi-label prediction.
- DeepGOZero — Zero-shot extension of DeepGO using model-theoretic ELEmbeddings to predict GO classes that have never been observed during training.
- DeepGOMeta — DeepGO trained specifically for metagenomic communities; predicts functional roles of proteins recovered from environmental samples and links them to biogeochemical processes.
- PU-GO — Positive-unlabeled ranking of protein functions with ontology-based priors; directly addresses the partial-annotation problem in CAFA benchmarks.
- DeepPheno — Predicts loss-of-function organism-level phenotypes (HPO/MPO) directly from a gene's annotated functions, using a hierarchical neural classifier over phenotype ontologies.
- GO-Agent — LLM-agent framework that decomposes protein-function prediction into tool-calling sub-tasks (sequence search, structure lookup, domain reasoning) and stitches the evidence into a final GO annotation.
- PhenoGoCon — Predicts gene–phenotype associations from predicted Gene Ontology functions; bridges GO function prediction and HPO/MPO phenotype prediction.
- Genomic Context — Bacterial protein-function prediction that exploits operon and genome-neighbourhood structure in addition to sequence and homology.
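The DELE entry in the list above refers to enriching training data with the deductive closure of an ontology. As a minimal sketch of that idea, the code below computes only the transitive closure of asserted subclass axioms, which is a small fragment of what a description-logic reasoner would derive. The function name `subclass_closure` and the toy axioms are assumptions for illustration, not part of the DELE code base.

```python
def subclass_closure(asserted):
    """Return all (sub, super) pairs entailed by transitivity of SubClassOf."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # From a SubClassOf b and b SubClassOf d, infer a SubClassOf d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Toy asserted axioms: A is-a B, B is-a C, C is-a D.
asserted = {("A", "B"), ("B", "C"), ("C", "D")}
closure = subclass_closure(asserted)
entailed_only = closure - asserted
print(sorted(entailed_only))
```

The pairs in `entailed_only` are the kind of statement that is true in every model of the ontology but never written down, which is why a method trained only on asserted axioms may fail to recover them.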
Publications (29)
- (2025) Chen, Mashkova, Zhapa-Camacho, Hoehndorf et al. Ontology Embedding: A Survey of Methods, Applications and Resources. IEEE Transactions on Knowledge and Data Engineering.
- (2025) Hoehndorf, Pesquita, Zhapa-Camacho. Neuro-Symbolic AI in Life Sciences. Handbook on Neurosymbolic AI and Knowledge Graphs.
- (2024) Zhapa-Camacho, Tang, Kulmanov, Hoehndorf. Predicting protein functions using positive-unlabeled ranking with ontology-based priors. Bioinformatics.
- (2024) Althagafi, Zhapa-Camacho, Hoehndorf. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics.
- (2024) Ghunaim, Hoehndorf. Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction. Neural-Symbolic Learning and Reasoning.
- (2024) Mashkova, Zhapa-Camacho, Hoehndorf. Enhancing Geometric Ontology Embeddings for EL++ with Negative Sampling and Deductive Closure Filtering. Neural-Symbolic Learning and Reasoning.
- (2024) Zhapa-Camacho, Hoehndorf. Lattice-Preserving ALC Ontology Embeddings. Neural-Symbolic Learning and Reasoning.
- (2023) Fernando Zhapa-Camacho, Maxat Kulmanov, Robert Hoehndorf. mOWL: Python library for machine learning with biomedical ontologies. Bioinformatics.
- (2023) Fernando Zhapa-Camacho, Robert Hoehndorf. From Axioms over Graphs to Vectors, and Back Again: Evaluating the Properties of Graph-based Ontology Embeddings. Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July 3-5, 2023.
- (2022) Mona Alshahrani, Abdullah Almansour, Asma Alkhaldi, Maha A. Thafar et al. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ.
- (2022) Maxat Kulmanov, Robert Hoehndorf. DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms. Bioinformatics.
- (2022) Zhenwei Tang, Shichao Pei, Zhao Zhang, Yongchun Zhu et al. Positive-Unlabeled Learning with Adversarial Data Augmentation for Knowledge Graph Completion. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence.
- (2021) Tilman Hinnerichs, Robert Hoehndorf. DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug interactions. Bioinformatics.
- (2020) Kulmanov, Smaili, Gao, Hoehndorf. Semantic similarity and machine learning with ontologies. Briefings in Bioinformatics.
- (2020) Kulmanov, Hoehndorf. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics.
- (2020) Kulmanov, Hoehndorf. DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLOS Computational Biology.
- (2019) Claus Weiland, Maxat Kulmanov, Marco Schmidt, Robert Hoehndorf. A Machine Learning Based Approach for Similarity Search on Biodiversity Knowledge Graphs. Biodiversity Information Science and Standards.
- (2019) Boudellioua, Kulmanov, Schofield, Gkoutos et al. DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinformatics.
- (2019) Maxat Kulmanov, Wang Liu-Wei, Yuan Yan, Robert Hoehndorf. EL Embeddings: Geometric construction of models for the Description Logic EL++. Proceedings of IJCAI 2019.
- (2019) Pei, Yu, Hoehndorf, Zhang. Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference. The World Wide Web Conference.
- … and 9 more.