Ontology engineering
Ontology engineering and semantic interoperability address the practical problem of turning hundreds of independently developed biomedical ontologies into an infrastructure that can be queried, reasoned over, and combined at scale. Our group designs architectures for processing large, heterogeneous datasets using Semantic Web standards, with a particular emphasis on bringing automated reasoning into routine data-access workflows. The angle taken at KAUST is engineering-led: we treat ontology-based data access as a service that must be fast enough for interactive use, expressive enough to exploit OWL semantics, and robust against the inconsistencies that inevitably arise when many ontologies are combined.
AberOWL and reasoning as a service
The AberOWL infrastructure, first presented in AberOWL: an ontology portal with OWL EL reasoning, was developed to make OWL EL reasoning available as a routine query primitive over hundreds of bio-ontologies. Aber-OWL: a framework for ontology-based data access in biology set out the underlying architecture, and Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies together with Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning showed that reasoning at repository scale is tractable when the right OWL fragment is targeted: classification in OWL EL runs in polynomial time, whereas it is far more expensive in full OWL 2 DL. The 2023 study Evaluating Different Methods for Semantic Reasoning Over Ontologies revisits these choices in light of newer reasoners. Building on this stack, Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings and Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase extend SPARQL endpoints with embedding-based similarity, so that vector-space queries and logical queries can be expressed in a common language. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web shows how the same approach can expose machine-learning models themselves as Semantic Web resources.
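Why restricting to OWL EL keeps repository-scale reasoning tractable can be seen in the completion-rule procedure for EL classification: it only ever adds facts drawn from a fixed, polynomially bounded set (named superclasses per class, and class pairs per role), so saturation terminates quickly. The sketch below is a toy Python version of those standard rules for already-normalized axioms, without role hierarchies, nominals or the bottom concept. It is not AberOWL's code; the tuple encoding and the example class names are hypothetical.

```python
from collections import defaultdict

TOP = "owl:Thing"

# Normalized EL axioms, encoded as tuples (hypothetical encoding):
#   ("sub",  A, B)        A ⊑ B
#   ("conj", A1, A2, B)   A1 ⊓ A2 ⊑ B
#   ("ex_r", A, r, B)     A ⊑ ∃r.B
#   ("ex_l", r, A, B)     ∃r.A ⊑ B

def _classes_of(ax):
    kind = ax[0]
    if kind == "sub":
        return {ax[1], ax[2]}
    if kind == "conj":
        return {ax[1], ax[2], ax[3]}
    if kind == "ex_r":
        return {ax[1], ax[3]}
    if kind == "ex_l":
        return {ax[2], ax[3]}
    raise ValueError(f"unknown axiom kind: {kind}")


def classify(axioms):
    """Saturate the axioms; return {class: set of inferred named superclasses}."""
    classes = set().union(*(_classes_of(ax) for ax in axioms))
    S = {c: {c, TOP} for c in classes}  # every class subsumes itself and is under TOP
    R = defaultdict(set)                # R[r] holds pairs (X, Y) meaning X ⊑ ∃r.Y
    changed = True
    while changed:                      # repeat until no rule adds anything new
        changed = False
        for ax in axioms:
            kind = ax[0]
            if kind == "sub":
                _, a, b = ax
                for x in classes:
                    if a in S[x] and b not in S[x]:
                        S[x].add(b)
                        changed = True
            elif kind == "conj":
                _, a1, a2, b = ax
                for x in classes:
                    if a1 in S[x] and a2 in S[x] and b not in S[x]:
                        S[x].add(b)
                        changed = True
            elif kind == "ex_r":
                _, a, r, b = ax
                for x in classes:
                    if a in S[x] and (x, b) not in R[r]:
                        R[r].add((x, b))
                        changed = True
            elif kind == "ex_l":
                _, r, a, b = ax
                for x, y in list(R[r]):
                    if a in S[y] and b not in S[x]:
                        S[x].add(b)
                        changed = True
    return S


example = [
    ("sub", "Neuron", "Cell"),
    ("ex_r", "Axon", "part_of", "Neuron"),
    ("ex_l", "part_of", "Cell", "CellPart"),
]
inferred = classify(example)
print(sorted(inferred["Axon"]))  # ['Axon', 'CellPart', 'owl:Thing']
```

Axon is inferred to be a CellPart because Axon ⊑ ∃part_of.Neuron and Neuron ⊑ Cell together trigger ∃part_of.Cell ⊑ CellPart, an entailment that is not visible in the asserted hierarchy.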
Interoperability across distributed knowledge
A second strand of work targets the interoperability of ontologies and the data annotated with them. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning and A common layer of interoperability for biomedical ontologies based on OWL EL introduced reasoning-driven mechanisms for aligning content across ontologies, while Interoperability between phenotype and anatomy ontologies and Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies examined design patterns for linking phenotype and pathology classes to anatomy and evaluated them empirically. The RICORDO approach to semantic interoperability for biomedical data and models and An infrastructure for ontology-based information systems in biomedicine: RICORDO case study applied the same ideas to physiology data and models. Standards-level contributions include FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation, FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration, and The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. The 2024 article An open source knowledge graph ecosystem for the life sciences documents how these standards can be assembled into an integrated translational-research substrate.
The expressiveness gained from formal axioms is not free: combining ontologies tends to introduce contradictions and unintended entailments. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies and To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO) quantify these effects and propose semi-automatic repair strategies, while Formal axioms in biomedical ontologies improve analysis and interpretation of associated data demonstrates that the additional axiomatic content nevertheless pays for itself in downstream analysis. Large-Scale Reasoning over Functions in Biomedical Ontologies tackles the related problem of reasoning over function representations at the scale of the OBO Foundry, and SPARQL2OWL: Towards Bridging the Semantic Gap Between RDF and OWL addresses the boundary between graph and description-logic semantics. Explanation-oriented tooling such as Klarigi: Characteristic explanations for semantic biomedical data rounds out the engineering toolkit.
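The kind of hidden contradiction described above can be illustrated with a deliberately simplified check: merge told subclass edges from two sources, take the transitive closure, and flag any class that inherits from both members of a declared disjoint pair, which makes it unsatisfiable. This is a hypothetical sketch, not the UNMIREOT procedure; it ignores existential restrictions and every other OWL construct, so a real pipeline needs a full OWL reasoner, and the class names in the example are invented.

```python
from collections import defaultdict

def ancestors(subclass_edges):
    """Reflexive-transitive closure of told subclass edges given as (child, parent)."""
    parents = defaultdict(set)
    nodes = set()
    for child, parent in subclass_edges:
        parents[child].add(parent)
        nodes.update((child, parent))
    closure = {}
    for n in nodes:
        seen, stack = {n}, [n]
        while stack:  # depth-first walk up the hierarchy
            cur = stack.pop()
            for p in parents[cur]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        closure[n] = seen
    return closure


def unsatisfiable_classes(subclass_edges, disjoint_pairs):
    """Return {class: sorted list of disjoint pairs that the class inherits from}."""
    closure = ancestors(subclass_edges)
    disjoint = {frozenset(p) for p in disjoint_pairs}
    report = {}
    for cls, ancs in closure.items():
        clashes = sorted(tuple(sorted(pair)) for pair in disjoint if pair <= ancs)
        if clashes:
            report[cls] = clashes
    return report


edges = [
    ("lesion", "pathological_entity"),
    ("pathological_entity", "process"),      # hierarchy from an imported fragment
    ("lesion", "anatomical_structure"),      # axiom from the importing ontology
    ("anatomical_structure", "material_entity"),
]
disjoint = [("process", "material_entity")]
print(unsatisfiable_classes(edges, disjoint))
# {'lesion': [('material_entity', 'process')]}
```

Neither source is contradictory on its own; the clash only appears once the partially imported hierarchy and the local axiom are combined, which is why such problems tend to stay hidden until the merged ontology is classified.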
These methods are delivered through AberOWL, vec2SPARQL, Onto2Graph, UNMIREOT, OntoFunc, and the mOWL library, and underpin our work on explainable machine learning with biomedical ontologies. The same infrastructure is reused across projects on variant prioritization, microbial cell factories, functional metagenomics, and the Bio2Vec analytics platform.
Projects
Software
- mOWL — Python library for machine learning with biomedical ontologies. Unifies projection-, axiom- and geometric-embedding methods (EL Embeddings, ELBE, BoxSquaredEL, OWL2Vec*, DL2Vec, OPA2Vec) behind one API, with first-class OWLAPI access and PyTorch integration.
- EL Embeddings — Reference implementation of geometric embeddings for the EL++ description logic, the predecessor of GeometrE and BoxSquaredEL. Preserves subsumption reasoning by mapping classes to convex regions.
- catE — Category-theoretic, lattice-preserving embedding of ALC description-logic ontologies that retains the consequence-closure semantics of the original theory.
- DELE — Deductive EL embeddings: enrich training data with the deductive closure of an ontology before learning, so embeddings recover entailment rather than only asserted axioms.
- OPA2Vec — Combines ontology axioms with associated annotation properties (labels, synonyms, definitions) into a single corpus, then trains Word2Vec to produce semantically rich vectors for ontology classes.
- Onto2Vec — Representation learning for ontologies and their annotations by treating logical axioms as natural-language sentences; predecessor of OPA2Vec.
- DL2Vec — Encodes description-logic axioms as a directed graph and learns embeddings via random walks; widely used for downstream gene-disease and protein-function prediction.
- Walking RDF and OWL — Original feature-learning method over RDF graphs and OWL ontologies via biased random walks; the seed implementation for many later embedding methods including OWL2Vec*.
- EL2Box — Box-shaped geometric embeddings for EL++ that strengthen the topological guarantees of EL Embeddings.
- Interpretable Learning — Generates interpretable symbolic rules from learned representations over biomedical knowledge bases.
- AberOWL — Ontology repository delivering OWL EL reasoning as a service: stores hundreds of bio-ontologies, exposes SPARQL with class-expression query expansion and powers semantic search over PubMed/PMC.
- Onto2Graph — Generates entailment-aware graph projections of OWL ontologies suitable for downstream graph machine learning while preserving the axioms' deductive structure.
- UNMIREOT — Identifies, diagnoses and semi-automatically repairs hidden contradictions and unsatisfiable classes introduced by partial imports (MIREOT) into biomedical ontologies.
- OntoFunc — Ontology-driven enrichment analysis that supports arbitrary OWL ontologies and full subsumption-aware aggregation, not only GO.
- vec2SPARQL — Adds embedding-similarity functions to a SPARQL endpoint so that vector-space queries (k-nearest neighbours, cosine similarity) can be mixed with classical graph patterns.
- Units of Measurement Ontology (UO) — OBO Foundry ontology of units of measurement; aligned with QUDT and used across biomedical data standards.
- PhenomeNet — Cross-species phenotype ontology and similarity network combining HPO, MPO, ZP and others; the substrate behind PhenomeNET-VP and DeepPheno.
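Several of the tools listed above (Walking RDF and OWL, DL2Vec, OWL2Vec*) share a common step: project ontology axioms or RDF triples onto a graph, then generate random walks whose node and relation tokens are passed to a Word2Vec-style model. A minimal sketch of the walk-generation step, assuming the projection has already produced (subject, relation, object) triples; uniform sampling is a simplification (Walking RDF and OWL, for example, uses biased walks), and the function names, parameters and example triples are illustrative assumptions rather than the exact strategy of any of these tools.

```python
import random
from collections import defaultdict

def build_graph(triples):
    """Directed adjacency list: node -> list of (relation, neighbour)."""
    graph = defaultdict(list)
    for s, r, o in triples:
        graph[s].append((r, o))
    return graph


def random_walks(graph, walks_per_node=2, walk_length=4, seed=0):
    """Uniform random walks; each walk alternates node and relation tokens."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length):  # walk_length = maximum number of hops
                edges = graph.get(node)
                if not edges:
                    break  # dead end: stop this walk early
                rel, node = rng.choice(edges)
                walk.extend([rel, node])
            walks.append(walk)
    return walks


triples = [
    ("axon", "part_of", "neuron"),
    ("neuron", "subClassOf", "cell"),
    ("cell", "subClassOf", "material_entity"),
]
walks = random_walks(build_graph(triples))
print(walks[0])
# ['axon', 'part_of', 'neuron', 'subClassOf', 'cell', 'subClassOf', 'material_entity']
```

Keeping relation labels in the token sequence lets the downstream embedding model distinguish, for example, part-of neighbours from subclass neighbours; mOWL provides maintained implementations of the full pipelines.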
Publications (45)
- (2026) Mashkova, Zhapa-Camacho, Hoehndorf. DELE: Deductive EL++ Embeddings for Knowledge Base Completion. Neurosymbolic Artificial Intelligence.
- (2026) Song, Ma, Liu, Luo et al. Robust Knowledge Graph Embedding via Denoising. The Semantic Web -- ESWC 2026.
- (2026) Zhapa-Camacho, Hoehndorf. Fully Geometric Multi-hop Reasoning on Knowledge Graphs with Transitive Relations. The Semantic Web -- ESWC 2026.
- (2024) Callahan, Tripodi, Stefanski, Cappelletti et al. An open source knowledge graph ecosystem for the life sciences. Scientific Data.
- (2023) Luke T. Slater, John A. Williams, Paul N. Schofield, Sophie Russell et al. Klarigi: Characteristic explanations for semantic biomedical data. Computers in Biology and Medicine.
- (2023) Fernando Zhapa-Camacho, Robert Hoehndorf. Evaluating Different Methods for Semantic Reasoning Over Ontologies. Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023, Athens, Greece, November 6-10, 2023.
- (2022) Ali Syed, Senay Kafkas, Maxat Kulmanov, Robert Hoehndorf. Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase. Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022.
- (2021) Maxat Kulmanov, Fernando Zhapa-Camacho, Robert Hoehndorf. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web. Nucleic Acids Research.
- (2020) Sara Althubaiti, Senay Kafkas, Marwa Abdelhakim, Robert Hoehndorf. Combining lexical and context features for automatic ontology extension. Journal of Biomedical Semantics.
- (2020) Smaili, Gao, Hoehndorf. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics.
- (2020) Luke T. Slater, Georgios V. Gkoutos, Robert Hoehndorf. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Medical Informatics and Decision Making.
- (2020) Rutger A. Vos, Toshiaki Katayama, Hiroyuki Mishima, Shin Kawano et al. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Research.
- (2020) JOWO 2020: The Joint Ontology Workshops: Proceedings of the Joint Ontology Workshops co-located with the Bolzano Summer of Knowledge (BOSK 2020). CEUR-WS.
- (2019) Sarah M. Alghamdi, Beth A. Sundberg, John P. Sundberg, Paul N. Schofield et al. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Scientific Reports.
- (2019) Katayama, Kawashima, Micklem, Kawano et al. BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services. F1000Research.
- (2018) Maxat Kulmanov, Senay Kafkas, Andreas Karwath, Alexander Malic et al. Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings. Proceedings of the 11th International Conference Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2018, Antwerp, Belgium, December 3-6, 2018.
- (2018) Damion M. Dooley, Emma J. Griffiths, Gurinder S. Gosal, Pier L. Buttigieg et al. FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. Science of Food.
- (2018) Keenan, McKerlie, Gkoutos, Ward et al. A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals. ILAR Journal.
- (2017) Alshahrani, Khan, Maddouri, Kinjo et al. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics.
- (2017) Salhi, Negrao, Essack, Morton et al. DES-TOMATO: A Knowledge Exploration System Focused On Tomato Species. Scientific Reports.
- (2017) Kafkas, Sarntivijai, Hoehndorf. Usage of cell nomenclature in biomedical literature. BMC Bioinformatics.
- (2016) Bolleman, Mungall, Strozzi, Baran et al. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. Journal of Biomedical Semantics.
- (2016) Salhi, Essack, Radovanovic, Marchand et al. DESM: portal for microbial knowledge exploration systems. Nucleic Acids Research.
- (2016) Slater, Rodriguez-Garcia, O'Shea, Schofield et al. Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning. Ontology Engineering: 12th International Experiences and Directions Workshop on OWL, OWLED 2015, co-located with ISWC 2015, Bethlehem, PA, USA, October 9-10, 2015, Revised Selected Papers.
- (2016) Robert Hoehndorf, Liam Mencel, Georgios V. Gkoutos, Paul N. Schofield. Large-Scale Reasoning over Functions in Biomedical Ontologies. Formal Ontology in Information Systems.
- (2016) Luke Slater, Georgios V. Gkoutos, Paul N Schofield, Robert Hoehndorf. To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO). International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016).
- (2016) Mona Alshahrani, Hussein Almashouq, Robert Hoehndorf. SPARQL2OWL: Towards Bridging the Semantic Gap Between RDF and OWL. Proceedings of the Joint International Conference on Biological Ontology and BioCreative, Corvallis, Oregon, United States, August 1-4, 2016.
- (2015) Robert Hoehndorf, Luke Slater, Paul N Schofield, Georgios V Gkoutos. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics.
- (2015) Luke Slater, Georgios Gkoutos, Paul N. Schofield, Robert Hoehndorf. Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies. Proceedings of International Conference on Biomedical Ontologies (ICBO).
- (2015) Luke Slater, Georgios Gkoutos, Paul N. Schofield, Robert Hoehndorf. AberOWL: an ontology portal with OWL EL reasoning. Proceedings of International Conference on Biomedical Ontologies (ICBO).
- (2014) Dumontier, Baker, Baran, Callahan et al. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics.
- (2013) Hoehndorf, Schofield, Gkoutos. An integrative, translational approach to understanding rare and orphan genetically based diseases. Interface Focus.
- (2012) Hoehndorf, Harris, Herre, Rustici et al. Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology. Bioinformatics.
- (2012) Gkoutos, Hoehndorf. Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes. Journal of Biomedical Semantics.
- (2012) Gkoutos, Schofield, Hoehndorf. The Units Ontology: a tool for integrating units of measurement in science. Database.
- (2012) Georgios V. Gkoutos, Paul N. Schofield, Robert Hoehndorf. Chapter Four - The Neurobehavior Ontology: An Ontology for Annotation and Integration of Behavior and Behavioral Phenotypes. Bioinformatics of Behavior: Part 1.
- (2012) Robert Hoehndorf, Michel Dumontier, Georgios V. Gkoutos. Integration of knowledge for personalized medicine: a pharmacogenomics case-study. Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012).
- (2011) Hiroshi Masuya, Georgios V. Gkoutos, Nobuhiko Tanaka, Kazunori Waki et al. Investigation of the fundamental strategy for interoperability of description of biological measurements. Proceedings of the Second International Conference on Biomedical Ontology.
- Hoehndorf, Oellrich, Rebholz-Schuhmann. Interoperability between phenotype and anatomy ontologies. Bioinformatics.
- Robert Hoehndorf, Michel Dumontier, Anika Oellrich, Sarala Wimalaratne et al. A common layer of interoperability for biomedical ontologies based on OWL EL. Bioinformatics.
- Robert Hoehndorf, Michel Dumontier, Anika Oellrich, Dietrich Rebholz-Schuhmann et al. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning. PLOS ONE.
- de Bono, Hoehndorf, Wimalaratne, Gkoutos et al. The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions. BMC Research Notes.
- Hoehndorf, Ngonga Ngomo, Pyysalo, Ohta et al. Ontology design patterns to disambiguate relations between genes and gene products in GENIA. Journal of Biomedical Semantics.
- Jupp, Stevens, Hoehndorf. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL. Journal of Biomedical Semantics.
- Wimalaratne, Grenon, Hoehndorf, Gkoutos et al. An infrastructure for ontology-based information systems in biomedicine: RICORDO case study. Bioinformatics.