Ontology engineering
Ontology engineering and semantic interoperability address the practical problem of turning hundreds of independently developed biomedical ontologies into an infrastructure that can be queried, reasoned over, and combined at scale. Our group designs architectures for processing large, heterogeneous datasets using Semantic Web standards, with a particular emphasis on bringing automated reasoning into routine data-access workflows. The angle taken at KAUST is engineering-led: we treat ontology-based data access as a service that must be fast enough for interactive use, expressive enough to exploit OWL semantics, and robust against the inconsistencies that inevitably arise when many ontologies are combined.
AberOWL and reasoning as a service
The infrastructure described in "AberOWL: an ontology portal with OWL EL reasoning" was developed to make OWL EL reasoning available as a routine query primitive over hundreds of bio-ontologies. "Aber-OWL: a framework for ontology-based data access in biology" set out the underlying architecture, and "Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies", together with "Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning", showed that reasoning at repository scale is tractable when the right OWL fragment is targeted. The recent "Evaluating Different Methods for Semantic Reasoning Over Ontologies" revisits these choices in light of newer reasoners. Building on this stack, "Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings" and "Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase" extend SPARQL endpoints with embedding-based similarity, so that vector-space queries and logical queries can be expressed in a common language. "DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web" shows how the same approach exposes machine-learning models themselves as Semantic Web resources.
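To make the idea of mixing logical and vector-space queries concrete, the following is a minimal sketch in plain Python. It is not the Vec2SPARQL or AberOWL implementation: the class names, the two-dimensional vectors, and the function `most_similar_subclasses` are hypothetical, and the "reasoning" step is only a transitive closure over asserted subclass edges rather than OWL EL classification.

```python
import math

# Hypothetical toy data: asserted subclass edges and 2-d embeddings.
SUBCLASS_OF = {
    "CardiacPhenotype": {"Phenotype"},
    "Arrhythmia": {"CardiacPhenotype"},
    "Tachycardia": {"Arrhythmia"},
    "Seizure": {"Phenotype"},
}
EMBEDDING = {
    "CardiacPhenotype": (0.7, 0.2),
    "Arrhythmia": (0.9, 0.1),
    "Tachycardia": (0.8, 0.3),
    "Seizure": (0.1, 0.9),
}


def superclasses(cls):
    """Reflexive-transitive closure of the asserted subclass edges."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar_subclasses(query, ancestor, k=2):
    """Logical filter (subclass of `ancestor`), then ranking by similarity."""
    candidates = [
        c for c in EMBEDDING
        if c != query and ancestor in superclasses(c)
    ]
    candidates.sort(
        key=lambda c: cosine(EMBEDDING[query], EMBEDDING[c]),
        reverse=True,
    )
    return candidates[:k]
```

Calling `most_similar_subclasses("Tachycardia", "CardiacPhenotype")` first discards `Seizure`, which is not a cardiac phenotype in this toy hierarchy, and only then ranks the remaining classes by cosine similarity. In a SPARQL setting the filter would be a graph pattern and the ranking a similarity function, but the order of operations is the same.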
Interoperability across distributed knowledge
A second strand of work targets the interoperability of ontologies and the data annotated with them. "Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning" and "A common layer of interoperability for biomedical ontologies based on OWL EL" introduced reasoning-driven mechanisms for aligning content across ontologies, while "Interoperability between phenotype and anatomy ontologies" and "Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies" evaluated these design patterns empirically. "The RICORDO approach to semantic interoperability for biomedical data and models" and "An infrastructure for ontology-based information systems in biomedicine: RICORDO case study" applied the same ideas to physiology data and models. Standards-level contributions include "FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation", "FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration", and "The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery". The recent "An open source knowledge graph ecosystem for the life sciences" documents how these standards can be assembled into an integrated translational-research substrate.
The expressiveness gained from formal axioms is not free: combining ontologies tends to introduce contradictions and unintended entailments. "Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies" and "To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO)" quantify these effects and propose semi-automatic repair strategies, while "Formal axioms in biomedical ontologies improve analysis and interpretation of associated data" demonstrates that the additional axiomatic content nevertheless pays for itself in downstream analysis. "Large-Scale Reasoning over Functions in Biomedical Ontologies" tackles the related problem of reasoning over function representations at the scale of the OBO Foundry, and "SPARQL2OWL: Towards Bridging the Semantic Gap Between RDF and OWL" addresses the boundary between graph and description-logic semantics. Explanation-oriented tooling such as "Klarigi: Characteristic explanations for semantic biomedical data" rounds out the engineering toolkit.
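A small example may help show how a contradiction can stay hidden until two ontologies are combined. The sketch below is a deliberate simplification and not the algorithm used by UNMIREOT: it handles only atomic subclass and disjointness axioms, whereas the published work relies on full OWL reasoning. All class names and the helper functions `merge` and `unsatisfiable_classes` are hypothetical.

```python
from itertools import combinations


def merge(*ontologies):
    """Union several subclass maps (class -> set of asserted superclasses)."""
    merged = {}
    for onto in ontologies:
        for cls, parents in onto.items():
            merged.setdefault(cls, set()).update(parents)
    return merged


def ancestors(cls, subclass_of):
    """Reflexive-transitive closure over the asserted subclass edges."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Classes that fall under two classes declared disjoint."""
    disjoint = {frozenset(pair) for pair in disjoint_pairs}
    classes = set(subclass_of)
    for parents in subclass_of.values():
        classes |= parents
    bad = set()
    for cls in classes:
        for a, b in combinations(sorted(ancestors(cls, subclass_of)), 2):
            if frozenset((a, b)) in disjoint:
                bad.add(cls)
                break
    return bad


# Hypothetical fragments: each one is consistent on its own.
onto_a = {"Secretion": {"Process"}, "MilkSecretion": {"Secretion"}}
onto_b = {"Secretion": {"Substance"}, "Substance": {"MaterialEntity"}}
upper_level_disjointness = [("Process", "MaterialEntity")]

print(unsatisfiable_classes(onto_a, upper_level_disjointness))
print(unsatisfiable_classes(onto_b, upper_level_disjointness))
print(unsatisfiable_classes(merge(onto_a, onto_b), upper_level_disjointness))
```

Neither fragment contains an unsatisfiable class in isolation. After merging, `Secretion` is classified both as a process and, through `Substance`, as a material entity, so it and its subclass `MilkSecretion` become unsatisfiable. Partial imports can hide exactly this kind of clash, because the disjointness axiom or one of the subclass chains may not be part of the imported fragment.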
These methods are delivered through AberOWL, vec2SPARQL, Onto2Graph, UNMIREOT, OntoFunc, and the mOWL library, and underpin our work on explainable machine learning with biomedical ontologies. The same infrastructure is reused across projects on variant prioritization, microbial cell factories, functional metagenomics, and the Bio2Vec analytics platform.
Projects
Software
- mOWL — Python library for machine learning with biomedical ontologies. Unifies projection-, axiom- and geometric-embedding methods (EL Embeddings, ELBE, BoxSquaredEL, OWL2Vec*, DL2Vec, OPA2Vec) behind one API, with first-class OWLAPI access and PyTorch integration.
- EL Embeddings — Reference implementation of geometric embeddings for the EL++ description logic, the predecessor of GeometrE and BoxSquaredEL. Preserves subsumption reasoning by mapping classes to convex regions.
- catE — Category-theoretic, lattice-preserving embedding of ALC description-logic ontologies that retains the consequence-closure semantics of the original theory.
- DELE — Deductive EL embeddings: enrich training data with the deductive closure of an ontology before learning, so embeddings recover entailment rather than only asserted axioms.
- OPA2Vec — Combines ontology axioms with associated annotation properties (labels, synonyms, definitions) into a single corpus, then trains Word2Vec to produce semantically rich vectors for ontology classes.
- Onto2Vec — Representation learning for ontologies and their annotations by treating logical axioms as natural-language sentences; predecessor of OPA2Vec.
- DL2Vec — Encodes description-logic axioms as a directed graph and learns embeddings via random walks; widely used for downstream gene-disease and protein-function prediction.
- Walking RDF and OWL — Original feature-learning method over RDF graphs and OWL ontologies via biased random walks; the seed implementation for many later embedding methods including OWL2Vec*.
- EL2Box — Box-shaped geometric embeddings for EL++ that strengthen the topological guarantees of EL Embeddings.
- Interpretable Learning — Generates interpretable symbolic rules from learned representations over biomedical knowledge bases.
- AberOWL — Ontology repository delivering OWL EL reasoning as a service: stores hundreds of bio-ontologies, exposes SPARQL with class-expression query expansion, and powers semantic search over PubMed/PMC.
- Onto2Graph — Generates entailment-aware graph projections of OWL ontologies suitable for downstream graph machine learning while preserving the axioms' deductive structure.
- UNMIREOT — Identifies, diagnoses, and semi-automatically repairs hidden contradictions and unsatisfiable classes introduced by partial imports (MIREOT) into biomedical ontologies.
- OntoFunc — Ontology-driven enrichment analysis that supports arbitrary OWL ontologies and full subsumption-aware aggregation, not only GO.
- vec2SPARQL — Adds embedding-similarity functions to a SPARQL endpoint so that vector-space queries (k-nearest neighbours, cosine similarity) can be mixed with classical graph patterns.
- Units of Measurement Ontology (UO) — OBO Foundry ontology of units of measurement; aligned with QUDT and used across biomedical data standards.
- PhenomeNet — Cross-species phenotype ontology and similarity network combining HPO, MPO, ZP and others; the substrate behind PhenomeNET-VP and DeepPheno.
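The EL Embeddings entry in the list above describes mapping classes to convex regions so that subsumption is preserved. As a rough illustration, assuming the regions are n-dimensional balls, a subclass axiom can be read as ball containment and scored with a hinge-style violation term. This is a sketch of the general idea only, not the mOWL API or the reference implementation; the `Ball` type, the function names, and the example coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    """A class embedded as an n-ball: a centre point and a radius."""
    center: tuple
    radius: float


def containment_violation(sub, sup):
    """How far `sub` sticks out of `sup`; 0.0 when it lies fully inside.

    `sub` is inside `sup` exactly when
    distance(centres) + radius(sub) <= radius(sup).
    """
    dist = math.dist(sub.center, sup.center)
    return max(0.0, dist + sub.radius - sup.radius)


def is_subsumed(sub, sup, tol=1e-9):
    """Geometric reading of the axiom `sub SubClassOf sup`."""
    return containment_violation(sub, sup) <= tol


# Hypothetical 2-d embeddings.
animal = Ball(center=(0.0, 0.0), radius=2.0)
dog = Ball(center=(0.5, 0.0), radius=1.0)
plant = Ball(center=(5.0, 0.0), radius=1.0)

print(is_subsumed(dog, animal))
print(is_subsumed(animal, dog))
print(containment_violation(dog, plant))
```

During training, a term like `containment_violation` would be summed over the subclass axioms of the ontology and minimised, which pushes each subclass ball inside its superclass ball. Because containment of balls is transitive, subsumptions that follow by transitivity hold in the embedding whenever the asserted ones do.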
Publications (42)
- (2024) Callahan, Tripodi, Stefanski, Cappelletti et al. An open source knowledge graph ecosystem for the life sciences. Scientific Data.
- (2023) Luke T. Slater, John A. Williams, Paul N. Schofield, Sophie Russell et al. Klarigi: Characteristic explanations for semantic biomedical data. Computers in Biology and Medicine.
- (2023) Fernando Zhapa-Camacho, Robert Hoehndorf. Evaluating Different Methods for Semantic Reasoning Over Ontologies. Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023, Athens, Greece, November 6-10, 2023.
- (2022) Ali Syed, Senay Kafkas, Maxat Kulmanov, Robert Hoehndorf. Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase. Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022.
- (2021) Maxat Kulmanov, Fernando Zhapa-Camacho, Robert Hoehndorf. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web. Nucleic Acids Research.
- (2020) Smaili, Gao, Hoehndorf. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics.
- (2020) Sara Althubaiti, Senay Kafkas, Marwa Abdelhakim, Robert Hoehndorf. Combining lexical and context features for automatic ontology extension. Journal of Biomedical Semantics.
- (2020) Luke T. Slater, Georgios V. Gkoutos, Robert Hoehndorf. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Medical Informatics and Decision Making.
- (2020) Rutger A. Vos, Toshiaki Katayama, Hiroyuki Mishima, Shin Kawano et al. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Research.
- (2020) JOWO 2020: The Joint Ontology Workshops: Proceedings of the Joint Ontology Workshops co-located with the Bolzano Summer of Knowledge (BOSK 2020). CEUR-WS.
- (2019) Sarah M. Alghamdi, Beth A. Sundberg, John P. Sundberg, Paul N. Schofield et al. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Scientific Reports.
- (2019) Katayama, Kawashima, Micklem, Kawano et al. BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services. F1000Research.
- (2018) Maxat Kulmanov, Senay Kafkas, Andreas Karwath, Alexander Malic et al. Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings. Proceedings of the 11th International Conference Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2018, Antwerp, Belgium, December 3-6, 2018.
- (2018) Damion M. Dooley, Emma J. Griffiths, Gurinder S. Gosal, Pier L. Buttigieg et al. FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. Science of Food.
- (2018) Keenan, McKerlie, Gkoutos, Ward et al. A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals. ILAR Journal.
- (2017) Alshahrani, Khan, Maddouri, Kinjo et al. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics.
- (2017) Salhi, Negrao, Essack, Morton et al. DES-TOMATO: A Knowledge Exploration System Focused On Tomato Species. Scientific Reports.
- (2017) Kafkas, Sarntivijai, Hoehndorf. Usage of cell nomenclature in biomedical literature. BMC Bioinformatics.
- (2016) Bolleman, Mungall, Strozzi, Baran et al. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. Journal of Biomedical Semantics.
- (2016) Salhi, Essack, Radovanovic, Marchand et al. DESM: portal for microbial knowledge exploration systems. Nucleic Acids Research.
- … and 22 more.