Ontology engineering

Ontology engineering and semantic interoperability address the practical problem of turning hundreds of independently developed biomedical ontologies into an infrastructure that can be queried, reasoned over, and combined at scale. Our group designs architectures for processing large, heterogeneous datasets using Semantic Web standards, with a particular emphasis on bringing automated reasoning into routine data-access workflows. The angle taken at KAUST is engineering-led: we treat ontology-based data access as a service that must be fast enough for interactive use, expressive enough to exploit OWL semantics, and robust against the inconsistencies that inevitably arise when many ontologies are combined.

AberOWL and reasoning as a service

The "AberOWL: an ontology portal with OWL EL reasoning" infrastructure was developed to make OWL EL reasoning available as a routine query primitive over hundreds of bio-ontologies. "Aber-OWL: a framework for ontology-based data access in biology" set out the underlying architecture. "Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies" and "Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning" showed that reasoning at repository scale is tractable when the right OWL fragment is targeted. The more recent "Evaluating Different Methods for Semantic Reasoning Over Ontologies" revisits these choices in light of newer reasoners. Building on this stack, "Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings" and "Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase" extend SPARQL endpoints with embedding-based similarity, so that vector-space queries and logical queries can be expressed in a common language. "DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web" shows how the same approach exposes machine-learning models themselves as Semantic Web resources.
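The idea of answering a logical query and a vector-space query in one step can be illustrated with a minimal sketch. It is not the Vec2SPARQL or AberOWL implementation: the class names, item names, two-dimensional embeddings, and the `query` helper are all hypothetical toy values, and the "reasoning" is only a transitive closure over asserted subclass axioms rather than full OWL EL classification.

```python
import math

# Toy asserted subclass axioms (child -> parents); all names are illustrative.
SUBCLASS_OF = {
    "cardiomyopathy": {"heart disease"},
    "heart disease": {"disease"},
    "asthma": {"lung disease"},
    "lung disease": {"disease"},
}

# Toy data items: each has one class annotation and a made-up 2-d embedding.
ITEMS = {
    "gene_a": ("cardiomyopathy", (0.9, 0.1)),
    "gene_b": ("heart disease", (0.2, 0.8)),
    "gene_c": ("asthma", (1.0, 0.0)),
}


def superclasses(cls):
    """Return cls plus everything reachable through subclass axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def query(ancestor, probe, k=5):
    """Logical filter (annotated under `ancestor`) followed by a
    vector-space ranking (cosine similarity to `probe`)."""
    hits = [
        (cosine(vec, probe), name)
        for name, (cls, vec) in ITEMS.items()
        if ancestor in superclasses(cls)
    ]
    return [name for _, name in sorted(hits, reverse=True)[:k]]


if __name__ == "__main__":
    print(query("heart disease", (1.0, 0.0)))
```

In this toy setting the subclass filter removes `gene_c` before ranking, even though its embedding is closest to the probe vector. That ordering of operations is the design point the sketch is meant to show: the ontology constrains the candidate set, and the embedding only orders what remains.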

Interoperability across distributed knowledge

A second strand of work targets the interoperability of ontologies and the data annotated with them. "Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning" and "A common layer of interoperability for biomedical ontologies based on OWL EL" introduced reasoning-driven mechanisms for aligning content across ontologies, while "Interoperability between phenotype and anatomy ontologies" and "Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies" evaluated these design patterns empirically. "The RICORDO approach to semantic interoperability for biomedical data and models" and "An infrastructure for ontology-based information systems in biomedicine: RICORDO case study" applied the same ideas to physiology data and models. Standards-level contributions include "FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation", "FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration", and "The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery". The more recent "An open source knowledge graph ecosystem for the life sciences" documents how these standards can be assembled into an integrated translational-research substrate.

The expressiveness gained from formal axioms is not free: combining ontologies tends to introduce contradictions and unintended entailments. "Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies" and "To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO)" quantify these effects and propose semi-automatic repair strategies, while "Formal axioms in biomedical ontologies improve analysis and interpretation of associated data" demonstrates that the additional axiomatic content nevertheless pays for itself in downstream analysis. "Large-Scale Reasoning over Functions in Biomedical Ontologies" tackles the related problem of reasoning over function representations at the scale of the OBO Foundry, and "SPARQL2OWL: Towards Bridging the Semantic Gap Between RDF and OWL" addresses the boundary between graph and description-logic semantics. Explanation-oriented tooling such as "Klarigi: Characteristic explanations for semantic biomedical data" rounds out the engineering toolkit.
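How a contradiction can stay hidden until two ontologies are combined is easiest to see on a toy case. The sketch below assumes a deliberately simplified setting with only subclass and pairwise disjointness axioms; the two ontologies, the class names, and the helper functions are hypothetical and do not reproduce the detection or repair methods of the papers above, which rely on full OWL reasoners.

```python
# Two toy ontologies (child -> parents) that classify "tumour" differently.
ONTO_A = {
    "tumour": {"anatomical structure"},
    "anatomical structure": {"material entity"},
}
ONTO_B = {
    "tumour": {"pathological process"},
    "pathological process": {"process"},
}

# A disjointness axiom of the kind an upper-level ontology might contribute.
DISJOINT = [("material entity", "process")]


def merge(*axiom_sets):
    """Union the subclass axioms of several ontologies."""
    merged = {}
    for axioms in axiom_sets:
        for child, parents in axioms.items():
            merged.setdefault(child, set()).update(parents)
    return merged


def superclasses(cls, subclass_of):
    """Return cls plus everything reachable through subclass axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(subclass_of, disjoint):
    """Classes that end up below both members of a disjoint pair."""
    classes = set(subclass_of)
    for parents in subclass_of.values():
        classes |= parents
    bad = set()
    for cls in classes:
        ancestors = superclasses(cls, subclass_of)
        if any(a in ancestors and b in ancestors for a, b in disjoint):
            bad.add(cls)
    return bad


if __name__ == "__main__":
    print(unsatisfiable(ONTO_A, DISJOINT))
    print(unsatisfiable(ONTO_B, DISJOINT))
    print(unsatisfiable(merge(ONTO_A, ONTO_B), DISJOINT))
```

Each toy ontology is coherent on its own; only the merged axiom set places `tumour` under two disjoint classes. A repair step would then have to decide which subclass axiom to weaken or remove, which is where semi-automatic strategies come in.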

These methods are delivered through AberOWL, Vec2SPARQL, Onto2Graph, UNMIREOT, OntoFunc, and the mOWL library, and underpin our work on explainable machine learning with biomedical ontologies. The same infrastructure is reused across projects on variant prioritization, microbial cell factories, functional metagenomics, and the Bio2Vec analytics platform.

Projects

Software

Publications (42)