Applied Ontology
Applied ontology, in our group, means using formal representation to make complex phenotypes, functions, and processes amenable to computational analysis across domains. The starting point is the standardization and curation of biological knowledge using ontologies built with explicit logical commitments, and the long-term goal is to produce representations that support both human curators and automated reasoners. Earlier work in this line concentrated on foundational ontologies, including the General Formal Ontology (GFO) and its biological extension GFO-Bio, and on a formal ontology of functions for the life sciences; current work focuses on phenotype representations and on the methodological consequences of design choices made at this foundational level.
Phenotype representation and cross-species integration
A substantial part of our applied-ontology output concerns the representation of phenotypes. "Towards improving phenotype representation in OWL" argued that the entity-quality formalism can be refined using OWL-based design patterns, and "The anatomy of phenotype ontologies: principles, properties and applications" articulated the design principles that underpin modern phenotype ontologies. "Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology" and "Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes" showed how formal definitions enable cross-species inference, and "Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology" evaluated such mapping strategies empirically. "The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants" extends the same machinery from biomedicine to plant science, and "Improving the classification of cardinality phenotypes using collections" revisits the underlying ontological theory for collection-based phenotypes such as polydactyly or supernumerary teeth.
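The reason formal definitions support cross-species inference can be shown with a toy example. In the entity-quality (EQ) style, a phenotype is defined by an anatomical entity and a quality, so a species-neutral class subsumes species-specific ones whenever both components are subsumed. The sketch below is a minimal illustration of that idea, assuming a hand-written is-a hierarchy and hypothetical class names (`mouse_eye`, `decreased_size` and so on are placeholders, not identifiers from any real ontology); actual pipelines encode these definitions as OWL axioms and leave the inference to a reasoner.

```python
from typing import Dict, NamedTuple, Set

# Hypothetical toy is-a hierarchy (child -> parents). The names are
# placeholders, not real Uberon, HPO, MP, or PATO identifiers.
IS_A: Dict[str, Set[str]] = {
    "mouse_eye": {"eye"},
    "human_eye": {"eye"},
    "eye": {"anatomical_entity"},
    "decreased_size": {"size"},
    "size": {"quality"},
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with all of its transitive is-a ancestors."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


class EQPhenotype(NamedTuple):
    entity: str   # the anatomical entity the quality inheres in
    quality: str  # the (abnormal) quality


def subsumes(general: EQPhenotype, specific: EQPhenotype) -> bool:
    """Simplified EQ subsumption: both the entity and the quality of
    `specific` must be subclasses of those of `general`."""
    return (general.entity in ancestors(specific.entity)
            and general.quality in ancestors(specific.quality))


# Two species-specific phenotypes and a species-neutral grouping class.
mouse_microphthalmia = EQPhenotype("mouse_eye", "decreased_size")
human_microphthalmia = EQPhenotype("human_eye", "decreased_size")
abnormal_eye_size = EQPhenotype("eye", "size")

print(subsumes(abnormal_eye_size, mouse_microphthalmia))     # True
print(subsumes(abnormal_eye_size, human_microphthalmia))     # True
print(subsumes(mouse_microphthalmia, human_microphthalmia))  # False
```

The mouse and human classes are not related to each other directly; they meet only under the shared grouping class, which is the pattern that cross-species integration relies on.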
Domain ontologies and ontology design patterns
Beyond phenotypes, the group has contributed to domain ontologies and design-pattern work across biomedicine and adjacent fields. "Chapter Four - The Neurobehavior Ontology: An Ontology for Annotation and Integration of Behavior and Behavioral Phenotypes" and "Best behaviour? Ontologies and the formal description of animal behaviour" developed a coherent representation of behavioural phenotypes. "FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration", "DermO; an ontology for the description of dermatologic disease", "PIDO: The Primary Immunodeficiency Disease Ontology", "The RNA Ontology (RNAO): An Ontology for Integrating RNA Sequence and Structure Data", and "A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology" contribute ontologies for food, dermatology, immunodeficiency, RNA, and coronavirus infection, respectively. "The Units Ontology: a tool for integrating units of measurement in science" and "GFVO: the Genomic Feature and Variation Ontology" address units of measurement and genomic feature annotation, while "Ontology design patterns to disambiguate relations between genes and gene products in GENIA" and "Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL" demonstrate how design patterns clarify existing annotation conventions. "Semantic units: organizing knowledge graphs into semantically meaningful units of representation" introduces a more recent abstraction for structuring knowledge graphs into ontologically grounded units.
Methodology and reflection
Applied ontology depends on a careful articulation of what ontologies are for, and the group has consistently engaged with that methodological question. "The role of ontologies in biological and biomedical research: a functional perspective" sets out a pragmatic view of ontology utility, and "Evaluation of research in biomedical ontologies" proposes evaluation criteria for the field. "Higgs bosons, mars missions, and unicorn delusions: How to deal with terms of dubious reference in scientific ontologies" takes up the philosophical problem of non-referring terms in realist ontologies, while "Semantic similarity and machine learning with ontologies", "Datamining with Ontologies", and "Notions of similarity for systems biology models" connect ontology design to downstream computational use.
These contributions are consumed by tools such as mOWL, AberOWL, OPA2Vec, Onto2Vec, and the cross-species similarity substrate PhenomeNet, and they feed our work on variant prioritization, microbial cell factories, functional metagenomics, and the IBNSINA-QI programme on biomedical-network analysis. The tutorial resources Machine Learning with Ontologies and the mOWL Tutorial make this body of methodology accessible to new users in biomedicine and beyond.
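Much of that downstream use rests on semantic similarity between ontology classes. The sketch below shows one classical measure, Resnik similarity, which scores two classes by the information content of their most informative common ancestor. It is a minimal illustration, assuming a hypothetical four-class hierarchy, made-up annotation counts, and exactly one annotation per item; it is not taken from PhenomeNet, mOWL, or any other tool named above, which may use different measures.

```python
import math
from typing import Dict, Set

# Hypothetical is-a hierarchy (child -> parents); names are placeholders.
IS_A: Dict[str, Set[str]] = {
    "eye_size": {"eye_morphology"},
    "lens_shape": {"eye_morphology"},
    "eye_morphology": {"phenotype"},
    "limb_length": {"phenotype"},
}

# Hypothetical number of items (e.g. genes) annotated directly to each class.
# Assumes every item carries exactly one annotation, so counts can be summed.
DIRECT_COUNTS: Dict[str, int] = {
    "eye_size": 4, "lens_shape": 2, "eye_morphology": 2, "limb_length": 8,
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with all of its transitive is-a ancestors."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(counts: Dict[str, int]) -> Dict[str, float]:
    """IC(c) = -log p(c) = log(total / n_c), where n_c counts items annotated
    to c or to any of its descendants (annotations propagate upwards)."""
    propagated: Dict[str, int] = {}
    for cls, n in counts.items():
        for anc in ancestors(cls):
            propagated[anc] = propagated.get(anc, 0) + n
    total = sum(counts.values())
    return {cls: math.log(total / n) for cls, n in propagated.items()}


def resnik(c1: str, c2: str, ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(c1) & ancestors(c2)
    return max((ic[c] for c in common if c in ic), default=0.0)


IC = information_content(DIRECT_COUNTS)
print(round(resnik("eye_size", "lens_shape", IC), 3))   # 0.693 (= ln 2)
print(round(resnik("eye_size", "limb_length", IC), 3))  # 0.0
```

Classes that share only the root score 0, while siblings under a rarely used parent score higher. Comparisons between sets of classes, such as a patient's phenotypes against a gene's phenotypes, are usually built by aggregating pairwise scores like these.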
Projects
- Towards sound, complete, and explainable machine learning with biomedical ontologies (CRG11) (2023–2026)
- Computational methods for functional metagenomics: from protein functions to multi-scale interactions (2022–2024)
- IBNSINA-QI: Integrating Biomedical Networks and Semantic Information for Neural network Analysis of Quantitative Information (2021–2023)
- CompleX: Variant Prioritization in Complex Disease (2019–2021)
- Improvement of genetic variant prioritization technology (2018–2019)
- Bio2Vec: Smart analytics infrastructure for the life sciences (2018–2020)
- Data integration and ontologies for microbial cell factories (2016–2018)
Software
- mOWL — Python library for machine learning with biomedical ontologies. Unifies projection-, axiom- and geometric-embedding methods (EL Embeddings, ELBE, BoxSquaredEL, OWL2Vec*, DL2Vec, OPA2Vec) behind one API, with first-class OWLAPI access and PyTorch integration.
- EL Embeddings — Reference implementation of geometric embeddings for the EL++ description logic, the predecessor of GeometrE and BoxSquaredEL. Preserves subsumption reasoning by mapping classes to convex regions.
- catE — Category-theoretic, lattice-preserving embedding of ALC description-logic ontologies that retains the consequence-closure semantics of the original theory.
- DELE — Deductive EL embeddings: enrich training data with the deductive closure of an ontology before learning, so embeddings recover entailment rather than only asserted axioms.
- OPA2Vec — Combines ontology axioms with associated annotation properties (labels, synonyms, definitions) into a single corpus, then trains Word2Vec to produce semantically rich vectors for ontology classes.
- Onto2Vec — Representation learning for ontologies and their annotations by treating logical axioms as natural-language sentences; predecessor of OPA2Vec.
- DL2Vec — Encodes description-logic axioms as a directed graph and learns embeddings via random walks; widely used for downstream gene-disease and protein-function prediction.
- Walking RDF and OWL — Original feature-learning method over RDF graphs and OWL ontologies via biased random walks; the seed implementation for many later embedding methods including OWL2Vec*.
- EL2Box — Box-shaped geometric embeddings for EL++ that strengthen the topological guarantees of EL Embeddings.
- Interpretable Learning — Generates interpretable symbolic rules from learned representations over biomedical knowledge bases.
- AberOWL — Ontology repository delivering OWL EL reasoning as a service: stores hundreds of bio-ontologies, exposes SPARQL with class-expression query expansion, and powers semantic search over PubMed/PMC.
- Onto2Graph — Generates entailment-aware graph projections of OWL ontologies suitable for downstream graph machine learning while preserving the axioms' deductive structure.
- UNMIREOT — Identifies, diagnoses, and semi-automatically repairs hidden contradictions and unsatisfiable classes introduced by partial imports (MIREOT) into biomedical ontologies.
- OntoFunc — Ontology-driven enrichment analysis that supports arbitrary OWL ontologies and full subsumption-aware aggregation, not only GO.
- vec2SPARQL — Adds embedding-similarity functions to a SPARQL endpoint so that vector-space queries (k-nearest neighbours, cosine similarity) can be mixed with classical graph patterns.
- Machine Learning with Ontologies — Companion code and worked examples for the Briefings in Bioinformatics tutorial review; the most-starred repository in the group.
- Ontology Tutorial — Hands-on tutorial that walks new users through OWL, automated reasoning, and ontology-aware data analysis; basis for the AI in Biomedicine summer school.
- mOWL Tutorial — Step-by-step worked notebooks that demonstrate every embedding family in mOWL on protein-function, gene-disease, and ontology-completion tasks.
- Units of Measurement Ontology (UO) — OBO Foundry ontology of units of measurement; aligned with QUDT and used across biomedical data standards.
- PhenomeNet — Cross-species phenotype ontology and similarity network combining HPO, MPO, ZP and others; the substrate behind PhenomeNET-VP and DeepPheno.
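The geometric intuition behind EL Embeddings and its successors can be sketched in a few lines. Each class is represented as an n-ball (a centre and a radius), and a subclass axiom C ⊑ D is satisfied when the ball for C lies inside the ball for D; a training objective can then penalise violations with a hinge loss that is zero exactly when this containment holds. The code is a minimal, framework-free illustration under that assumption: the class names are hypothetical, and it leaves out the losses for the other EL++ normal forms, relation embeddings, margins and regularisation that a full implementation needs. It is not the API of mOWL or of the reference repository.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Ball:
    """A class embedding as an n-ball: a centre vector and a non-negative radius."""
    center: Sequence[float]
    radius: float


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two centres."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_contained(c: Ball, d: Ball) -> bool:
    """Ball c lies inside ball d iff ||center_c - center_d|| + r_c <= r_d."""
    return distance(c.center, d.center) + c.radius <= d.radius


def subclass_loss(c: Ball, d: Ball) -> float:
    """Hinge loss for an axiom C SubClassOf D: positive while c sticks out
    of d, zero once c is fully contained in d."""
    return max(0.0, distance(c.center, d.center) + c.radius - d.radius)


# Hypothetical 2-D embeddings for two classes.
digit = Ball(center=[0.0, 0.0], radius=1.0)
limb_segment = Ball(center=[0.5, 0.0], radius=2.0)

print(is_contained(digit, limb_segment), subclass_loss(digit, limb_segment))  # True 0.0
print(is_contained(limb_segment, digit), subclass_loss(limb_segment, digit))  # False 1.5
```

Gradient descent on such losses moves centres and radii until the asserted axioms hold geometrically; nested balls for class pairs that were not asserted can then be read as candidate subsumptions, which is how geometric embeddings are applied to ontology completion.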
Publications (63)
- (2024) Vogt, Kuhn, Hoehndorf. Semantic units: organizing knowledge graphs into semantically meaningful units of representation. Journal of Biomedical Semantics.
- (2023) Sarah M. Alghamdi, Robert Hoehndorf. Improving the classification of cardinality phenotypes using collections. Journal of Biomedical Semantics.
- (2023) Sumyyah Toonsi, Senay Kafkas, Robert Hoehndorf. Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition. Proceedings of the International Conference on Biomedical Ontologies 2023 together with the Workshop on Ontologies for Infectious and Immune-Mediated Disease Data Science (OIIDDS 2023) and the FAIR Ontology Harmonization and TRUST Data Interoperability Workshop (FOHTI 2023), Brasília, Brazil, August 28–September 1, 2023.
- (2023) Núria Queralt-Rosinach, Paul N. Schofield, Marco Roos, Robert Hoehndorf. Updating the CEMO ontology for future epidemiological challenges. 14th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences (SWAT4HCLS 2023), Basel, Switzerland, February 13-16, 2023.
- (2022) Yongqun He, Hong Yu, Anthony Huffman, Asiyah Yu Lin et al. A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology. Journal of Biomedical Semantics.
- (2022) Ali Syed, Senay Kafkas, Maxat Kulmanov, Robert Hoehndorf. Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase. Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022.
- (2020) Kulmanov, Smaili, Gao, Hoehndorf. Semantic similarity and machine learning with ontologies. Briefings in Bioinformatics.
- (2020) Smaili, Gao, Hoehndorf. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics.
- (2020) Luke T. Slater, Georgios V. Gkoutos, Robert Hoehndorf. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Medical Informatics and Decision Making.
- (2020) JOWO 2020: The Joint Ontology Workshops: Proceedings of the Joint Ontology Workshops co-located with the Bolzano Summer of Knowledge (BOSK 2020). CEUR-WS.
- (2019) Sarah M. Alghamdi, Beth A. Sundberg, John P. Sundberg, Paul N. Schofield et al. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Scientific Reports.
- (2019) Althubaiti, Karwath, Dallol, Noor et al. Ontology-based prediction of cancer driver genes. Scientific Reports.
- (2019) Senay Kafkas, Robert Hoehndorf. Ontology based mining of pathogen–disease associations from literature. Journal of Biomedical Semantics.
- (2018) Damion M. Dooley, Emma J. Griffiths, Gurinder S. Gosal, Pier L. Buttigieg et al. FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. npj Science of Food.
- (2018) Ron Henkel, Robert Hoehndorf, Tim Kacprowski, Christian Knupfer et al. Notions of similarity for systems biology models. Briefings in Bioinformatics.
- (2018) Keenan, McKerlie, Gkoutos, Ward et al. A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals. ILAR Journal.
- (2018) Smaili, Gao, Hoehndorf. Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations. Bioinformatics.
- (2018) Georgios V. Gkoutos, Paul N. Schofield, Robert Hoehndorf. The anatomy of phenotype ontologies: principles, properties and applications. Briefings in Bioinformatics.
- (2018) Kulmanov, Schofield, Gkoutos, Hoehndorf. Ontology-based validation and identification of regulatory phenotypes. Bioinformatics.
- (2018) Senay Kafkas, Robert Hoehndorf. Ontology based mining of pathogen-disease associations from literature. Bio-Ontologies COSI.
- … and 43 more.