Research

BORG works on biomedical ontologies, neuro-symbolic AI, disease and phenotype informatics, and protein-function prediction. The tabs below cover the research topics we work on, the funded projects that support that work, and the open-source software the group produces.

Research topics the Bio-Ontology Research Group works on. Open a topic for the full overview, related projects, software, and publications.

Neuro-symbolic AI

I work on methods that integrate symbolic knowledge with statistical learning. This includes mapping entities in formal ontologies into vector spaces while preserving their semantic relations. I develop embedding frameworks for Description Logics (e.g., EL++ and ALC) that provide mathematical guarantees for logical soundness and approximate the interpretation of formalized theories.

ontology embeddings · description logic · geometric embeddings · EL++ · ALC · neuro-symbolic AI

Ontology engineering

I develop architectures for processing massive, heterogeneous data using Semantic Web standards. This work includes the AberOWL infrastructure for ontology-based data access and methods for establishing interoperability across distributed databases through linked knowledge graphs.

AberOWL · Semantic Web · OWL · OWL EL reasoning · ontology-based data access · linked data

Applied Ontology

I use ontologies to standardize and analyze complex phenotypes across domains. My earlier work focused on foundational ontologies, including the General Formal Ontology (GFO) and its biological extension GFO-Bio, as well as the development of a formal ontology of functions to curate functional knowledge in the life sciences.

formal ontology · GFO · GFO-Bio · ontology of functions · phenotype ontology · FLOPO

Protein function

Large-scale ontologies like the Gene Ontology (GO) provide essential background knowledge for understanding protein activity. I work on the DeepGO family of systems, which use formalized axioms to constrain deep learning models for protein function prediction. These systems derive functional insights from sequence and interaction data.

Gene Ontology · GO · DeepGO · DeepGOPlus · DeepGO-SE · DeepGOZero

Rare disease

The diagnosis of rare diseases requires the integration of patient-specific data with large-scale background knowledge, such as the Human Phenotype Ontology (HPO). I develop systems like PhenomeNET and PVP that use automated reasoning and machine learning to prioritize disease-causing genomic variants based on their phenotypic consequences.

PhenomeNET · DeepPVP · PVP · variant prioritization · Human Phenotype Ontology · HPO

Drug mechanisms

I apply ontologies and knowledge graphs to model drug-target interactions, drug indications, and adverse drug reactions. This work links molecular data to systems biology through causal knowledge graphs, enabling the identification of mechanistic relationships and potential drug repurposing targets.

drug-target interaction · DTI-Voodoo · drug repurposing · PharmGKB · SBML Harvester · causal knowledge graph

Genomics

I contribute to the development of genomic resources and the analysis of population-specific genomic data. This includes reference genome assemblies, pangenome graphs for the Saudi and wider Middle Eastern population, variant-calling and structural-variant pipelines, and the analysis of antimicrobial resistance from whole-genome sequencing.

Saudi pangenome · reference genome · pangenome graph · variant calling · structural variant · whole-genome sequencing

Biomedical informatics

I work on biomedical informatics infrastructure that turns research-grade data into usable inputs for clinicians and researchers. This includes biomedical knowledge-base construction (PathoPhenoDB, PhenomeNET, PhenomeBrowser), text-mining of biomedical literature, integration of clinical phenotype encodings, and analytics over electronic health records.

biomedical knowledge base · PathoPhenoDB · PhenomeBrowser · text mining · biomedical NLP · clinical informatics

Semantic similarity

I develop and benchmark semantic similarity measures over biomedical ontologies, including measures that operate on the OWL axiomatic structure of an ontology rather than only on its lexical or taxonomic skeleton. These measures underpin phenotype-based disease gene prioritization, ontology-aware protein function transfer, and biodiversity knowledge graph search.

semantic similarity · ontology similarity · Resnik · Lin · phenotype similarity · PhenomeNET similarity
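For orientation, the sketch below computes two classical taxonomy-based measures, Resnik and Lin, over a toy hierarchy. These are the kind of baseline that axiom-aware measures are usually compared against. The class names, annotation counts, and function names are hypothetical and are not taken from any of the group's tools; measures that operate on the full OWL axiomatic structure are not reproduced here.

```python
import math

# Hypothetical toy taxonomy: class -> list of direct superclasses.
PARENTS = {
    "phenotype": [],
    "abnormal heart": ["phenotype"],
    "abnormal valve": ["abnormal heart"],
    "arrhythmia": ["abnormal heart"],
    "abnormal liver": ["phenotype"],
}

# Hypothetical number of entities annotated directly with each class.
DIRECT = {
    "phenotype": 0,
    "abnormal heart": 2,
    "abnormal valve": 3,
    "arrhythmia": 3,
    "abnormal liver": 2,
}


def ancestors(cls):
    """Return the class together with all of its superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(c) = -log p(c), where p(c) counts annotations of c and its subclasses."""
    total = sum(DIRECT.values())
    counts = {cls: 0 for cls in PARENTS}
    for cls, n in DIRECT.items():
        for ancestor in ancestors(cls):
            counts[ancestor] += n
    return {cls: -math.log(counts[cls] / total) for cls in PARENTS}


IC = information_content()


def resnik(a, b):
    """Information content of the most informative common ancestor."""
    return max(IC[c] for c in ancestors(a) & ancestors(b))


def lin(a, b):
    """Resnik similarity normalised by the information content of both classes."""
    denominator = IC[a] + IC[b]
    return 2 * resnik(a, b) / denominator if denominator > 0 else 0.0
```

Here `resnik("abnormal valve", "arrhythmia")` is the information content of their shared parent "abnormal heart", while two classes that only share the root get a similarity of zero.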

Microbial communities

I develop methods that extend single-protein function prediction to the level of microbial communities, combining ontology-aware deep learning with multi-scale systems approaches. Applications include desert-soil and mangrove-microbiome design, bioprospecting from Saudi Arabian extremophile habitats, and metagenomics-driven functional characterisation of patient and environmental microbiomes.

metagenomics · microbial communities · DeepGOMeta · soil microbiome · marine microbiome · extremophile

Phenotype informatics

I develop the informatics infrastructure for phenotype data across species and clinical settings: phenotype ontologies (HPO, MP, ZP, FLOPO, plant traits), cross-species phenotype crosswalks, tools that capture and standardise phenotype descriptions, and computational pipelines that link phenotype data to underlying genes, variants, and diseases.

phenotype ontology · HPO · MP · FLOPO · phenotype standardisation · trait recognition

Bioengineering

I contribute computational and omics analysis to collaborative bioengineering projects. Examples include analysing transcriptomic and metabolomic responses of cells cultured in biomimetic peptide scaffolds, patient-derived disease-model analysis for precision medicine, and the integration of multi-omics data with engineered biological systems.

bioengineering · biomimetic scaffold · patient-derived disease model · multi-omics · tissue engineering · transcriptomics

Funded research projects led by or involving the Bio-Ontology Research Group.

Period | Project | Role
2025–ongoing | KAUST Smart Health Initiative 2024 round (SHI2024) | PI
2025–2026 | Personalized cancer treatment prediction (KCSH Pathway to Impact 2025) | PI
2024–ongoing | KAUST Center of Excellence for Generative AI (Health and Wellness, BCB theme) | PI
2024–2026 | A public Saudi pangenome as reference for genomics in the Middle East | PI
2023–2026 | Towards sound, complete, and explainable machine learning with biomedical ontologies (CRG11) | PI
2023–2026 | Disease Models from Patient-derived Leukemic Cells in Biomimetic Peptide Scaffolds for Precision Medicine Applications | PI (co-PI)
2023–2025 | Enabling desert revegetation by AI-tailored soil microbiome fortification | CoI
2023–2023 | Enabling mangrove restoration by AI-tailored microbiome fortification | CoI
2022–2024 | Metagenomics-based surface prospecting | PI
2022–2025 | Evolutionary potential of corals to adapt to climate warming | CoI
2022–2024 | Computational methods for functional metagenomics: from protein functions to multi-scale interactions | PI
2021–2023 | IBNSINA-QI: Integrating Biomedical Networks and Semantic Information for Neural network Analysis of Quantitative Information | PI
2021–2023 | Development of Algorithms for Biotechnology and Biomedical Applications | PI
2019–2021 | CompleX: Variant Prioritization in Complex Disease | PI
2019–2021 | Improving health of Saudi population | PI
2019–2019 | Whole genome sequencing of rare disease patients | PI
2018–2021 | Sequencing and computational analysis of MRSA samples | PI
2018–2019 | Improvement of genetic variant prioritization technology | PI
2018–2020 | Bio2Vec: Smart analytics infrastructure for the life sciences | PI
2018–2020 | The Whale Shark 100: Applying Population Genomics to Understand Mysteries of the World's Largest Fish | CoI
2016–2018 | Data integration and ontologies for microbial cell factories | WP-lead

Pre-KAUST projects

  • (2005–2009) Basic considerations for improving interoperability between ontology-based biological information systems
  • (2009–2010) Postdoctoral research on biomedical ontology reasoning and integration
  • (2010–2014) Phenotype ontologies and translational research (Cambridge / Aberystwyth)

Open-source tools, libraries, services, and ontologies maintained by the Bio-Ontology Research Group. Source code is available at github.com/bio-ontology-research-group.

Ontology Embedding & Machine Learning

Libraries and methods that turn ontologies into vector representations or otherwise combine logical structure with statistical learning.

mOWL

Python library for machine learning with biomedical ontologies. Unifies projection-, axiom- and geometric-embedding methods (EL Embeddings, ELBE, BoxSquaredEL, OWL2Vec*, DL2Vec, OPA2Vec) behind one API, with first-class OWLAPI access and PyTorch integration.

Walking RDF and OWL

Original feature-learning method over RDF graphs and OWL ontologies via biased random walks; the seed implementation for many later embedding methods including OWL2Vec*.

GitHub · 47
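A minimal sketch of the general idea, assuming a toy graph: random walks over labelled edges are turned into token sequences that a language model such as Word2Vec could then consume. The triples, the function names, and the per-predicate `bias` weighting are hypothetical illustrations and do not reproduce the tool's actual walk strategy.

```python
import random
from collections import defaultdict

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("geneA", "has_function", "kinase_activity"),
    ("kinase_activity", "subClassOf", "catalytic_activity"),
    ("geneA", "associated_with", "disease1"),
    ("disease1", "has_phenotype", "arrhythmia"),
    ("arrhythmia", "subClassOf", "abnormal_heart"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walk(adjacency, start, length, rng, bias=None):
    """Walk up to `length` edges from `start`, recording nodes and edge labels.

    `bias` optionally maps a predicate to a positive sampling weight
    (an assumed biasing scheme, used here only for illustration).
    """
    bias = bias or {}
    walk, node = [start], start
    for _ in range(length):
        edges = adjacency.get(node)
        if not edges:  # dead end: no outgoing edges
            break
        weights = [bias.get(predicate, 1.0) for predicate, _ in edges]
        predicate, node = rng.choices(edges, weights=weights, k=1)[0]
        walk.extend([predicate, node])
    return walk


def walk_corpus(triples, walks_per_node=2, length=3, seed=0, bias=None):
    """Generate a corpus of walks starting from every node in the graph."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    return [
        random_walk(adjacency, node, length, rng, bias)
        for node in nodes
        for _ in range(walks_per_node)
    ]
```

Each walk alternates node and edge-label tokens, so both entities and relations receive vectors when the corpus is passed to a skip-gram model.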
OPA2Vec

Combines ontology axioms with associated annotation properties (labels, synonyms, definitions) into a single corpus, then trains Word2Vec to produce semantically rich vectors for ontology classes.

GitHub · 37
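The corpus-building step described above can be sketched as follows. The axiom and annotation encodings, the identifiers, and `build_corpus` are hypothetical; the tool's real preprocessing is not reproduced here.

```python
# Hypothetical inputs: axioms as token tuples, annotations as
# (class, annotation property, text) triples.
AXIOMS = [
    ("GO_kinase", "SubClassOf", "GO_catalytic"),
    ("GO_kinase", "SubClassOf", "part_of", "some", "GO_signalling"),
]
ANNOTATIONS = [
    ("GO_kinase", "rdfs:label", "Kinase activity"),
    ("GO_kinase", "has_synonym", "Phosphokinase activity"),
]


def build_corpus(axioms, annotations):
    """Merge logical axioms and textual annotations into one tokenised corpus."""
    corpus = [list(axiom) for axiom in axioms]
    for cls, prop, text in annotations:
        # Keep the class identifier as a token so that the words of its
        # label or synonym share a context window with it.
        corpus.append([cls, prop] + text.lower().split())
    return corpus
```

The resulting list of token lists has the shape that Word2Vec implementations generally expect, for example the `sentences` argument of gensim's `Word2Vec`; using gensim here is an assumption, not something the description states.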
EL Embeddings

Reference implementation of geometric embeddings for the EL++ description logic and the predecessor of GeometrE and BoxSquaredEL. Preserves subsumption reasoning by mapping classes to convex regions.

GitHub · 28
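To illustrate how a convex region can encode subsumption, the sketch below represents each class as an n-ball and scores the axiom "C SubClassOf D" by how far the ball of C sticks out of the ball of D. The coordinates, class names, and the `margin` parameter are hypothetical, and only this one axiom form is covered; treat it as a sketch of the idea rather than the reference implementation.

```python
import math

# Hypothetical toy embedding: class name -> (centre, radius) of an n-ball.
EMBEDDING = {
    "Protein": ([0.0, 0.0], 2.0),
    "Enzyme": ([0.5, 0.0], 1.0),
    "Kinase": ([0.7, 0.1], 0.5),
}


def subsumption_loss(sub, sup, margin=0.0):
    """Penalty for the axiom `sub SubClassOf sup`.

    Returns 0.0 when the ball of `sub` lies entirely inside the ball of
    `sup` (up to `margin`), and a positive value otherwise.
    """
    (c_sub, r_sub), (c_sup, r_sup) = EMBEDDING[sub], EMBEDDING[sup]
    return max(0.0, math.dist(c_sub, c_sup) + r_sub - r_sup - margin)
```

With these toy values, `subsumption_loss("Kinase", "Enzyme")` is zero because the smaller ball is contained in the larger one, whereas the reversed axiom `subsumption_loss("Protein", "Enzyme")` is penalised. Training would minimise such penalties over all axioms of the ontology.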
Onto2Vec

Representation learning for ontologies and their annotations by treating logical axioms as natural-language sentences; predecessor of OPA2Vec.

GitHub · 21

DL2Vec

Encodes description-logic axioms as a directed graph and learns embeddings via random walks; widely used for downstream gene-disease and protein-function prediction.

GitHub · 20
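As a simplified illustration of turning axioms into a directed, labelled graph, the sketch below handles three axiom shapes. The tuple encoding and `axiom_to_edges` are hypothetical, and the rule set is a deliberately small assumption; the full set of conversion rules used by the tool is not reproduced here.

```python
# Hypothetical axiom encoding:
#   ("SubClassOf", "A", "B")                     A is a subclass of B
#   ("SubClassOf", "A", ("some", "R", "B"))      A is a subclass of (R some B)
#   ("EquivalentClasses", "A", "B")              A is equivalent to B


def axiom_to_edges(axiom):
    """Project one axiom onto zero or more (source, label, target) edges."""
    kind, left, right = axiom
    if kind == "SubClassOf" and isinstance(right, str):
        return [(left, "subClassOf", right)]
    if kind == "SubClassOf" and isinstance(right, tuple) and len(right) == 3 and right[0] == "some":
        _, relation, filler = right
        return [(left, relation, filler)]
    if kind == "EquivalentClasses" and isinstance(right, str):
        return [(left, "equivalentTo", right), (right, "equivalentTo", left)]
    return []  # unsupported axiom shapes are skipped in this sketch


def ontology_to_graph(axioms):
    """Collect the edges of all axioms into one edge list."""
    return [edge for axiom in axioms for edge in axiom_to_edges(axiom)]
```

The edge list can then be fed to any random-walk or graph-embedding procedure.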
Interpretable Learning

Generates interpretable symbolic rules from learned representations over biomedical knowledge bases.

GitHub · 5

catE

Category-theoretic, lattice-preserving embedding of ALC description-logic ontologies that retains the consequence-closure semantics of the original theory.

GitHub · 3

DELE

Deductive EL embeddings: enrich training data with the deductive closure of an ontology before learning, so embeddings recover entailment rather than only asserted axioms.

GitHub · 2
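A minimal sketch of the enrichment step, assuming the simplest possible case: only atomic subclass axioms, whose deductive closure is their transitive closure. A real EL deductive closure would be computed by a reasoner and covers many more axiom forms; the class names and `subclass_closure` are hypothetical.

```python
def subclass_closure(asserted):
    """Return asserted subclass pairs plus all pairs entailed by transitivity.

    `asserted` is a set of (subclass, superclass) pairs. Trivial reflexive
    pairs (A, A) are left out because they carry no training signal.
    """
    closure = set(asserted)
    changed = True
    while changed:  # iterate until no new pair can be derived
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical asserted hierarchy.
ASSERTED = {
    ("Kinase", "Enzyme"),
    ("Enzyme", "Protein"),
    ("Protein", "Molecule"),
}
```

Calling `subclass_closure(ASSERTED)` adds the three entailed pairs, such as ("Kinase", "Molecule"), so a model trained on the result sees entailed axioms as positives and not only asserted ones.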
EL2Box

Box-shaped geometric embeddings for EL++ that strengthen the topological guarantees of EL Embeddings.

GitHub · 2
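One reason boxes are attractive is that the intersection of two axis-aligned boxes is again a box (or empty), which is not true of balls. The sketch below uses this to check an axiom of the form "C and D SubClassOf E". The coordinates and function names are hypothetical, and the tool's actual losses and guarantees are not reproduced here.

```python
# A box is a pair (lower corner, upper corner) of equal-length coordinate lists.


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    (lo_out, up_out), (lo_in, up_in) = outer, inner
    return all(
        a <= b and c <= d
        for a, b, c, d in zip(lo_out, lo_in, up_in, up_out)
    )


def intersect(box_a, box_b):
    """Intersection of two boxes, or None if they do not overlap."""
    lower = [max(a, b) for a, b in zip(box_a[0], box_b[0])]
    upper = [min(a, b) for a, b in zip(box_a[1], box_b[1])]
    return (lower, upper) if all(l <= u for l, u in zip(lower, upper)) else None


# Hypothetical toy embedding.
ENZYME = ([0.0, 0.0], [4.0, 4.0])
MEMBRANE_PROTEIN = ([2.0, 1.0], [6.0, 3.0])
MEMBRANE_ENZYME = ([1.0, 0.0], [5.0, 4.0])

# "Enzyme and MembraneProtein SubClassOf MembraneEnzyme" holds geometrically
# when the intersection box lies inside the box of MembraneEnzyme.
overlap = intersect(ENZYME, MEMBRANE_PROTEIN)
axiom_holds = overlap is not None and contains(MEMBRANE_ENZYME, overlap)
```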

Protein Function Prediction

Deep-learning and neuro-symbolic models for predicting Gene Ontology functional annotations of proteins.

DeepGOPlus

CNN-ensemble protein-function predictor that augments sequence-based scoring with k-nearest-neighbour homology and GO axioms; one of the strongest CAFA-evaluated open models.
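The way the three sources of evidence fit together can be sketched as follows: the neural score and the homology score are blended per term, and the hierarchy is then used to make every superclass score at least as high as its subclasses. The weight `alpha`, the linear blend, the term labels, and the function names are assumptions made for illustration and may differ from the released model.

```python
# Hypothetical toy hierarchy (term -> direct superclasses), using labels
# instead of real GO identifiers.
PARENTS = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def combine(cnn_scores, knn_scores, alpha=0.5):
    """Blend sequence-model scores with nearest-neighbour homology scores."""
    terms = set(cnn_scores) | set(knn_scores)
    return {
        term: alpha * knn_scores.get(term, 0.0)
        + (1 - alpha) * cnn_scores.get(term, 0.0)
        for term in terms
    }


def propagate(scores, parents):
    """Give every ancestor at least the score of its best-scoring descendant."""
    result = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, []))
        while stack:
            ancestor = stack.pop()
            if result.get(ancestor, 0.0) < score:
                result[ancestor] = score
            stack.extend(parents.get(ancestor, []))
    return result


# Hypothetical scores for one protein.
cnn = {"kinase activity": 0.8, "catalytic activity": 0.3}
knn = {"kinase activity": 0.6}
final = propagate(combine(cnn, knn), PARENTS)
```

After propagation, a confident prediction of "kinase activity" also implies its superclasses, which keeps the output consistent with the subclass axioms.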

DeepGO

Original sequence-based, ontology-aware deep classifier for predicting Gene Ontology functional annotations; basis of the entire DeepGO family of tools.

GitHub · 87

DeepGO2

Next-generation DeepGO model with transformer protein embeddings and improved hierarchical multi-label prediction.

GitHub · 57

DeepGOZero

Zero-shot extension of DeepGO using model-theoretic EL Embeddings to predict GO classes that were never observed during training.

GitHub · 34

DeepGOMeta

DeepGO trained specifically for metagenomic communities; predicts functional roles of proteins recovered from environmental samples and links them to biogeochemical processes.

GitHub · 8

GO-Agent

LLM-agent framework that decomposes protein-function prediction into tool-calling sub-tasks (sequence search, structure lookup, domain reasoning) and stitches the evidence into a final GO annotation.

GitHub · 7

DeepPheno

Predicts loss-of-function organism-level phenotypes (HPO/MPO) directly from a gene's annotated functions, using a hierarchical neural classifier over phenotype ontologies.

GitHub · 6

PU-GO

Positive-unlabeled ranking of protein functions with ontology-based priors; directly addresses the partial-annotation problem in CAFA benchmarks.

GitHub · 4

PhenoGoCon

Predicts gene–phenotype associations from predicted Gene Ontology functions; bridges GO function prediction and HPO/MPO phenotype prediction.

GitHub · 3

Genomic Context

Bacterial protein-function prediction that exploits operon and genome-neighbourhood structure in addition to sequence and homology.

GitHub · 3

Variant and Disease Prioritization

Tools that combine phenotype, ontology, and sequence data to rank candidate disease-causing variants and predict gene-disease associations.

PhenomeNET-VP

Phenotype-driven variant prioritization for whole-exome and whole-genome sequencing data; widely used implementation of the phenotype-aware variant ranking approach.

GitHub · 43

DeepSVP

Prioritizes structural and copy-number variants by combining patient phenotype with gene-function similarity learned from biomedical ontologies.

GitHub · 18

DeepViral

Predicts virus–host protein-protein interactions from sequence and infectious-disease phenotypes; trained jointly across coronaviruses, influenza, and other RNA viruses.

GitHub · 12

EmbedPVP

Embedding-based phenotype-aware variant predictor that ranks candidate causative variants using joint sequence- and phenotype-derived representations.

GitHub · 8

STARVar

Symptom-based tool for automatic ranking of variants using evidence from the biomedical literature and population genomes; combines text mining with phenotype matching.

GitHub · 7

predCAN

Ontology-based prediction of cancer driver genes by integrating phenotype, pathway and function knowledge with somatic-variant features.

GitHub · 5

INDIGENA

Inductive prediction of disease–gene associations from phenotype ontologies; generalises to unseen diseases via ontology-aware embeddings.

GitHub · 1

GenomeLinter

AI-powered clinical decision-support tool that ingests annotated VCFs and synthesises diagnostic interpretations for rare-disease patients without requiring deep bioinformatics expertise.

Ontology Reasoning & Tooling

Reasoning-as-a-service infrastructure, ontology repositories, and utilities for working with OWL.

vec2SPARQL

Adds embedding-similarity functions to a SPARQL endpoint so that vector-space queries (k-nearest neighbours, cosine similarity) can be mixed with classical graph patterns.

GitHub · 14
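The combination of a graph pattern with a vector-space ranking can be sketched in plain Python as below. This is not the tool's query syntax: the triples, embeddings, and function names are hypothetical, and a simple type filter stands in for a SPARQL graph pattern.

```python
import math

# Hypothetical toy data: type assertions and two-dimensional embeddings.
TRIPLES = [
    ("geneA", "rdf:type", "Gene"),
    ("geneB", "rdf:type", "Gene"),
    ("geneC", "rdf:type", "Gene"),
    ("drugX", "rdf:type", "Drug"),
]
EMBEDDINGS = {
    "disease1": [1.0, 0.0],
    "geneA": [0.9, 0.1],
    "geneB": [0.0, 1.0],
    "geneC": [0.5, 0.5],
    "drugX": [1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def subjects(triples, predicate, obj):
    """Graph-pattern step: all ?x such that (?x, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def most_similar(query, candidates, k):
    """Vector step: the k candidates closest to `query` by cosine similarity."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine(EMBEDDINGS[query], EMBEDDINGS[c]),
        reverse=True,
    )
    return ranked[:k]


genes = subjects(TRIPLES, "rdf:type", "Gene")
top_genes = most_similar("disease1", genes, k=2)
```

Although `drugX` has the embedding closest to `disease1`, it is excluded by the graph pattern, so only genes are ranked.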
Onto2Graph

Generates entailment-aware graph projections of OWL ontologies suitable for downstream graph machine learning while preserving the axioms' deductive structure.

GitHub · 12

AberOWL

Ontology repository delivering OWL EL reasoning as a service: stores hundreds of bio-ontologies, exposes SPARQL with class-expression query expansion, and powers semantic search over PubMed/PMC.

UNMIREOT

Identifies, diagnoses, and semi-automatically repairs hidden contradictions and unsatisfiable classes introduced by partial imports (MIREOT) into biomedical ontologies.

GitHub · 2

OntoFunc

Ontology-driven enrichment analysis that supports arbitrary OWL ontologies and full subsumption-aware aggregation, not only GO.
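Subsumption-aware aggregation can be illustrated with a small over-representation test: annotations are first propagated to all superclasses, then each class is tested with a hypergeometric tail probability. The hierarchy, the gene identifiers, and the function names are hypothetical, no multiple-testing correction is applied, and the tool's own statistics may differ.

```python
from math import comb

# Hypothetical class hierarchy: class -> direct superclasses.
PARENTS = {
    "abnormal valve": ["abnormal heart"],
    "arrhythmia": ["abnormal heart"],
    "abnormal heart": ["phenotype"],
    "abnormal liver": ["phenotype"],
    "phenotype": [],
}

# Hypothetical background of ten genes with direct annotations.
ANNOTATIONS = {f"g{i}": {"abnormal valve"} for i in range(3)}
ANNOTATIONS["g3"] = {"arrhythmia"}
ANNOTATIONS.update({f"g{i}": {"abnormal liver"} for i in range(4, 10)})


def closure(classes, parents):
    """Classes together with all of their superclasses."""
    result, stack = set(), list(classes)
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(parents.get(cls, []))
    return result


def tail_probability(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def enrichment(study, annotations, parents):
    """Per-class p-values after propagating annotations up the hierarchy."""
    closed = {gene: closure(cs, parents) for gene, cs in annotations.items()}
    p_values = {}
    for cls in set().union(*closed.values()):
        annotated = sum(cls in cs for cs in closed.values())
        hits = sum(cls in closed[gene] for gene in study)
        p_values[cls] = tail_probability(hits, len(closed), annotated, len(study))
    return p_values


p = enrichment({"g0", "g1", "g3"}, ANNOTATIONS, PARENTS)
```

In this toy case the parent class "abnormal heart" is more strongly enriched than either of its subclasses, because evidence from "abnormal valve" and "arrhythmia" is pooled at the parent.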

Knowledge Graphs & Drug Discovery

Biomedical knowledge graph construction, drug repurposing, drug-drug interactions, and graph-based prediction.

Multi-Drug Embedding

Drug repurposing method that learns joint embeddings of drugs, targets and diseases from biomedical knowledge graphs and the scientific literature.

GitHub · 36

NanoDesigner

Iterative refinement framework for nanobody/CDR design that explicitly models the antigen–CDR interdependence; companion code to the NanoDesigner paper.

GitHub · 16

DeepMOCCA

Graph neural network for cancer survival analysis that integrates multi-omics (mutation, expression, methylation, CNV) with a curated cancer knowledge graph.

GitHub · 14

SmuDGE

Semantic disease-gene embeddings; integrates phenotype, function and pathway ontologies into a unified vector space for downstream prediction.

GitHub · 12

PathoPhenoDB

Curated database of pathogens and the disease phenotypes they cause, distributed as an OWL ontology and an interactive web application.

GitHub · 8

Teaching & Tutorials

Self-contained teaching material accompanying our courses and review articles.

Machine Learning with Ontologies

Companion code and worked examples for the Briefings in Bioinformatics tutorial review; the most-starred repository in the group.

GitHub · 132

Ontology Tutorial

Hands-on tutorial that walks new users through OWL, automated reasoning, and ontology-aware data analysis; basis for the AI in Biomedicine summer school.

GitHub · 73

mOWL Tutorial

Step-by-step worked notebooks that demonstrate every embedding family in mOWL on protein-function, gene-disease, and ontology-completion tasks.

GitHub · 8

Ontologies & Resources

Bio-ontologies and curated resources maintained by the group.

Units of Measurement Ontology (UO)

OBO Foundry ontology of units of measurement; aligned with QUDT and used across biomedical data standards.

GitHub · 23

PhenomeNet

Cross-species phenotype ontology and similarity network combining HPO, MPO, ZP and others; the substrate behind PhenomeNET-VP and DeepPheno.

GitHub · 1