Biomedical informatics
Our biomedical informatics work converts heterogeneous research-grade data into usable inputs for clinicians and computational biologists. Within KAUST's Computer Science Program, we build biomedical knowledge bases, mine text for structured biological assertions, standardize clinical phenotype encodings, and develop analytics over electronic health records and rare-disease cohorts. The distinctive feature of our approach is that almost every component is grounded in formal ontologies, so that text-mined facts, curated databases and clinical observations share a common semantic substrate and can be compared, reasoned over and embedded jointly.
Knowledge bases for infectious and rare disease
Several of our long-running contributions are integrated knowledge bases. PathoPhenoDB, described in "PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research", is a curated and text-mined resource that connects pathogens to the disease phenotypes they cause. It is distributed both as an OWL ontology and as an interactive web application, and is accompanied by the methodological paper "Ontology based mining of pathogen-disease associations from literature". The infrastructure work "Aber-OWL: a framework for ontology-based data access in biology" provides the reasoning backend that supports queries over these resources, while "The role of ontologies in biological and biomedical research: a functional perspective" and "Datamining with Ontologies" articulate the broader rationale and methodology for ontology-grounded data integration.
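The idea of ontology-based data access described above can be illustrated with a minimal sketch: a query for a class should also return entities annotated with any of its subclasses. Everything in the block is a toy assumption made for illustration (the class labels, the `pathogen_A`-style identifiers, and the `descendants` and `query` helpers); it does not use or reproduce the Aber-OWL or PathoPhenoDB interfaces, and a real system would delegate the subclass inference to an OWL reasoner.

```python
from collections import defaultdict

# Toy is-a hierarchy (child -> parents). Labels are invented for illustration
# and are not identifiers from any real ontology.
IS_A = {
    "cough": ["respiratory phenotype"],
    "pneumonia": ["respiratory phenotype"],
    "respiratory phenotype": ["phenotype"],
    "rash": ["skin phenotype"],
    "skin phenotype": ["phenotype"],
}

# Hypothetical pathogen -> phenotype annotations.
ANNOTATIONS = {
    "pathogen_A": {"cough"},
    "pathogen_B": {"pneumonia", "rash"},
    "pathogen_C": {"rash"},
}


def descendants(cls):
    """Return cls together with every class that is transitively a subclass of it."""
    children = defaultdict(set)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].add(child)
    seen, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def query(cls):
    """Return entities annotated with cls or with any subclass of cls, sorted by name."""
    wanted = descendants(cls)
    return sorted(e for e, terms in ANNOTATIONS.items() if terms & wanted)


if __name__ == "__main__":
    print(query("respiratory phenotype"))
    print(query("rash"))
```

In this toy setting, a query for "respiratory phenotype" retrieves pathogens annotated only with the more specific "cough" or "pneumonia", which is the behavior that plain keyword matching on annotations would miss.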
Text mining and clinical NLP
To populate and extend these knowledge bases, we develop biomedical text-mining methods that operate at the ontology level. "Ontology based text mining of gene-phenotype associations: application to candidate gene prediction" demonstrated that literature-mined gene-phenotype associations complement curated databases for gene prioritization. "Ontology-Based Concept Recognition by Using Word Embeddings" and "Combining lexical and context features for automatic ontology extension" show how distributional semantics can be used to recognize concepts and extend ontologies semi-automatically. On the clinical side, "Improved characterisation of clinical text through ontology-based vocabulary expansion" and "Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity" address two specific obstacles in turning narrative clinical text into ontology-coded phenotype profiles suitable for similarity-based diagnosis: incomplete vocabulary coverage, and mentions that are negated or uncertain. "Multi-faceted semantic clustering with text-derived phenotypes" extends this work to patient stratification.
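To make the negation problem concrete, the following toy sketch compares two text-derived patient profiles with and without filtering out negated mentions. It is an illustration under stated assumptions, not the method of the cited papers: the hierarchy, the patient profiles, the status labels ("affirmed", "negated") and the choice of Jaccard similarity over ancestor-closed term sets are all hypothetical.

```python
# Toy single-parent is-a hierarchy (child -> parent); labels are invented.
IS_A = {
    "cough": "respiratory phenotype",
    "dyspnea": "respiratory phenotype",
    "respiratory phenotype": "phenotype",
    "rash": "skin phenotype",
    "skin phenotype": "phenotype",
}


def closure(terms):
    """Add every ancestor of each term, so related terms share common classes."""
    out = set()
    for term in terms:
        while term is not None:
            out.add(term)
            term = IS_A.get(term)
    return out


def jaccard(a, b):
    """Jaccard similarity of two term sets after ancestor closure."""
    a, b = closure(a), closure(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_terms(mentions):
    """Naive profile: every mentioned term, regardless of its status."""
    return {term for term, _ in mentions}


def affirmed(mentions):
    """Stratified profile: keep only mentions marked as affirmed."""
    return {term for term, status in mentions if status == "affirmed"}


# Hypothetical mentions extracted from two clinical notes.
patient_1 = [("cough", "affirmed"), ("rash", "negated")]
patient_2 = [("rash", "affirmed"), ("cough", "negated")]

if __name__ == "__main__":
    print("ignoring negation:", jaccard(all_terms(patient_1), all_terms(patient_2)))
    print("affirmed only:    ", jaccard(affirmed(patient_1), affirmed(patient_2)))
```

When negation is ignored, the two patients look identical because both notes mention the same two terms. Once negated mentions are dropped, the only shared class is the root, and the similarity falls to 1 shared class out of 5.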
Clinical decision support and EHR analytics
These components are combined into decision-support tools that operate on patient data. "Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes" ranks variants by combining text-mined evidence with patient symptoms. "The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients" revisits gene prioritization with modern LLMs, comparing them against established semantic-similarity baselines on real rare-disease cohorts. "Predicting candidate genes from phenotypes, functions and anatomical site of expression" adds anatomical context to the prediction problem, and "Ontology-based prediction of cancer driver genes" shows that the same ontology-aware embedding strategy generalizes to oncology. "Causal relationships between diseases mined from the literature improve the use of polygenic risk scores" demonstrates how literature-derived causal graphs sharpen polygenic risk modeling for downstream EHR studies. Foundational analyses such as "Evaluation of research in biomedical ontologies", "Ranking Adverse Drug Reactions With Crowdsourcing", "Usage of cell nomenclature in biomedical literature", and the BioHackathon reports anchor this work in community standards and shared benchmarks.
These tools and resources are used in active programs on rare-disease diagnostic support, infectious-disease surveillance, drug repurposing and cancer prognostics, and they support clinical collaborators through software such as PathoPhenoDB, SmuDGE, Multi-Drug Embedding and DeepMOCCA. We are now extending this stack toward operational decision support for genetic-medicine clinics, with explicit attention to populations and disease patterns that are common in the Middle East but under-represented in international resources.
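As a rough illustration of what a semantic-similarity baseline for phenotype-based gene prioritization can look like, the sketch below ranks genes by how well their phenotype annotations match a patient's phenotypes. The specific choices are assumptions made for this example and are not confirmed as the baselines used in the cited work: Resnik similarity with information content estimated from the annotation corpus, a one-directional best-match average, and an invented hierarchy with hypothetical genes `gene_X`, `gene_Y` and `gene_Z`.

```python
import math

# Toy single-parent is-a hierarchy (child -> parent); labels are invented.
IS_A = {
    "cough": "respiratory phenotype",
    "dyspnea": "respiratory phenotype",
    "respiratory phenotype": "phenotype",
    "rash": "skin phenotype",
    "skin phenotype": "phenotype",
}

# Hypothetical gene -> phenotype annotations.
GENE_PHENOTYPES = {
    "gene_X": {"cough", "dyspnea"},
    "gene_Y": {"rash"},
    "gene_Z": {"dyspnea", "rash"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    out = set()
    while term is not None:
        out.add(term)
        term = IS_A.get(term)
    return out


def information_content(annotations):
    """IC(t) = log(N / n_t), where n_t counts genes annotated with t or a subclass of t."""
    counts = {}
    for terms in annotations.values():
        closed = set().union(*(ancestors(t) for t in terms))
        for t in closed:
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: math.log(n / c) for t, c in counts.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor.

    Terms never seen in the annotations get IC 0.0 here, a simplification of this sketch.
    """
    common = ancestors(t1) & ancestors(t2)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def best_match_average(patient, gene_terms, ic):
    """For each patient term take its best match among the gene terms, then average."""
    if not patient or not gene_terms:
        return 0.0
    return sum(max(resnik(p, g, ic) for g in gene_terms) for p in patient) / len(patient)


def rank_genes(patient, annotations):
    """Return (gene, score) pairs sorted by descending score, ties broken by name."""
    ic = information_content(annotations)
    scored = [(g, best_match_average(patient, terms, ic)) for g, terms in annotations.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for gene, score in rank_genes({"cough"}, GENE_PHENOTYPES):
        print(f"{gene}\t{score:.3f}")
```

For a patient with the single phenotype "cough", the gene annotated with "cough" itself ranks first, a gene annotated with the sibling term "dyspnea" ranks second through the shared "respiratory phenotype" class, and a gene with only skin phenotypes ranks last. Rarer classes carry more information content, so exact matches on specific terms outweigh matches on broad classes.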
Software
- SmuDGE — Semantic disease-gene embeddings; integrates phenotype, function and pathway ontologies into a unified vector space for downstream prediction.
- Multi-Drug Embedding — Drug repurposing method that learns joint embeddings of drugs, targets and diseases from biomedical knowledge graphs and the scientific literature.
- DeepMOCCA — Graph neural network for cancer survival analysis that integrates multi-omics (mutation, expression, methylation, CNV) with a curated cancer knowledge graph.
- PathoPhenoDB — Curated database of pathogens and the disease phenotypes they cause, distributed as an OWL ontology and an interactive web application.
- NanoDesigner — Iterative refinement framework for nanobody/CDR design that explicitly models the antigen–CDR interdependence; companion code to the NanoDesigner paper.
Publications (47)
- (2026) Guzman-Vega, Cardona-Londono, Gonzalez-Alvarez, Pena-Guerra et al. VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale. Journal of Biological Chemistry.
- (2026) Zhapa-Camacho, Hoehndorf. INDIGENA: inductive prediction of disease–gene associations using phenotype ontologies. Bioinformatics.
- (2025) Kafkas, Abdelhakim, Althagafi, Toonsi et al. The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients. Scientific Reports.
- (2025) Alarawi, Altammami, Abutarboush, Kulmanov et al. Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing. Microbial Genomics.
- (2025) Gomez, Al Mahri, Abdullah, Malik et al. Age-related differences in gene expression and pathway activation following heatstroke. Physiological Genomics.
- (2025) Alhattab, Barakeh, Khoja, Elhadi et al. Sa1216: Development of colorectal cancer and matched healthy organoids from Saudi patients: a case study. Gastroenterology.
- (2025) Schofield, Hoehndorf, Gkoutos, Smith. The informatics of developmental phenotypes. Kaufman’s Atlas of Mouse Development Supplement.
- (2024) Toonsi, Gauran, Ombao, Schofield et al. Causal relationships between diseases mined from the literature improve the use of polygenic risk scores. Bioinformatics.
- (2023) Senay Kafkas, Marwa Abdelhakim, Mahmut Uludag, Azza Althagafi et al. Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes. BMC Bioinformatics.
- (2023) Mazen Hassanain, Yang Liu, Weam Hussain, Albandri Binowayn et al. Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study. Frontiers in Gastroenterology.
- (2021) Luke T. Slater, William Bradlow, Simon Ball, Robert Hoehndorf et al. Improved characterisation of clinical text through ontology-based vocabulary expansion. Journal of Biomedical Semantics.
- (2021) Liu-Wei, Kafkas, Chen, Dimonaco et al. DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes. Bioinformatics.
- (2021) Jun Chen, Azza Althagafi, Robert Hoehndorf. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics.
- (2021) Luke T. Slater, John A. Williams, Andreas Karwath, Hilary Fanning et al. Multi-faceted semantic clustering with text-derived phenotypes. Computers in Biology and Medicine.
- (2021) Luke T. Slater, Andreas Karwath, Robert Hoehndorf, Georgios V. Gkoutos. Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity. Frontiers in Digital Health.
- (2020) Sara Althubaiti, Senay Kafkas, Marwa Abdelhakim, Robert Hoehndorf. Combining lexical and context features for automatic ontology extension. Journal of Biomedical Semantics.
- (2020) Rutger A. Vos, Toshiaki Katayama, Hiroyuki Mishima, Shin Kawano et al. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Research.
- (2019) Althubaiti, Karwath, Dallol, Noor et al. Ontology-based prediction of cancer driver genes. Scientific Reports.
- (2019) Senay Kafkas, Robert Hoehndorf. Ontology based mining of pathogen-disease associations from literature. Journal of Biomedical Semantics.
- (2019) Kafkas, Abdelhakim, Hashish, Kulmanov et al. PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research. Scientific Data.
- (2019) Kafkas, Hoehndorf. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database.
- (2019) Katayama, Kawashima, Micklem, Kawano et al. BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services. F1000Research.
- (2019) Timothy K. Cooper, Kathleen A. Silva, Victoria E. Kennedy, Sarah M. Alghamdi et al. Hyaline Arteriolosclerosis in 30 Strains of Aged Inbred Mice. Veterinary Pathology.
- (2018) Sohaib Younis, Claus Weiland, Robert Hoehndorf, Stefan Dressler et al. Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters.
- (2018) Senay Kafkas, Robert Hoehndorf. Ontology based mining of pathogen-disease associations from literature. Bio-Ontologies COSI.
- (2018) Sara Althubaiti, Senay Kafkas, Robert Hoehndorf. Ontology-Based Concept Recognition by Using Word Embeddings. Bio-Ontologies COSI.
- (2017) Kafkas, Sarntivijai, Hoehndorf. Usage of cell nomenclature in biomedical literature. BMC Bioinformatics.
- (2016) Boudellioua, Saidi, Hoehndorf, Martin et al. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining. PLoS ONE.
- (2016) Hoehndorf, Gkoutos, Schofield. Datamining with Ontologies. Data Mining Techniques for the Life Sciences.
- (2015) Robert Hoehndorf, Luke Slater, Paul N Schofield, Georgios V Gkoutos. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics.
- (2015) Martin Hrabě de Angelis, George Nicholson, Mohammed Selloum, Jacqueline K White et al. Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics. Nature Genetics.
- (2015) Gottlieb, Hoehndorf, Dumontier, Altman. Ranking Adverse Drug Reactions With Crowdsourcing. J Med Internet Res.
- (2015) Hoehndorf, Schofield, Gkoutos. The role of ontologies in biological and biomedical research: a functional perspective. Briefings in Bioinformatics.
- (2014) Hoehndorf, Hancock, Hardy, Mallon et al. Analyzing gene expression data in mice with the Neuro Behavior Ontology. Mamm Genome.
- (2014) Rutger Vos, Jordan Biserkov, Bachir Balech, Niall Beard et al. Enriched biodiversity data as a resource and service. Biodiversity Data Journal.
- (2013) Hoehndorf, Hardy, Osumi-Sutherland, Tweedie et al. Systematic Analysis of Experimental Phenotype Data Reveals Gene Functions. PLoS ONE.
- (2013) Rebholz-Schuhmann, Kafkas, Kim, Li et al. Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources. Journal of Biomedical Semantics.
- (2013) Dietrich Rebholz-Schuhmann, Jee-Hyub Kim, Ying Yan, Abhishek Dixit et al. Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI). PLoS ONE.
- (2012) Hoehndorf, Dumontier, Gkoutos. Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics.
- (2012) Hoehndorf, Dumontier, Gkoutos. Evaluation of research in biomedical ontologies. Briefings in Bioinformatics.
- (2012) Dietrich Rebholz-Schuhmann, Anika Oellrich, Robert Hoehndorf. Text-mining solutions for biomedical research: enabling integrative biology. Nature Reviews Genetics.
- (2012) Robert Hoehndorf, Georgios V. Gkoutos. A translational medicine approach to orphan diseases. Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012).
- Robert Hoehndorf, Colin Batchelor, Thomas Bittner, Michel Dumontier et al. The RNA Ontology (RNAO): An Ontology for Integrating RNA Sequence and Structure Data. Applied Ontology.
- de Bono, Hoehndorf, Wimalaratne, Gkoutos et al. The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions. BMC Research Notes.
- Hoehndorf, Ngonga Ngomo, Pyysalo, Ohta et al. Ontology design patterns to disambiguate relations between genes and gene products in GENIA. Journal of Biomedical Semantics.
- Herre, Hoehndorf, Kelso, Loebe et al. OBML - Ontologies in Biomedicine and Life Sciences. Journal of Biomedical Semantics.
- Wimalaratne, Grenon, Hoehndorf, Gkoutos et al. An infrastructure for ontology-based information systems in biomedicine: RICORDO case study. Bioinformatics.