Biomedical informatics

Our biomedical informatics work converts heterogeneous research-grade data into usable inputs for clinicians and computational biologists. Within KAUST's Computer Science Program, we build biomedical knowledge bases, mine text for structured biological assertions, standardize clinical phenotype encodings, and develop analytics over electronic health records and rare-disease cohorts. The distinctive feature of our approach is that almost every component is grounded in formal ontologies, so that text-mined facts, curated databases and clinical observations share a common semantic substrate and can be compared, reasoned over and embedded jointly.

Knowledge bases for infectious and rare disease

Several of our long-running contributions are integrated knowledge bases. PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research is a curated and text-mined resource that connects pathogens to the disease phenotypes they cause. It is distributed both as an OWL ontology and as an interactive web application, and is accompanied by the methodological paper Ontology based mining of pathogen-disease associations from literature. The infrastructure work Aber-OWL: a framework for ontology-based data access in biology provides the reasoning backend that supports queries over these resources, while The role of ontologies in biological and biomedical research: a functional perspective and Datamining with Ontologies articulate the broader rationale and methodology for ontology-grounded data integration.
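Because annotations are attached to ontology classes, a query for a general phenotype can retrieve pathogens annotated only with more specific ones. Aber-OWL answers such queries with an OWL reasoner; the minimal Python sketch below stands in for that with a plain transitive closure over a hypothetical is-a hierarchy. The term names and pathogen annotations are illustrative toy data, not PathoPhenoDB content.

```python
# Hypothetical toy is-a hierarchy: child -> set of direct parents.
# Term names are illustrative, not real ontology identifiers.
IS_A = {
    "bacterial pneumonia": {"pneumonia"},
    "viral pneumonia": {"pneumonia"},
    "pneumonia": {"lung disease"},
    "lung disease": {"disease"},
    "meningitis": {"disease"},
}

# Hypothetical pathogen -> phenotype annotations, each at its most specific term.
ANNOTATIONS = {
    "Streptococcus pneumoniae": {"bacterial pneumonia", "meningitis"},
    "Influenza A virus": {"viral pneumonia"},
    "Neisseria meningitidis": {"meningitis"},
}


def ancestors(term: str) -> set[str]:
    """Return the term together with all of its transitive superclasses."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def pathogens_for(query: str) -> list[str]:
    """Pathogens annotated with the query term or any of its subclasses."""
    return sorted(
        pathogen
        for pathogen, terms in ANNOTATIONS.items()
        if any(query in ancestors(t) for t in terms)
    )


print(pathogens_for("pneumonia"))
```

A real reasoner also handles equivalence axioms, existential restrictions and other OWL constructs that this simple edge walk ignores.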

Text mining and clinical NLP

To populate and extend these knowledge bases, we develop biomedical text-mining methods that operate at the ontology level. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction demonstrated that literature-mined gene-phenotype associations meaningfully complement curated databases for gene prioritization. Ontology-Based Concept Recognition by Using Word Embeddings and Combining lexical and context features for automatic ontology extension show how distributional semantics can be used to recognize concepts and extend ontologies semi-automatically. On the clinical side, Improved characterisation of clinical text through ontology-based vocabulary expansion and Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity address two specific obstacles, limited vocabulary coverage and the handling of negated or uncertain mentions, in turning narrative clinical text into ontology-coded phenotype profiles suitable for similarity-based diagnosis. Multi-faceted semantic clustering with text-derived phenotypes extends this work to patient stratification.
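To illustrate why negation and uncertainty matter, here is a minimal Python sketch that recognizes concepts by exact dictionary lookup and then sorts each mention into affirmed, negated or uncertain sets, using a few trigger words that precede the mention in the same clause, in the spirit of NegEx. The lexicon, placeholder IDs and cue lists are hypothetical, and the published work may handle these cases differently; the point is only that a mention such as "no seizures" should not enter a patient profile as a positive finding.

```python
import re

# Hypothetical lexicon: surface string -> placeholder ontology ID.
LEXICON = {
    "seizures": "PHENO:0001",
    "microcephaly": "PHENO:0002",
    "developmental delay": "PHENO:0003",
}
# Hypothetical trigger lists, loosely modelled on NegEx-style cue words.
NEGATION_CUES = ("no", "denies", "without", "negative for")
UNCERTAINTY_CUES = ("possible", "suspected", "probable")


def has_cue(prefix: str, cues: tuple[str, ...]) -> bool:
    """True if any cue occurs as a whole word or phrase in the text before a mention."""
    return any(re.search(rf"\b{re.escape(cue)}\b", prefix) for cue in cues)


def stratified_profile(text: str) -> dict[str, set[str]]:
    """Group recognized concepts into affirmed, negated and uncertain sets."""
    profile = {"affirmed": set(), "negated": set(), "uncertain": set()}
    # Clause boundaries (punctuation or "but") limit how far a cue can reach.
    for clause in re.split(r"[.;,]|\bbut\b", text.lower()):
        for surface, term_id in LEXICON.items():
            pos = clause.find(surface)
            if pos < 0:
                continue
            prefix = clause[:pos]  # only cues preceding the mention count
            if has_cue(prefix, NEGATION_CUES):
                profile["negated"].add(term_id)
            elif has_cue(prefix, UNCERTAINTY_CUES):
                profile["uncertain"].add(term_id)
            else:
                profile["affirmed"].add(term_id)
    return profile


print(stratified_profile("Patient has microcephaly. No seizures; possible developmental delay."))
```

Splitting on clause boundaries is a deliberate simplification: without it, the "no" in one clause would wrongly negate findings reported later in the same sentence.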

Clinical decision support and EHR analytics

These components are combined into decision-support tools that operate on patient data. Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes ranks variants by combining text-mined evidence with patient symptoms. The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients revisits gene prioritization with modern LLMs, comparing them against established semantic-similarity baselines on real rare-disease cohorts. Predicting candidate genes from phenotypes, functions and anatomical site of expression adds anatomical context to the prediction problem, and Ontology-based prediction of cancer driver genes shows that the same ontology-aware embedding strategy generalizes to oncology. Causal relationships between diseases mined from the literature improve the use of polygenic risk scores demonstrates how literature-derived causal graphs sharpen polygenic risk modeling for downstream EHR studies. Foundational analyses such as Evaluation of research in biomedical ontologies, Ranking Adverse Drug Reactions With Crowdsourcing, Usage of cell nomenclature in biomedical literature, and the BioHackathon reports anchor this work in community standards and shared benchmarks.
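Semantic-similarity baselines for phenotype-based gene prioritization typically score each gene by comparing its ontology-coded phenotype annotations with the patient's profile. The Python sketch below assumes one common choice, Resnik similarity (the information content of the most informative common ancestor) combined by a symmetric best-match average. The ontology, genes and annotations are hypothetical toy data, and this is not presented as the specific baseline used in the cited work.

```python
import math

# Hypothetical toy phenotype ontology: child -> set of direct parents.
IS_A = {
    "focal seizure": {"seizure"},
    "generalized seizure": {"seizure"},
    "seizure": {"neuro"},
    "neuro": {"root"},
    "microcephaly": {"head abnormality"},
    "head abnormality": {"root"},
}

# Hypothetical gene -> phenotype annotations (placeholder gene names).
GENE_PHENOTYPES = {
    "GENE_A": {"focal seizure", "microcephaly"},
    "GENE_B": {"generalized seizure"},
    "GENE_C": {"head abnormality", "neuro"},
}


def ancestors(term):
    """The term plus all transitive superclasses (true-path propagation)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -log p(t), p(t) = fraction of genes annotated to t or a subclass."""
    counts = {}
    for terms in annotations.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


IC = information_content(GENE_PHENOTYPES)


def resnik(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def bma(profile_a, profile_b):
    """Symmetric best-match average of Resnik similarity between two term sets."""
    a_to_b = sum(max(resnik(a, b) for b in profile_b) for a in profile_a) / len(profile_a)
    b_to_a = sum(max(resnik(b, a) for a in profile_a) for b in profile_b) / len(profile_b)
    return (a_to_b + b_to_a) / 2


def rank_genes(patient):
    """Rank genes by BMA similarity to the patient profile (ties broken by name)."""
    scores = {g: bma(patient, terms) for g, terms in GENE_PHENOTYPES.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


for gene, score in rank_genes({"focal seizure", "microcephaly"}):
    print(f"{gene}\t{score:.3f}")
```

With this toy data the gene sharing both patient phenotypes ranks first, the gene matching only on the seizure branch second, and the gene linked only through a general head abnormality third.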

These tools and resources are used in active programs on rare-disease diagnostic support, infectious-disease surveillance, drug repurposing and cancer prognostics, and they support clinical collaborators through software such as PathoPhenoDB, SmuDGE, Multi-Drug Embedding and DeepMOCCA. We are now extending this stack toward operational decision-support for genetic-medicine clinics, with explicit attention to populations and disease patterns that are common in the Middle East but under-represented in international resources.

Publications (47)

  • (2019) Kafkas, Hoehndorf. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database.
  • (2019) Katayama, Kawashima, Micklem, Kawano et al. BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services. F1000Research.
  • (2019) Timothy K. Cooper, Kathleen A. Silva, Victoria E. Kennedy, Sarah M. Alghamdi et al. Hyaline Arteriolosclerosis in 30 Strains of Aged Inbred Mice. Veterinary Pathology.
  • (2018) Sohaib Younis, Claus Weiland, Robert Hoehndorf, Stefan Dressler et al. Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters.
  • (2018) Senay Kafkas, Robert Hoehndorf. Ontology based mining of pathogen-disease associations from literature. Bio-Ontologies COSI.
  • (2018) Sara Althubaiti, Senay Kafkas, Robert Hoehndorf. Ontology-Based Concept Recognition by Using Word Embeddings. Bio-Ontologies COSI.
  • (2017) Kafkas, Sarntivijai, Hoehndorf. Usage of cell nomenclature in biomedical literature. BMC Bioinformatics.
  • (2016) Boudellioua, Saidi, Hoehndorf, Martin et al. Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining. PLoS ONE.
  • (2016) Hoehndorf, Gkoutos, Schofield. Datamining with Ontologies. Data Mining Techniques for the Life Sciences.
  • (2015) Robert Hoehndorf, Luke Slater, Paul N Schofield, Georgios V Gkoutos. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics.
  • (2015) Martin Hrabě de Angelis, George Nicholson, Mohammed Selloum, Jacqueline K White et al. Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics. Nature Genetics.
  • (2015) Gottlieb, Hoehndorf, Dumontier, Altman. Ranking Adverse Drug Reactions With Crowdsourcing. J Med Internet Res.
  • (2015) Hoehndorf, Schofield, Gkoutos. The role of ontologies in biological and biomedical research: a functional perspective. Briefings in Bioinformatics.
  • (2014) Hoehndorf, Hancock, Hardy, Mallon et al. Analyzing gene expression data in mice with the Neuro Behavior Ontology. Mamm Genome.
  • (2014) Rutger Vos, Jordan Biserkov, Bachir Balech, Niall Beard et al. Enriched biodiversity data as a resource and service. Biodiversity Data Journal.
  • (2013) Hoehndorf, Hardy, Osumi-Sutherland, Tweedie et al. Systematic Analysis of Experimental Phenotype Data Reveals Gene Functions. PLoS ONE.
  • (2013) Rebholz-Schuhmann, Kafkas, Kim, Li et al. Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources. Journal of Biomedical Semantics.
  • (2013) Dietrich Rebholz-Schuhmann, Jee-Hyub Kim, Ying Yan, Abhishek Dixit et al. Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI). PLoS ONE.
  • (2012) Hoehndorf, Dumontier, Gkoutos. Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics.
  • (2012) Hoehndorf, Dumontier, Gkoutos. Evaluation of research in biomedical ontologies. Briefings in Bioinformatics.
  • (2012) Dietrich Rebholz-Schuhmann, Anika Oellrich, Robert Hoehndorf. Text-mining solutions for biomedical research: enabling integrative biology. Nature Reviews Genetics.
  • (2012) Robert Hoehndorf, Georgios V. Gkoutos. A translational medicine approach to orphan diseases. Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012).
  • Robert Hoehndorf, Colin Batchelor, Thomas Bittner, Michel Dumontier et al. The RNA Ontology (RNAO): An Ontology for Integrating RNA Sequence and Structure Data. Applied Ontology.
  • de Bono, Hoehndorf, Wimalaratne, Gkoutos et al. The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions. BMC Research Notes.
  • Hoehndorf, Ngonga Ngomo, Pyysalo, Ohta et al. Ontology design patterns to disambiguate relations between genes and gene products in GENIA. Journal of Biomedical Semantics.
  • Herre, Hoehndorf, Kelso, Loebe et al. OBML - Ontologies in Biomedicine and Life Sciences. Journal of Biomedical Semantics.
  • Wimalaratne, Grenon, Hoehndorf, Gkoutos et al. An infrastructure for ontology-based information systems in biomedicine: RICORDO case study. Bioinformatics.