Rare disease
The diagnosis of rare and Mendelian disease has been transformed by exome and genome sequencing, but interpretation remains the bottleneck: a typical patient genome contains tens of thousands of rare variants, only one or a few of which are causative. Effective diagnostic support requires the integration of patient-specific molecular data with structured background knowledge about genes, phenotypes, and disease mechanisms. We develop methods, anchored on the PhenomeNET phenotype network and the PVP family of variant prioritisation tools, that combine automated reasoning over phenotype ontologies with machine learning to rank candidate variants by how well they match the patient's clinical phenotype. The Human Phenotype Ontology (HPO) is at the centre of this approach, together with model-organism phenotype resources that allow inferences to bridge species.
From phenotype networks to variant prioritisation
The PhenomeNET line of work began with PhenomeNET: a whole-phenome approach to disease gene discovery, which transformed phenotype ontologies into a formal representation enabling cross-species comparison of phenotypes, and was extended in Similarity-based search of model organism, disease and drug effect phenotypes to support real-time similarity queries over a large repository of model-organism, disease, and drug-effect phenotypes. The integration of phenotype ontologies across species was put on a sound basis in Integrating phenotype ontologies with PhenomeNET, and the role of model-organism phenotypes in human gene discovery was assessed in Contribution of model organism phenotypes to the computational identification of human disease genes. Building on this foundation, Semantic prioritization of novel causative genomic variants introduced the PVP framework, which combines pathogenicity prediction with semantic similarity between patient and disease phenotypes to rank candidate variants in whole-genome sequencing data.
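The idea of combining a pathogenicity score with patient-to-gene phenotype similarity can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy hierarchy, the made-up term and gene identifiers, Jaccard similarity over ancestor closures, and the simple product used to combine the two scores. It is not the PVP implementation; the published tools should be consulted for their actual similarity measures and models.

```python
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy (child -> parents). Identifiers are invented, not real HPO terms.
PARENTS: Dict[str, Set[str]] = {
    "P:seizure": {"P:neuro"},
    "P:ataxia": {"P:neuro"},
    "P:cardiomyopathy": {"P:cardio"},
    "P:neuro": {"P:root"},
    "P:cardio": {"P:root"},
    "P:root": set(),
}


def closure(terms: Set[str]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of the ancestor closures of two phenotype sets."""
    ca, cb = closure(a), closure(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def rank_variants(
    patient: Set[str],
    variants: List[Tuple[str, str, float]],  # (variant id, gene, pathogenicity in [0, 1])
    gene_phenotypes: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Score each variant as pathogenicity * phenotype similarity, best first."""
    scored = [
        (vid, patho * jaccard(patient, gene_phenotypes.get(gene, set())))
        for vid, gene, patho in variants
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical patient and candidates.
patient = {"P:seizure"}
gene_phenotypes = {
    "GENE_A": {"P:seizure", "P:ataxia"},
    "GENE_B": {"P:cardiomyopathy"},
}
variants = [("v1", "GENE_B", 0.9), ("v2", "GENE_A", 0.6)]
ranked = rank_variants(patient, variants, gene_phenotypes)
```

In this toy case the variant with the lower pathogenicity score ranks first, because its gene matches the patient's phenotype far better; that reordering is the point of phenotype-aware prioritisation.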
Deep learning, structural variants, and oligogenic disease
Subsequent systems brought deep learning into variant prioritisation. DeepPVP: phenotype-based prioritization of causative variants using deep learning replaced hand-crafted scoring with a neural model trained jointly on variant features and phenotype similarity. DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning extended the approach to structural and copy-number variants by learning gene-function similarity from biomedical ontologies, while OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants addressed oligogenic disorders, in which two or more interacting variants jointly cause disease. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes showed that embedding-based representations can recover prioritisation signal from related data even when direct phenotype annotations are missing. Most recently, Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning introduced EmbedPVP, which fuses sequence-derived and ontology-derived representations in a neuro-symbolic framework, and The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients evaluated how LLMs can extend phenotype-based gene prioritisation beyond curated genotype-to-phenotype databases.
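The oligogenic case can be sketched in the same spirit: instead of ranking single variants, rank pairs of variants whose genes are known to interact. The function name, the additive pair score, and the toy interaction set below are all assumptions for illustration and do not reproduce the OligoPVP scoring.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def rank_variant_pairs(
    variant_scores: Dict[str, float],    # per-variant prioritisation score
    variant_gene: Dict[str, str],        # variant id -> gene
    interactions: Set[FrozenSet[str]],   # unordered interacting gene pairs
) -> List[Tuple[Tuple[str, str], float]]:
    """Rank pairs of variants in distinct, interacting genes by summed score."""
    pairs = []
    for v1, v2 in combinations(sorted(variant_scores), 2):
        g1, g2 = variant_gene[v1], variant_gene[v2]
        # Keep only pairs whose genes differ and are listed as interacting.
        if g1 != g2 and frozenset((g1, g2)) in interactions:
            pairs.append(((v1, v2), variant_scores[v1] + variant_scores[v2]))
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Hypothetical inputs.
variant_scores = {"v1": 0.9, "v2": 0.5, "v3": 0.7, "v4": 0.4}
variant_gene = {"v1": "G1", "v2": "G2", "v3": "G3", "v4": "G1"}
interactions = {frozenset(("G1", "G2")), frozenset(("G2", "G3"))}
ranked_pairs = rank_variant_pairs(variant_scores, variant_gene, interactions)
```

Restricting candidates to interacting genes keeps the search tractable, since the number of unrestricted variant pairs grows quadratically with the number of candidate variants.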
Predicting the phenotypic consequences of a genetic variant requires gene-level phenotype models. DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier predicts HPO/MPO phenotypes from gene functions, complementing Predicting candidate genes from phenotypes, functions and anatomical site of expression. Tools such as Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes add literature-derived evidence and patient symptoms to the ranking. Curated knowledge bases support the broader programme: DDIEM: drug database for inborn errors of metabolism catalogues treatment strategies for inborn errors of metabolism, and PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research links pathogens to clinical phenotypes for infectious-disease applications.
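One property a hierarchical phenotype classifier has to respect is consistency with the ontology: if a gene is predicted to cause a specific phenotype, every more general ancestor phenotype should score at least as high. The sketch below enforces this as a post-processing step by propagating the maximum score upwards. It is a generic illustration with invented term names, and it is not claimed to be how DeepPheno implements its hierarchical classifier.

```python
from typing import Dict, Set


def propagate_scores(
    scores: Dict[str, float],
    parents: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Raise each ancestor's score to at least the score of its descendants."""
    out = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, ()))
        seen: Set[str] = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            out[ancestor] = max(out.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, ()))
    return out


# Hypothetical three-level hierarchy and raw classifier outputs.
parents = {"specific": {"general"}, "general": {"root"}, "root": set()}
raw = {"specific": 0.8, "general": 0.3, "root": 0.5}
consistent = propagate_scores(raw, parents)
```

After propagation, thresholding the scores at any cut-off yields a set of predicted phenotypes that is closed under the is-a relation.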
External assessments, including CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs) and Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project, have benchmarked these tools on independent cohorts. The work feeds active programmes including a public Saudi pangenome for Middle-Eastern rare-disease interpretation, the KAUST Center of Excellence for Generative AI in Health and Wellness, and clinical decision-support tools such as GenomeLinter that aim to put phenotype-aware genome interpretation in the hands of non-specialist clinicians.
Projects
- Personalized cancer treatment prediction (KCSH Pathway to Impact 2025) (2025–2026)
- KAUST Center of Excellence for Generative AI (Health and Wellness, BCB theme) (2024–ongoing)
- Disease Models from Patient-derived Leukemic Cells in Biomimetic Peptide Scaffolds for Precision Medicine Applications (2023–2026)
- A public Saudi pangenome as reference for genomics in the Middle East (2024–2026)
- IBNSINA-QI: Integrating Biomedical Networks and Semantic Information for Neural network Analysis of Quantitative Information (2021–2023)
- CompleX: Variant Prioritization in Complex Disease (2019–2021)
- Improving health of Saudi population (2019–2021)
- Whole genome sequencing of rare disease patients (2019)
- Improvement of genetic variant prioritization technology (2018–2019)
Software
- PhenomeNET-VP — Phenotype-driven variant prioritization for whole-exome and whole-genome sequencing data; widely used implementation of the phenotype-aware variant ranking approach.
- DeepSVP — Prioritizes structural and copy-number variants by combining patient phenotype with gene-function similarity learned from biomedical ontologies.
- EmbedPVP — Embedding-based phenotype-aware variant predictor that ranks candidate causative variants using joint sequence- and phenotype-derived representations.
- STARVar — Symptom-based tool for automatic ranking of variants using evidence from the biomedical literature and population genomes; combines text mining with phenotype matching.
- INDIGENA — Inductive prediction of disease–gene associations from phenotype ontologies; generalises to unseen diseases via ontology-aware embeddings.
- GenomeLinter — AI-powered clinical decision-support tool that ingests annotated VCFs and synthesises diagnostic interpretations for rare-disease patients without requiring deep bioinformatics expertise.
- predCAN — Ontology-based prediction of cancer driver genes by integrating phenotype, pathway and function knowledge with somatic-variant features.
- DeepViral — Predicts virus–host protein-protein interactions from sequence and infectious-disease phenotypes; trained jointly across coronaviruses, influenza, and other RNA viruses.
Publications (32)
- (2025) Aspromonte, Del Conte, Zhu, Tan et al. CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs). Human Genetics.
- (2025) Kafkas, Abdelhakim, Althagafi, Toonsi et al. The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients. Scientific Reports.
- (2025) Maktabi, Liu, Almesfer, Abdelhakim et al. Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology. Neuro-Oncology Pediatrics.
- (2024) Althagafi, Zhapa-Camacho, Hoehndorf. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics.
- (2024) Stenton, O’Leary, Lemire, VanNoy et al. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Human Genomics.
- (2024) Toonsi, Gauran, Ombao, Schofield et al. Causal relationships between diseases mined from the literature improve the use of polygenic risk scores. Bioinformatics.
- (2023) Senay Kafkas, Marwa Abdelhakim, Mahmut Uludag, Azza Althagafi et al. Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes. BMC Bioinformatics.
- (2022) Sarah Alghamdi, Paul N. Schofield, Robert Hoehndorf. Contribution of model organism phenotypes to the computational identification of human disease genes. Disease Models & Mechanisms.
- (2022) Azza Althagafi, Lamia Alsubaie, Nagarajan Kathiresan, Katsuhiko Mineta et al. DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning. Bioinformatics.
- (2021) Jun Chen, Azza Althagafi, Robert Hoehndorf. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics.
- (2020) Kulmanov, Hoehndorf. DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLOS Computational Biology.
- (2020) Marwa Abdelhakim, Eunice McMurray, Ali Raza Syed, Senay Kafkas et al. DDIEM: drug database for inborn errors of metabolism. Orphanet Journal of Rare Diseases.
- (2020) Ahmed Alfares, Lamia Alsubaie, Taghrid Aloraini, Aljoharah Alaskar et al. What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations. BMC Medical Genomics.
- (2020) Muhammad Umair, Mariam Ballow, Abdulaziz Asiri, Yusra Alyafee et al. EMC10 homozygous variant identified in a family with global developmental delay, mild intellectual disability, and speech delay. Clinical Genetics.
- (2019) Kafkas, Abdelhakim, Hashish, Kulmanov et al. PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research. Scientific Data.
- (2019) Kafkas, Hoehndorf. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database.
- (2019) Boudellioua, Kulmanov, Schofield, Gkoutos et al. DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinformatics.
- (2018) Boudellioua, Kulmanov, Schofield, Gkoutos et al. OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants. Scientific Reports.
- (2018) Alshahrani, Hoehndorf. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics.
- (2017) Boudellioua, Mahamad Razali, Kulmanov, Hashish et al. Semantic prioritization of novel causative genomic variants. PLOS Computational Biology.
- … and 12 more.