Genomics
Our genomics work develops resources and computational methods for population-scale genome analysis, with a particular focus on populations that have historically been under-represented in reference databases. Within the BORG group at KAUST's Computer Science Program, we build reference genome assemblies, pangenome graphs for the Saudi and wider Middle Eastern populations, variant-calling and structural-variant pipelines, and analyses of antimicrobial resistance from whole-genome sequencing. Our distinctive angle is the tight integration of formal phenotype and function knowledge with sequence-based analysis, so that genomic variation is interpreted in the context of biological background knowledge rather than purely statistically.
Reference resources for under-represented populations
The choice of reference is central to every downstream analysis, and standard references encode only a small fraction of human haplotype diversity. We released "A reference quality, fully annotated diploid genome from a Saudi individual" as a first step toward a regional reference, and extended it in "Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia", which combines haplotypes from multiple individuals into a graph-based reference (JaSaPaGe) to support variant interpretation in two populations that have been largely absent from existing pangenomes. These resources are foundational for the Smart Health Saudi pangenome programme, which uses the new reference to revisit rare-disease variant interpretation across the region.
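To make the idea of a graph-based reference concrete, the following is a minimal sketch of how haplotypes that share sequence segments can be written out as a small variation graph in GFA 1.0 text format. It is an illustration only and not the JaSaPaGe construction pipeline: the function name `haplotypes_to_gfa`, the segment identifiers and the toy sequences are all hypothetical, and the haplotypes are assumed to be already decomposed into shared segments (real pangenome builders derive that decomposition from alignments).

```python
def haplotypes_to_gfa(segments, haplotypes):
    """Serialise pre-segmented haplotypes as GFA 1.0 text.

    segments:   dict mapping a segment id to its sequence (hypothetical toy data)
    haplotypes: dict mapping a haplotype name to an ordered list of segment ids
    """
    lines = ["H\tVN:Z:1.0"]

    # One S (segment) line per node of the graph.
    for seg_id, seq in segments.items():
        lines.append(f"S\t{seg_id}\t{seq}")

    # One L (link) line per distinct pair of consecutive segments.
    links = []
    for path in haplotypes.values():
        for left, right in zip(path, path[1:]):
            if (left, right) not in links:
                links.append((left, right))
    for left, right in links:
        lines.append(f"L\t{left}\t+\t{right}\t+\t0M")

    # One P (path) line per haplotype, so each input can be traced through the graph.
    for name, path in haplotypes.items():
        walk = ",".join(f"{seg_id}+" for seg_id in path)
        lines.append(f"P\t{name}\t{walk}\t*")

    return "\n".join(lines)


# Two toy haplotypes that differ at a single site (segment s2 versus s3).
toy_segments = {"s1": "ACGT", "s2": "A", "s3": "G", "s4": "TTCA"}
toy_haplotypes = {"hapA": ["s1", "s2", "s4"], "hapB": ["s1", "s3", "s4"]}
print(haplotypes_to_gfa(toy_segments, toy_haplotypes))
```

In this toy graph the single-site difference appears as a bubble: both haplotype paths share s1 and s4 and diverge only through s2 or s3, which is the property that lets a graph reference represent variation that a single linear reference cannot.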
Phenotype-aware variant prioritization
A long-running thread combines whole-genome and whole-exome sequencing with phenotype ontologies to rank candidate disease variants. "Phenotype-driven discovery of digenic variants in personal genome sequences" and the follow-up "OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants" address the difficult case of diseases caused by combinations of variants in more than one gene. "DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning" extends the same principle to structural and copy-number variants by combining patient phenotypes with gene-function similarity learned from biomedical ontologies. The benchmark reported in "Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project", run on patients from the Rare Genomes Project, found that phenotype-aware models that consider non-coding variation and phenotype expansion recover diagnoses that purely sequence-based models miss. Our most recent contribution, "VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale", links missense variants to AlphaFold-derived protein structures so that structural context can be used alongside sequence- and phenotype-based evidence.
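The general principle of phenotype-aware ranking can be sketched as follows. This is a simplified illustration and not the scoring used by PhenomeNET-VP, OligoPVP or DeepSVP: it substitutes a plain Jaccard overlap for ontology-based semantic similarity, and the `Variant` class, the `rank_variants` function, the `weight` parameter, the gene names and the pathogenicity values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    gene: str
    pathogenicity: float  # hypothetical sequence-based score in [0, 1]


def jaccard(a, b):
    """Set overlap, used here as a crude stand-in for ontology-based similarity."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_variants(variants, patient_phenotypes, gene_phenotypes, weight=0.5):
    """Rank variants by a weighted mix of phenotype match and pathogenicity.

    gene_phenotypes maps a gene to the phenotype terms associated with it;
    weight is the (assumed) share given to the phenotype component.
    """
    scored = []
    for variant in variants:
        known = gene_phenotypes.get(variant.gene, ())
        phenotype_score = jaccard(patient_phenotypes, known)
        score = weight * phenotype_score + (1 - weight) * variant.pathogenicity
        scored.append((score, variant))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(variant.variant_id, round(score, 3)) for score, variant in scored]


# Illustrative HPO-style identifiers and made-up genes.
patient = {"HP:0001250", "HP:0001263"}
gene_phenotypes = {
    "GENE_A": {"HP:0001250", "HP:0001263"},
    "GENE_B": {"HP:0000365"},
}
variants = [
    Variant("v1", "GENE_B", 0.9),  # looks damaging, but the gene does not match the patient
    Variant("v2", "GENE_A", 0.6),  # less damaging, but the gene matches the phenotype
]
print(rank_variants(variants, patient, gene_phenotypes))
```

In the toy data, v2 outranks v1 even though v1 has the higher pathogenicity score, because only GENE_A matches the patient's phenotypes. That is the effect phenotype-aware prioritization is designed to produce; the published tools replace the Jaccard overlap with similarity measures that exploit the ontology structure.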
Sequencing strategy itself remains a research question, and "What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations" quantifies how much sequencing additional family members narrows the candidate-variant search in consanguineous settings, with direct implications for diagnostic protocols in Saudi Arabia.
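Why additional family members shrink the candidate list can be illustrated with a simple recessive-inheritance filter. The sketch below is an assumption-laden toy and not the analysis protocol of the cited study: the function `recessive_candidates`, the sample names and the genotype table are hypothetical, and genotypes are encoded as the number of alternate alleles (0, 1 or 2).

```python
def recessive_candidates(genotypes, affected, unaffected=()):
    """Return variants consistent with a fully penetrant recessive model.

    genotypes:  dict mapping variant id -> {sample name: alternate allele count}
    affected:   samples that must be homozygous for the alternate allele
    unaffected: samples that must not be homozygous for the alternate allele
    A missing genotype is treated as not supporting the variant.
    """
    candidates = []
    for variant_id, calls in genotypes.items():
        in_affected = all(calls.get(sample) == 2 for sample in affected)
        not_in_unaffected = all(calls.get(sample) in (0, 1) for sample in unaffected)
        if in_affected and not_in_unaffected:
            candidates.append(variant_id)
    return candidates


# Made-up genotype table for one family.
toy_genotypes = {
    "var1": {"proband": 2, "mother": 1, "father": 1, "sibling": 0},
    "var2": {"proband": 2, "mother": 2, "father": 1, "sibling": 1},
    "var3": {"proband": 2, "mother": 1, "father": 1, "sibling": 2},
    "var4": {"proband": 1, "mother": 1, "father": 0, "sibling": 0},
}

solo = recessive_candidates(toy_genotypes, ["proband"])
trio = recessive_candidates(toy_genotypes, ["proband"], ["mother", "father"])
extended = recessive_candidates(
    toy_genotypes, ["proband"], ["mother", "father", "sibling"]
)
print(len(solo), len(trio), len(extended))
```

In this toy family the solo analysis leaves three candidates, adding the unaffected parents leaves two, and adding the unaffected sibling leaves one. Real pipelines also account for incomplete penetrance, genotyping error and runs of homozygosity, which this sketch ignores.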
Microbial and applied genomics
On the microbial side, "Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing" reports a nationwide MRSA surveillance effort that links genomic lineages and resistance determinants to the context of mass-gathering events. Disease-focused projects include "Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology", which dissects somatic and germline alterations beyond RB1, and "Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study", which characterises sorafenib-related mutation patterns. Transcriptomic work such as "Whole genome transcriptomic profiling reveals distinct sex-specific responses to heat stroke" extends our genomics activity into climate-related health questions of regional relevance.
These resources and methods feed directly into the Saudi pangenome programme, AI-supported variant interpretation tools such as PhenomeNET-VP, DeepSVP, EmbedPVP, STARVar, INDIGENA and GenomeLinter, and applied projects on coral adaptation in the Red Sea, mangrove restoration and whale-shark population genomics, all of which use the same population-genomic and ontology-aware analytical stack.
Projects
- A public Saudi pangenome as reference for genomics in the Middle East (2024–2026)
- Enabling mangrove restoration by AI-tailored microbiome fortification (2023)
- Evolutionary potential of corals to adapt to climate warming (2022–2025)
- Sequencing and computational analysis of MRSA samples (2018–2021)
- The Whale Shark 100: Applying Population Genomics to Understand Mysteries of the World's Largest Fish (2018–2020)
Software
- PhenomeNET-VP — Phenotype-driven variant prioritization for whole-exome and whole-genome sequencing data; widely used implementation of the phenotype-aware variant ranking approach.
- DeepSVP — Prioritizes structural and copy-number variants by combining patient phenotype with gene-function similarity learned from biomedical ontologies.
- EmbedPVP — Embedding-based phenotype-aware variant predictor that ranks candidate causative variants using joint sequence- and phenotype-derived representations.
- STARVar — Symptom-based tool for automatic ranking of variants using evidence from the biomedical literature and population genomes; combines text mining with phenotype matching.
- INDIGENA — Inductive prediction of disease–gene associations from phenotype ontologies; generalises to unseen diseases via ontology-aware embeddings.
- GenomeLinter — AI-powered clinical decision-support tool that ingests annotated VCFs and synthesises diagnostic interpretations for rare-disease patients without requiring deep bioinformatics expertise.
- predCAN — Ontology-based prediction of cancer driver genes by integrating phenotype, pathway and function knowledge with somatic-variant features.
- DeepViral — Predicts virus–host protein-protein interactions from sequence and infectious-disease phenotypes; trained jointly across coronaviruses, influenza, and other RNA viruses.
Publications (16)
- (2026) Guzman-Vega, Cardona-Londono, Gonzalez-Alvarez, Pena-Guerra et al. VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale. Journal of Biological Chemistry.
- (2025) Alarawi, Altammami, Abutarboush, Kulmanov et al. Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing. Microbial Genomics.
- (2025) Bouchama, Gomez, Abdullah, Al Mahri et al. Whole genome transcriptomic profiling reveals distinct sex-specific responses to heat stroke. Journal of Applied Physiology.
- (2025) Kulmanov, Ashouri, Liu, Abdelhakim et al. Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia. Scientific Data.
- (2025) Maktabi, Liu, Almesfer, Abdelhakim et al. Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology. Neuro-Oncology Pediatrics.
- (2025) Tawfiq, Niu, Kulmanov, Hoehndorf. Annotating genomes with DeepGO protein function prediction tools. Protein Function Prediction.
- (2024) Kulmanov, Tawfiq, Liu, Al Ali et al. A reference quality, fully annotated diploid genome from a Saudi individual. Scientific Data.
- (2024) Stenton, O’Leary, Lemire, VanNoy et al. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Human Genomics.
- (2023) Abdelrahman, Ge, Susapto, Liu et al. The Impact of Mechanical Cues on the Metabolomic and Transcriptomic Profiles of Human Dermal Fibroblasts Cultured in Ultrashort Self-Assembling Peptide 3D Scaffolds. ACS Nano.
- (2023) Hassanain, Liu, Hussain, Binowayn et al. Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study. Frontiers in Gastroenterology.
- (2022) Althagafi, Alsubaie, Kathiresan, Mineta et al. DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning. Bioinformatics.
- (2020) Alfares, Alsubaie, Aloraini, Alaskar et al. What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations. BMC Medical Genomics.
- (2020) Umair, Ballow, Asiri, Alyafee et al. EMC10 homozygous variant identified in a family with global developmental delay, mild intellectual disability, and speech delay. Clinical Genetics.
- (2018) Boudellioua, Kulmanov, Schofield, Gkoutos et al. OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants. Scientific Reports.
- (2017) Boudellioua, Kulmanov, Schofield, Gkoutos et al. Phenotype-driven discovery of digenic variants in personal genome sequences. Proceedings of VarI-SIG.
- (2015) Baran, Durgahee, Eilbeck, Antezana et al. GFVO: the Genomic Feature and Variation Ontology. PeerJ.