Genomics

Our genomics work develops resources and computational methods for population-scale genome analysis, with a particular focus on populations that have historically been under-represented in reference databases. Within the BORG group at KAUST's Computer Science Program, we build reference genome assemblies, pangenome graphs for the Saudi and wider Middle Eastern populations, variant-calling and structural-variant pipelines, and analyses of antimicrobial resistance from whole-genome sequencing. Our distinctive angle is the tight integration of formal phenotype and function knowledge with sequence-based analysis, so that genomic variation is interpreted in the context of biological background knowledge rather than by statistics alone.

Reference resources for under-represented populations

The choice of reference is central to every downstream analysis, and standard references encode only a small fraction of human haplotype diversity. We released "A reference quality, fully annotated diploid genome from a Saudi individual" as a first step toward a regional reference, and extended this work in "Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia", which combines haplotypes from multiple individuals into a graph-based reference (JaSaPaGe) to support variant interpretation in two populations that have been largely absent from existing pangenomes. These resources are foundational for the Smart Health Saudi pangenome programme, which uses the new reference to revisit rare-disease variant interpretation across the region.

Phenotype-aware variant prioritization

A long-running thread combines whole-genome and whole-exome sequencing with phenotype ontologies to rank candidate disease variants. "Phenotype-driven discovery of digenic variants in personal genome sequences" and the follow-up "OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants" address the difficult case of diseases caused by combinations of variants in more than one gene. "DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning" extends the same principle to structural and copy-number variants by combining patient phenotypes with gene-function similarity learned from biomedical ontologies. The "Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project" benchmark, run on patients from the Rare Genomes Project, found that phenotype-aware models that consider non-coding variation and phenotype expansion recover diagnoses that purely sequence-based models miss. Our most recent contribution, "VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale", links missense variants to AlphaFold-derived protein structures so that structural context can be used alongside sequence- and phenotype-based evidence.
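The general idea of phenotype-aware ranking can be illustrated with a toy example. The sketch below is not the method used in any of the tools named above: it assumes a hypothetical per-variant pathogenicity score in [0, 1], uses plain Jaccard overlap between term sets as a stand-in for ontology-based semantic similarity, and combines the two with an arbitrary weight. All gene names and phenotype identifiers are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    pathogenicity: float  # hypothetical sequence-based score in [0, 1]


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two phenotype term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_variants(variants, patient_terms, gene_phenotypes, weight=0.5):
    """Sort variants by a weighted mix of sequence score and phenotype match.

    weight=0 ranks on the sequence-based score alone; weight=1 ranks on
    phenotype overlap alone. The linear combination is an assumption made
    for illustration only.
    """
    def score(v: Variant) -> float:
        match = jaccard(patient_terms, gene_phenotypes.get(v.gene, set()))
        return (1 - weight) * v.pathogenicity + weight * match

    return sorted(variants, key=score, reverse=True)


# Placeholder inputs; these are not real annotations.
patient = {"HP:A", "HP:B", "HP:C"}
gene_phenotypes = {"GENE1": {"HP:A", "HP:B"}, "GENE2": {"HP:X"}}
variants = [Variant("GENE2", 0.9), Variant("GENE1", 0.7)]

ranked = rank_variants(variants, patient, gene_phenotypes)
sequence_only = rank_variants(variants, patient, gene_phenotypes, weight=0.0)
```

With these inputs the phenotype match moves the GENE1 variant ahead of the GENE2 variant, even though GENE2 has the higher sequence-based score; with the phenotype weight set to zero the order reverts.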

Sequencing strategy itself remains a research question. "What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations" quantifies how much sequencing additional family members narrows the candidate-variant search in consanguineous settings, with direct implications for diagnostic protocols in Saudi Arabia.
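Why extra family members shrink the search space can be seen in a minimal sketch. It assumes a simple autosomal recessive model in which a causal variant is homozygous in every affected relative and in no unaffected one; the function name, sample names and variant keys are hypothetical, and the study itself may apply different or additional filters.

```python
def candidate_variants(affected, unaffected):
    """Keep variants homozygous in all affected members and no unaffected ones.

    Both arguments map a sample name to the set of variant keys that the
    sample carries in the homozygous state (an assumed input format).
    """
    if not affected:
        return set()
    # Start from variants shared by every affected individual.
    shared = set(set.intersection(*affected.values()))
    # Drop anything an unaffected relative also carries homozygously.
    for homozygous in unaffected.values():
        shared -= homozygous
    return shared


# Placeholder genotypes; these are not real data.
proband = {"chr1:100:A>G", "chr2:200:C>T", "chr3:300:G>A"}
sibling = {"chr1:100:A>G", "chr2:200:C>T"}  # affected
parent = {"chr2:200:C>T"}                   # unaffected

solo = candidate_variants({"proband": proband}, {})
family = candidate_variants(
    {"proband": proband, "sibling": sibling},
    {"parent": parent},
)
```

In this toy case the solo analysis leaves three candidates and the family analysis leaves one.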

Microbial and applied genomics

On the microbial side, "Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing" reports a nationwide MRSA surveillance effort that links genomic lineages and resistance determinants to the context of mass-gathering events. Disease-focused projects include the "Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology" study, which dissects somatic and germline alterations beyond RB1, and "Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study", which characterises sorafenib-related mutation patterns. Transcriptomic work such as "Whole genome transcriptomic profiling reveals distinct sex-specific responses to heat stroke" extends our genomics activity into climate-related health questions of regional relevance.

These resources and methods feed directly into the Saudi pangenome programme; into AI-supported variant interpretation tools such as PhenomeNET-VP, DeepSVP, EmbedPVP, STARVar, INDIGENA and GenomeLinter; and into applied projects on coral adaptation in the Red Sea, mangrove restoration and whale-shark population genomics. All of these use the same population-genomic and ontology-aware analytical stack.
