Improvement of genetic variant prioritization technology
Overview
By 2018 the PhenomeNET Variant Predictor (PVP) had been validated as a research tool for prioritizing causative variants in Mendelian disease, and a clear question had emerged: could the same phenotype-driven, knowledge-graph-based machinery be retargeted to cancer, where the variant landscape is somatic, heterogeneous, and dominated by combinations of driver and passenger mutations rather than single pathogenic alleles? This one-year KAUST Center Partnership Fund project (2018–2019, OSR-2018-CPF-3657-0; with Schofield in Cambridge, Gkoutos in Birmingham, and Bajic at KAUST) was the engineering bridge that took PVP from a Mendelian-only research prototype to a system that can prioritize driver mutations in somatic cancer genomes while also improving Mendelian diagnosis.
The technical work had four objectives: (1) build a dataset of cellular phenotypes associated with the 28 common cancer types in IntOGen, drawing from the Cellular Phenotype Database, the Cellular Microscopy Phenotype Ontology (CMPO), and the Mammalian Pathology Ontology (MPATH); (2) curate oligogenic combinations of somatic driver variants from literature and text mining; (3) integrate this corpus into PVP and replace its germline-focused pathogenicity scorer with MUTECT2 for somatic calls; and (4) evaluate the result on patient cohorts at the three sites. The scientific bet, which the project tested directly, was that cellular phenotypes from in-vitro loss-of-function studies are less pleiotropic than whole-organism phenotypes and therefore more directly predictable from Gene Ontology function annotations.
Methodological deliverables
The improvements were as much methodological as engineering. "Self-normalizing learning on biomedical ontologies" (2016, foundational input) provided the regularization approach that kept the deep prioritization models stable when trained over heterogeneous phenotype data. "Formal axioms in biomedical ontologies improve analysis and interpretation of associated data" (2018–2019) demonstrated, with quantitative ablations, that the formal axioms in ontologies (the disjointness assertions, the part-of and develops-from relations, and the existential restrictions that most users ignore) measurably improve downstream analysis when fed into similarity computation and embedding models. That paper is the empirical justification for the axiom-aware similarity routines the upgraded PVP uses on cancer data. "Semantic similarity and machine learning with biomedical ontologies" (Briefings in Bioinformatics, 2020) consolidated the resulting methodology into a reference treatment that documents which similarity measures and embedding methods perform best for which class of biomedical prediction task. It is the methodological substrate that downstream PVP variants, DeepPVP, and OligoPVP rely on.
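To make the idea of axiom-aware similarity concrete, the following is a minimal sketch, not the routine PVP actually uses. It assumes a tiny, entirely hypothetical phenotype ontology and uses Jaccard similarity over ancestor sets as a stand-in measure. Following part_of edges as if they were ancestor links is a deliberate simplification of what a reasoner would infer from existential restrictions. The term names, the `EDGES` table, and the `ancestors` and `jaccard` helpers are all illustrative assumptions.

```python
from typing import List, Set, Tuple

# Hypothetical toy ontology: (child, relation, parent). Not taken from CMPO or MPATH.
EDGES: List[Tuple[str, str, str]] = [
    ("mitotic_spindle_defect", "is_a", "spindle_phenotype"),
    ("spindle_phenotype", "is_a", "cellular_phenotype"),
    ("chromosome_missegregation", "is_a", "chromosome_phenotype"),
    ("chromosome_phenotype", "is_a", "cellular_phenotype"),
    # Axioms beyond the plain subclass hierarchy (simplified part_of links).
    ("mitotic_spindle_defect", "part_of", "mitosis_phenotype"),
    ("chromosome_missegregation", "part_of", "mitosis_phenotype"),
    ("mitosis_phenotype", "is_a", "cellular_phenotype"),
]


def ancestors(term: str, relations: Set[str]) -> Set[str]:
    """Return the term plus everything reachable via the allowed relations."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child, relation, parent in EDGES:
            if child == current and relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a: str, b: str, relations: Set[str]) -> float:
    """Jaccard similarity of the two terms' ancestor sets."""
    set_a, set_b = ancestors(a, relations), ancestors(b, relations)
    return len(set_a & set_b) / len(set_a | set_b)


# Similarity using the subclass hierarchy only.
plain = jaccard("mitotic_spindle_defect", "chromosome_missegregation", {"is_a"})
# Similarity when part_of axioms are also taken into account.
aware = jaccard("mitotic_spindle_defect", "chromosome_missegregation",
                {"is_a", "part_of"})

print(f"is_a only:      {plain:.3f}")
print(f"is_a + part_of: {aware:.3f}")
```

In this toy example the two phenotypes share only the root term under the subclass hierarchy, whereas the part_of links reveal that both belong to the same mitotic process and raise the score. Whether richer axioms help on real data is an empirical question, which is what the ablations described above address.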
The project delivered a phenotype-annotated dataset linking common cancer types to cell-level phenotypes, a curated set of digenic and oligogenic driver-variant combinations, and a version of the PVP software extended to prioritize driver mutations in cancer alongside Mendelian variants. The software remains open source, while the curated phenotype dataset became part of the intellectual property that supports the group's continuing variant-interpretation work. The improvements made here (axiom-aware semantic similarity, ontology-derived cellular phenotype predictions from GO annotations, and the somatic-variant scoring pipeline) carried directly into the subsequent CRG-CompleX project (2019–2021) on oligogenic disease and into the rare-disease prioritization work that followed.
Period: 2018–2019
Funding
- KAUST Center Partnership Fund, Grant ID: FCS/1/3657-01-01 (PI), USD 9,500
Team
- Robert Hoehndorf, PI (KAUST, Professor of Computer Science)
- Paul N Schofield — CoI (University of Cambridge)
- Georgios V Gkoutos — CoI (University of Birmingham)
- Vladimir Bajic, CoI (former KAUST CBRC director, retired)
Publications acknowledging this project (4)
- (2020) Semantic similarity and machine learning with biomedical ontologies
- (2019) Formal axioms in biomedical ontologies improve analysis and interpretation of associated data
- (2018) Formal axioms in biomedical ontologies improve analysis and interpretation of associated data
- (2016) Self-normalizing learning on biomedical ontologies
Topics: Applied Ontology, Neuro-symbolic AI, Rare disease