Phenotype informatics

Phenotypes are the observable consequences of genotype, environment, and their interaction, and they remain the principal currency by which disease is recognised, model organisms are characterised, and plant traits are catalogued. Our work develops the informatics infrastructure that makes phenotype data computable across species and clinical settings: the phenotype ontologies themselves, the mappings that link them across species, the tools that capture and standardise phenotype descriptions from text and images, and the computational pipelines that connect phenotype evidence back to genes, variants, and diseases.

Ontologies and cross-species integration

A sustained strand of work has built the formal scaffolding for phenotype data. "The anatomy of phenotype ontologies: principles, properties and applications" synthesises the design principles underlying the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology (MP), and their counterparts in zebrafish, plants, and yeast. The Entity-Quality formalism that underpins these ontologies was articulated in "Towards improving phenotype representation in OWL" and "Interoperability between phenotype and anatomy ontologies", which showed how phenotype classes can be decomposed into affected entities and qualities and then reasoned over with description logic. Cross-species integration is delivered by "PhenomeNET: a whole-phenome approach to disease gene discovery" and "Integrating phenotype ontologies with PhenomeNET", which transform species-specific ontologies into a common semantic space and enable comparison of mouse, fish, fly, worm, yeast, and human phenotypes for disease gene prioritisation. "Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology" and "Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology" extend the framework across model systems and physiological scales. Domain-specific ontologies built in this programme are described in "The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants", "An ontology approach to comparative phenomics in plants", "DermO; an ontology for the description of dermatologic disease", and "Best behaviour? Ontologies and the formal description of animal behaviour", the last of which covers the neurobehaviour ontology.
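The Entity-Quality decomposition described above can be illustrated with a toy sketch. It assumes a small, hand-written table of decompositions (labels only, not curated ontology axioms) and compares phenotype profiles by exact overlap of their Entity-Quality pairs. The names `EQ`, `DEFINITIONS`, and `jaccard` are hypothetical and are not taken from PhenomeNET or any of the cited tools, which rely on OWL reasoning and semantic similarity over the full class hierarchy rather than exact matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """A phenotype decomposed into an affected entity and a quality."""
    entity: str   # affected anatomical structure or process
    quality: str  # how the entity is affected


# Hypothetical, hand-written decompositions for illustration only.
# Species prefixes stand in for the species-specific ontologies.
DEFINITIONS = {
    "human:Cardiomegaly": EQ("heart", "increased size"),
    "mouse:enlarged heart": EQ("heart", "increased size"),
    "human:Renal hypoplasia": EQ("kidney", "decreased size"),
    "mouse:small kidney": EQ("kidney", "decreased size"),
    "zebrafish:heart edematous": EQ("heart", "edematous"),
}


def profile_to_eq(profile):
    """Map species-specific phenotype terms to species-neutral EQ pairs."""
    return {DEFINITIONS[term] for term in profile}


def jaccard(profile_a, profile_b):
    """Jaccard overlap of two phenotype profiles in the shared EQ space."""
    eq_a, eq_b = profile_to_eq(profile_a), profile_to_eq(profile_b)
    union = eq_a | eq_b
    return len(eq_a & eq_b) / len(union) if union else 0.0


human_disease = ["human:Cardiomegaly", "human:Renal hypoplasia"]
mouse_model = ["mouse:enlarged heart", "mouse:small kidney"]
fish_model = ["zebrafish:heart edematous"]

print(jaccard(human_disease, mouse_model))  # 1.0: same EQ pairs
print(jaccard(human_disease, fish_model))   # 0.0: same entity, different quality
```

Exact matching scores the fish model at zero even though it shares the affected entity with the human profile. That gap is the reason a reasoner and hierarchy-aware similarity measures are used in practice.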

From phenotypes to genes, variants, and diagnosis

On the predictive side, "DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier" learns gene-to-HPO associations from function and expression data while respecting the ontology hierarchy, and "Ontology-based validation and identification of regulatory phenotypes" uses background knowledge to infer regulatory phenotypes from omics data. "Phenotype-driven discovery of digenic variants in personal genome sequences" and the more recent "CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs)" apply these resources to clinical genome interpretation. For unstructured sources, "Multi-faceted semantic clustering with text-derived phenotypes" and "Improved characterisation of clinical text through ontology-based vocabulary expansion" build phenotype profiles directly from clinical free text, while "Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks" demonstrates the same idea for plant specimen images. Large-scale evidence comes from "Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics" and "Systematic Analysis of Experimental Phenotype Data Reveals Gene Functions"; the contribution of model-organism data to disease genetics is quantified in "Contribution of model organism phenotypes to the computational identification of human disease genes" and reviewed in "Mouse genetic and phenotypic resources for human genetics" and "The informatics of developmental phenotypes".
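One simple way to make per-term predictions respect an ontology hierarchy is the true-path rule: a gene annotated to a term is implicitly annotated to all of its ancestors, so an ancestor's score should never fall below that of any descendant. The sketch below applies this as a post-hoc step over a simplified, hand-written hierarchy with HPO-style labels. It is an assumption-laden illustration, not the DeepPheno architecture, and `PARENTS`, `ancestors`, and `enforce_hierarchy` are hypothetical names.

```python
# Simplified, hand-written hierarchy (child -> direct parents).
# Labels are HPO-style but the structure is NOT the actual HPO graph.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the cardiovascular system": ["Phenotypic abnormality"],
    "Abnormal heart morphology": ["Abnormality of the cardiovascular system"],
    "Cardiomegaly": ["Abnormal heart morphology"],
}


def ancestors(term):
    """Return all transitive ancestors of a term (excluding the term itself)."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def enforce_hierarchy(scores):
    """Raise each ancestor's score to at least the maximum of its descendants."""
    consistent = {term: scores.get(term, 0.0) for term in PARENTS}
    for term, score in scores.items():
        for anc in ancestors(term):
            consistent[anc] = max(consistent[anc], score)
    return consistent


# Raw classifier scores for one gene; the parent score (0.4) is lower than
# the child score (0.9), which violates the true-path rule.
raw = {
    "Cardiomegaly": 0.9,
    "Abnormal heart morphology": 0.4,
    "Phenotypic abnormality": 0.95,
}
fixed = enforce_hierarchy(raw)
for term, score in fixed.items():
    print(f"{term}: {score}")
```

After propagation, every term on the path from "Cardiomegaly" to the root scores at least 0.9, while the root keeps its own higher score of 0.95.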

This infrastructure is the substrate on which our diagnostic software stack runs. PhenomeNET-VP, DeepSVP, EmbedPVP, STARVar, INDIGENA, GenomeLinter, predCAN, and DeepViral together translate ontologies, cross-species phenotype data, and gene-phenotype predictions into tools for variant prioritisation, structural variant interpretation, cancer driver prediction, and pathogen-host interaction analysis. The same phenotype machinery also supports environmental and ecological work through plant trait ontologies and herbarium-scale trait extraction.

Software

Publications (31)