Schematic of inductive gene-disease prediction: a phenotype ontology graph, a latent space aggregating phenotypes into a disease representation, and a ranked column of candidate genes

New publication: INDIGENA predicts disease–gene associations from phenotype ontologies

1 min read ·

A new paper in Bioinformatics introduces INDIGENA, an inductive method that ranks candidate genes for a disease from its phenotypes using graph embeddings of phenotype ontologies, and that generalises to diseases never seen during training.

About

Identifying which genes are involved in a disease is often guided by the disease's phenotypes, the observable signs and symptoms. Established methods compare phenotypes using semantic similarity over phenotype ontologies, but those measures are fixed and use only the ontology taxonomy. More recent machine-learning approaches learn from known gene-disease associations, yet most are transductive: they cannot make predictions for a disease whose particular combination of phenotypes was not present when the model was trained.

In a new paper published in Bioinformatics, Fernando Zhapa-Camacho and Professor Robert Hoehndorf present INDIGENA, an inductive method for predicting disease-gene associations. INDIGENA projects the axioms of phenotype ontologies into a graph, embeds that graph to place individual phenotypes in a latent space, and then aggregates the phenotype embeddings of a disease into a single disease representation using an explicit Best-Match Average strategy. Because a disease is built up from its phenotypes rather than memorised as a fixed entity, the model can score genes for entirely new phenotype combinations.

A supervised signal from known gene-disease associations makes the embeddings and the similarity measure task-specific. Evaluated on mouse models of human disease, INDIGENA improves substantially over inductive semantic-similarity baselines and matches the performance of transductive methods, while retaining the ability to generalise to previously unseen diseases. This makes it well suited to rare and newly characterised conditions, where annotated examples are scarce.

The paper is available in Bioinformatics (DOI: 10.1093/bioinformatics/btag325) and the source code is on GitHub at bio-ontology-research-group/indigena.