The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients
| Author | |
|---|---|
| Keywords | |
| Abstract | Computational methods for identifying gene-disease associations can use both genomic and phenotypic information to prioritize genes and variants that may be associated with genetic diseases. Phenotype-based methods commonly rely on comparing phenotypes observed in a patient with databases of genotype-to-phenotype associations using measures of semantic similarity. They are constrained by the quality and completeness of these resources as well as the quality and completeness of patient phenotype annotation. Genotype-to-phenotype associations used by these methods are largely derived from the literature and coded using phenotype ontologies. Large Language Models (LLMs) have been trained on large amounts of text and data and have shown their potential to answer complex questions across multiple domains. Here, we evaluate the effectiveness of LLMs in prioritizing disease-associated genes compared to existing bioinformatics methods. We show that LLMs can prioritize disease-associated genes as well as, or better than, dedicated bioinformatics methods relying on pre-defined phenotype similarity, when gene sets range from 5 to 100 candidates. We apply our approach to a cohort of undiagnosed patients with rare diseases and show that LLMs can be used to provide diagnostic support that helps in identifying plausible candidate genes. Our results show that LLMs may offer an alternative to traditional bioinformatics methods to prioritize disease-associated genes based on disease phenotypes. They may, therefore, potentially enhance diagnostic accuracy and simplify the process for rare genetic diseases. |
| Year of Publication | 2025 |
| Journal | Scientific Reports |
| URL | https://doi.org/10.1038/s41598-025-99539-y |
| DOI | https://doi.org/10.1038/s41598-025-99539-y |