LLM Agent Based Protein Function Prediction

Author
Keywords
Abstract

Generating vector representations (embeddings) of OWL ontologies is an increasingly important task, with applications in predicting missing facts and in knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies are expressed using Description Logics (DLs). Early approaches generated embeddings by constructing a graph from the ontology, neglecting the semantics of the underlying logic. Recent semantics-preserving embedding methods often target lightweight DL languages such as EL++, ignoring the more expressive information in ontologies. Although some approaches aim to embed more expressive DLs such as ALC, those methods require the existence of individuals, while many real-world ontologies contain none. We propose an ontology embedding method for the ALC DL language that considers the lattice structure of concept descriptions. We use connections between DLs and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks. This is an extended version of our previous work, in which we incorporate saturation procedures that increase the information within the constructed lattices. We make our code and data available at https://github.com/bio-ontology-research-group/catE.

Year of Publication
2025
Conference Name
Biocomputing 2026
URL
https://doi.org/10.1142/9789819824755_0036
DOI
https://doi.org/10.1142/9789819824755_0036