LLM Agent Based Protein Function Prediction
Abstract
Generating vector representations (embeddings) of OWL ontologies is a task of growing interest because of its applications in predicting missing facts and in knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies are expressed using Description Logics (DLs). Initial approaches to generating embeddings relied on constructing a graph from an ontology, neglecting the semantics of the underlying logic. Recent semantic-preserving embedding methods often target lightweight DL languages such as $\mathcal{EL}^{++}$, ignoring more expressive information in ontologies. Although some approaches aim to embed more expressive DLs such as $\mathcal{ALC}$, those methods require the existence of individuals, while many real-world ontologies are devoid of them. We propose an ontology embedding method for the $\mathcal{ALC}$ DL language that considers the lattice structure of concept descriptions. We use connections between DL and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks.
This is an extended version of our previous work, where we incorporate saturation procedures that increase the information within the constructed lattices. We make our code and data available at https://github.com/bio-ontology-research-group/catE.