New publication uses ontology reasoning to evaluate genome-scale function annotations

Our new paper in Briefings in Bioinformatics uses the Gene Ontology, GO-Plus and logical constraints to evaluate whether genome-scale function annotations are complete, coherent and consistent.

About

Protein function annotation is usually evaluated one protein at a time. This is useful, and it has driven major progress through benchmarks such as CAFA. But it misses an important biological question: can all predicted functions in a genome plausibly coexist in a living organism? Proteins act in pathways, compartments, complexes and broader cellular systems, so we argue that a genome-scale annotation set should satisfy logical and biological constraints that are invisible when annotations are treated as independent facts.

In a new paper published in Briefings in Bioinformatics, we introduce an ontology-based framework for evaluating genome-scale function annotations. The work was carried out in the Bio-Ontology Research Group (BORG) by Rund Tawfiq, Maxat Kulmanov and Robert Hoehndorf. Our framework uses the Gene Ontology (GO), GO-Plus, OWL reasoning, the OBO Relation Ontology and GO taxonomic constraints to make the implicit commitments of function annotations explicit at the level of whole genomes.

We formalize three system-level criteria. Completeness asks whether essential functions are present in a viable genome. Coherence asks whether functional dependencies are satisfied, for example when a GO class has required parts, when a pathway requires a chain of enzymatic steps, or when a heteromeric protein complex should be represented by more than one protein. Consistency asks whether the annotation set avoids mutually exclusive functions and taxonomic contradictions, using GO's "only in taxon" and "never in taxon" constraints.
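To make the three criteria concrete, here is a minimal Python sketch of genome-level checks over a toy annotation set. It is not the paper's implementation (the GAEF repository linked below holds that). The `Genome` class, all function names, and every class and taxon label are hypothetical. The sketch also rests on several simplifying assumptions: annotations are already propagated to superclasses, a has-part dependency counts as satisfied if any protein in the genome carries the part, and mutually exclusive functions are supplied as a plain set of disjoint pairs checked across the whole genome.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical toy model of one organism's annotation set."""

    lineage: set[str]  # the organism's taxon plus all of its ancestor taxa
    annotations: dict[str, set[str]]  # protein ID -> annotated function classes

    def functions(self) -> set[str]:
        """Classes annotated to at least one protein in this genome."""
        present: set[str] = set()
        for classes in self.annotations.values():
            present |= classes
        return present


def missing_essential(genome: Genome, essential: set[str]) -> set[str]:
    """Completeness: essential classes that no protein in the genome carries."""
    return essential - genome.functions()


def incoherent_parts(genome: Genome, has_part: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Coherence: (whole, part) pairs whose whole is present but whose part is not."""
    present = genome.functions()
    return {(w, p) for (w, p) in has_part if w in present and p not in present}


def undersized_complexes(genome: Genome, heteromeric: set[str]) -> set[str]:
    """Coherence: heteromeric complex classes annotated to fewer than two proteins."""
    return {
        c
        for c in heteromeric & genome.functions()
        if sum(c in classes for classes in genome.annotations.values()) < 2
    }


def inconsistent(
    genome: Genome,
    only_in_taxon: dict[str, set[str]],
    never_in_taxon: dict[str, set[str]],
    disjoint: set[tuple[str, str]],
) -> tuple[set[str], set[tuple[str, str]]]:
    """Consistency: taxon-constraint violations and co-annotated disjoint pairs."""
    present = genome.functions()
    taxon_violations = {
        c
        for c in present
        # "only in taxon": none of the allowed taxa is in the organism's lineage
        if (c in only_in_taxon and not (only_in_taxon[c] & genome.lineage))
        # "never in taxon": some forbidden taxon is in the organism's lineage
        or (never_in_taxon.get(c, set()) & genome.lineage)
    }
    disjoint_violations = {(a, b) for (a, b) in disjoint if a in present and b in present}
    return taxon_violations, disjoint_violations


# Usage on a hypothetical two-protein genome.
g = Genome(
    lineage={"toy_species", "toy_bacteria", "cellular_organisms"},
    annotations={"p1": {"pathway_X", "step_1"}, "p2": {"complex_Y"}},
)
print(missing_essential(g, {"step_1", "step_2"}))  # {'step_2'}
print(incoherent_parts(g, {("pathway_X", "step_2")}))  # {('pathway_X', 'step_2')}
print(undersized_complexes(g, {"complex_Y"}))  # {'complex_Y'}
print(inconsistent(g, {}, {"complex_Y": {"toy_bacteria"}}, set()))  # ({'complex_Y'}, set())
```

The key design point is that every check runs over the union of annotations across the proteome rather than over a single protein. That is what makes these violations invisible to per-protein evaluation.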

Much of the contribution is in turning ontology structure into measurable genome-scale constraints. We use OWL reasoning over GO and GO-Plus to identify dependencies expressed through relations such as "has part" and "occurs in", extract thousands of GO class pairs connected by inferred "has part" relations, combine GO with MetaCyc pathway structure, and use taxon constraints as logical satisfiability checks. The formal framework introduces relations such as "has function" and "in genome" so that protein-level GO annotations can be evaluated as a coherent set over a proteome.
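The kind of inference that yields "has part" pairs can be illustrated with a small rule-based approximation. The sketch below is not OWL reasoning in general and is not the paper's pipeline. It assumes each asserted pair (whole, part) stands for an axiom "whole SubClassOf has_part some part" and saturates the set with three rules that are sound for such axioms. A real OWL reasoner over GO-Plus also handles equivalence axioms, intersections and property chains, which this sketch ignores. All class names are hypothetical.

```python
from __future__ import annotations


def subclass_closure(is_a: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of asserted (subclass, superclass) edges."""
    closed = set(is_a)
    while True:
        new = {(a, d) for (a, b) in closed for (c, d) in closed if b == c}
        if new <= closed:
            return closed
        closed |= new


def infer_has_part(
    is_a: set[tuple[str, str]], has_part: set[tuple[str, str]]
) -> set[tuple[str, str]]:
    """Saturate (whole, part) pairs, each read as `whole SubClassOf has_part some part`.

    Rules:
      1. B SubClassOf A, A has_part some P     => B has_part some P
      2. C has_part some P, P SubClassOf Q     => C has_part some Q
      3. C has_part some P, P has_part some R  => C has_part some R  (has part is transitive)
    """
    sub = subclass_closure(is_a)
    inferred = set(has_part)
    while True:
        new: set[tuple[str, str]] = set()
        for (whole, part) in inferred:
            new |= {(b, part) for (b, a) in sub if a == whole}  # rule 1
            new |= {(whole, q) for (p, q) in sub if p == part}  # rule 2
            new |= {(whole, r) for (p, r) in inferred if p == part}  # rule 3
        if new <= inferred:
            return inferred
        inferred |= new


# Usage with hypothetical classes.
is_a = {("process_B", "process_A"), ("activity_P", "activity_Q")}
has_part = {("process_A", "activity_P"), ("activity_P", "activity_R")}
pairs = infer_has_part(is_a, has_part)
print(sorted(pairs - has_part))
# [('process_A', 'activity_Q'), ('process_A', 'activity_R'),
#  ('process_B', 'activity_P'), ('process_B', 'activity_Q'), ('process_B', 'activity_R')]
```

The naive fixpoint loop rescans every pair on each pass, which is fine for a toy example; an ontology the size of GO-Plus calls for an indexed OWL reasoner instead.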

We applied the framework to manually curated function annotations from six model organisms and to computational predictions from seven methods. Curated model-organism annotations largely satisfied the proposed constraints, while current computational prediction methods systematically failed to produce biologically plausible genome-scale annotation sets. The results show a measurable gap between optimizing per-protein prediction accuracy and satisfying the system-level requirements that an organism-level annotation must meet.

We also show that the new metrics are complementary to CAFA-style protein-level evaluation: high individual-protein performance does not necessarily imply genome-scale completeness, coherence or consistency. This points to a practical direction for future function prediction and annotation systems, including topics central to the Function COSI community at ISMB: using ontology-derived constraints, post-hoc constraint satisfaction, system-level loss functions and proteome-scale architectures alongside protein-level evidence.

The paper is available at https://doi.org/10.1093/bib/bbag336. The evaluation method and results are available at https://github.com/bio-ontology-research-group/GAEF.