Authors
Farrokh Mehryary, Katerina Nastou, Tomoko Ohta, Lars Juhl Jensen, Sampo Pyysalo
Publication date
2024/9
Journal
Bioinformatics
Volume
40
Issue
9
Pages
btae552
Publisher
Oxford University Press
Description
Motivation
Understanding biological processes relies heavily on curated knowledge of physical interactions between proteins. Yet, a notable gap remains between the information stored in databases of curated knowledge and the plethora of interactions documented in the scientific literature.
Results
To bridge this gap, we introduce ComplexTome, a manually annotated corpus designed to facilitate the development of text-mining methods for extracting complex formation relationships among biomedical entities, targeting the downstream semantics of the physical interaction subnetwork of the STRING database. The corpus comprises 1287 documents with ∼3500 relationships. We train a novel relation extraction model on this corpus and find that it identifies physical protein interactions with high reliability (F1-score = 82.8%). We additionally enhance the model’s …