Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition

Sumyyah Toonsi, Şenay Kafkas, Robert Hoehndorf*

*Corresponding author for this work

Research output: Contribution to conference, Paper, peer-review

Abstract

The lack of curated corpora is one of the major obstacles for Named Entity Recognition (NER). With advances in deep learning and the development of robust language models, distant supervision using weakly labeled data is often used to alleviate this problem. Previous approaches utilized weakly labeled corpora from Wikipedia or from the literature. However, to the best of our knowledge, none of them explored the use of different ontology components for disease/phenotype NER under the distant supervision scheme. In this study, we explored whether different ontology components can be used to develop a distantly supervised disease/phenotype entity recognition model. We trained different models by considering ontology labels, synonyms, definitions, axioms, and their combinations, in addition to a model trained on literature. Results showed that content from the disease/phenotype ontologies can be exploited to develop an NER model performing at the state-of-the-art level. In particular, models that utilized both the ontology definitions and axioms showed competitive performance compared to the model trained on literature. This removes the need to find and annotate external corpora. Furthermore, models trained using ontology components made zero-shot predictions on the test datasets that were not observed for the models trained on the literature-based datasets.
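As a rough illustration of the distant supervision idea described in the abstract, the sketch below weakly labels text by matching it against a small lexicon of ontology labels and synonyms and emitting BIO tags. It is not the authors' pipeline: the lexicon entries, the `LEXICON`, `tokenize`, and `weak_label` names, and the simple longest-match strategy are all assumptions made for this example. A real setup would load terms from an ontology file and would also need to handle definitions and axioms, which this sketch does not.

```python
import re

# Hypothetical lexicon standing in for ontology labels and synonyms.
# In practice these strings would be extracted from an ontology file.
LEXICON = {
    "diabetes mellitus": "DISEASE",
    "type 2 diabetes": "DISEASE",
    "seizure": "PHENOTYPE",
    "seizures": "PHENOTYPE",
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def weak_label(text, lexicon=LEXICON):
    """Return (token, BIO tag) pairs using case-insensitive lexicon matching."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)

    # Try longer terms first so that they take priority over shorter ones.
    entries = sorted(
        ((tokenize(term.lower()), etype) for term, etype in lexicon.items()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )

    for term_tokens, etype in entries:
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            window_is_free = all(tag == "O" for tag in tags[i:i + n])
            if window_is_free and lowered[i:i + n] == term_tokens:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype

    return list(zip(tokens, tags))


if __name__ == "__main__":
    for token, tag in weak_label("Patients with type 2 diabetes often report seizures."):
        print(token, tag)
```

Sentences labeled this way could serve as weakly supervised training data for a token classification model; exact string matching misses spelling variants and paraphrases, which is one reason such labels are considered weak.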

Original language: English (US)
Pages: 13-24
Number of pages: 12
State: Published - 2023
Event: 14th International Conference on Biomedical Ontologies, ICBO 2023 - Brasilia, Brazil
Duration: Aug 28 2023 to Sep 1 2023

Conference

Conference: 14th International Conference on Biomedical Ontologies, ICBO 2023
Country/Territory: Brazil
City: Brasilia
Period: 08/28/23 to 09/1/23

Bibliographical note

Publisher Copyright:
© 2023 Copyright for this paper by its authors.

Keywords

  • Named Entity Recognition
  • ontologies
  • Text mining

ASJC Scopus subject areas

  • General Computer Science
