Abstract
The lack of curated corpora is one of the major obstacles for Named Entity Recognition (NER). With advances in deep learning and the development of robust language models, distant supervision using weakly labeled data is often used to alleviate this problem. Previous approaches used weakly labeled corpora from Wikipedia or from the literature. However, to the best of our knowledge, none of them explored the use of different ontology components for disease/phenotype NER under the distant supervision scheme. In this study, we explored whether different ontology components can be used to develop a distantly supervised disease/phenotype entity recognition model. We trained different models on ontology labels, synonyms, definitions, axioms, and their combinations, in addition to a model trained on literature. Results showed that content from the disease/phenotype ontologies can be exploited to develop a NER model performing at the state-of-the-art level. In particular, models that used both the ontology definitions and axioms showed competitive performance compared to the model trained on literature. This removes the need to find and annotate external corpora. Furthermore, models trained on ontology components made zero-shot predictions on the test datasets which were not observed by the models trained on the literature-based datasets.
Original language | English (US) |
---|---|
Pages | 13-24 |
Number of pages | 12 |
State | Published - 2023 |
Event | 14th International Conference on Biomedical Ontologies, ICBO 2023 - Brasilia, Brazil. Duration: Aug 28 2023 → Sep 1 2023 |
Conference
Conference | 14th International Conference on Biomedical Ontologies, ICBO 2023 |
---|---|
Country/Territory | Brazil |
City | Brasilia |
Period | 08/28/23 → 09/01/23 |
Bibliographical note
Publisher Copyright: © 2023 Copyright for this paper by its authors.
Keywords
- Named Entity Recognition
- ontologies
- Text mining
ASJC Scopus subject areas
- General Computer Science