End-to-end relation extraction based on bootstrapped multi-level distant supervision

Ying He, Zhixu Li, Qiang Yang, Zhigang Chen, An Liu, Lei Zhao, Xiaofang Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

Distantly supervised relation extraction is widely used to identify new relation facts in free text, because an existing knowledge base lets these models build a large training dataset with little human intervention and at low cost in labor and time. However, existing distantly supervised models all rely on a single-node classifier, so they suffer from serious miscategorization, especially when there are thousands of relations. In this paper, we propose a novel end-to-end model for relation extraction based on distant supervision. Our model divides the original categorization task into a number of sub-tasks, which together build a tree-like categorization structure with multiple levels. With this structure, an unlabelled relation instance is categorized step by step along a path from the root node to a leaf node. An additional benefit of the structure is that it can be used to select negative samples from the training data for each child node. In addition, to the best of our knowledge, no effort has been made to update the categorization model with newly identified relation facts, which hinders improvement of extraction precision and recall. Although bootstrapping methods can help, they need additional computation to evaluate the quality of extracted patterns or tuples when selecting new instances for the next iteration. We therefore propose bootstrapped distant supervision, which iteratively updates the distant supervision model with newly learned relation facts and uses the scores produced by the model itself to evaluate instance quality, with no additional computation. As a result, we further improve extraction precision and recall. To save time and labor, we also propose an adaptive method that uses a mapping function to choose suitable thresholds for each iteration automatically, rather than relying on fixed, manually chosen thresholds. Experimental results on three real datasets show that our approach outperforms state-of-the-art approaches, reaching 12+% better extraction quality.
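
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it describes: root-to-leaf categorization over a tree of classifiers, and selection of new instances by model score against a per-iteration threshold. Everything named here is an assumption made for illustration: the `Node` type, the feature-lookup scorers, the use of the minimum score along the path as the instance confidence, and above all the linear `adaptive_threshold` mapping, which merely stands in for the paper's unspecified mapping function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]
# Hypothetical scorer: maps an instance to a score in [0, 1] for one node.
Scorer = Callable[[Instance], float]


@dataclass
class Node:
    label: str
    scorer: Scorer
    children: List["Node"] = field(default_factory=list)


def classify(root: Node, instance: Instance) -> Tuple[List[str], float]:
    """Walk from the root to a leaf, taking the best-scoring child at each level.

    Returns the path of labels and the lowest score seen along that path
    (an assumed way to turn per-level scores into one confidence value).
    """
    node, path, confidence = root, [], 1.0
    while node.children:
        node = max(node.children, key=lambda child: child.scorer(instance))
        path.append(node.label)
        confidence = min(confidence, node.scorer(instance))
    return path, confidence


def adaptive_threshold(iteration: int, start: float = 0.9,
                       floor: float = 0.6, decay: float = 0.1) -> float:
    """Assumed mapping from iteration number to threshold (not the paper's)."""
    return max(floor, start - decay * iteration)


def select_confident(root: Node, unlabeled: List[Instance],
                     iteration: int) -> List[Tuple[Instance, List[str], float]]:
    """Keep instances whose model score clears this iteration's threshold.

    In a bootstrapping loop these would be added to the training data
    before the node classifiers are retrained; retraining is omitted here.
    """
    threshold = adaptive_threshold(iteration)
    selected = []
    for instance in unlabeled:
        path, confidence = classify(root, instance)
        if confidence >= threshold:
            selected.append((instance, path, confidence))
    return selected


def feature(name: str) -> Scorer:
    """Toy scorer that reads a precomputed score from the instance."""
    return lambda instance: instance.get(name, 0.0)


# Toy two-level relation tree with made-up labels.
tree = Node("root", feature("root"), [
    Node("person", feature("person"), [
        Node("born_in", feature("born_in")),
        Node("works_for", feature("works_for")),
    ]),
    Node("organization", feature("organization"), [
        Node("founded_by", feature("founded_by")),
    ]),
])

example = {"person": 0.9, "organization": 0.2, "born_in": 0.8, "works_for": 0.3}
print(classify(tree, example))
```

Because the selection step reuses the scores the classifiers already produce, it needs no separate quality estimate for new instances, which is the property the abstract highlights.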
Original language: English (US)
Journal: World Wide Web
State: Published - Apr 24 2020

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This research is partially supported by the National Key R&D Program of China (No. 2018AAA0101900), the Natural Science Foundation of Jiangsu Province (No. BK20191420), the National Natural Science Foundation of China (Grant Nos. 61632016, 61572336, 61572335, 61772356), the Natural Science Research Project of Jiangsu Higher Education Institution (Nos. 17KJA520003, 18KJA520010) and the Open Program of Neusoft Corporation (No. SKLSAOP1801).
