SpanPredict: Extraction of Predictive Document Spans with Neural Attention

Vivek Subramanian, Matthew Engelhard, Samuel Berchuck, Liqun Chen, Ricardo Henao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

In many natural language processing applications, identifying predictive text can be as important as the predictions themselves. When predicting medical diagnoses, for example, identifying predictive content in clinical notes not only enhances interpretability, but also allows unknown, descriptive (i.e., text-based) risk factors to be identified. We here formalize this problem as predictive extraction and address it using a simple mechanism based on linear attention. Our method preserves differentiability, allowing scalable inference via stochastic gradient descent. Further, the model decomposes predictions into a sum of contributions of distinct text spans. Importantly, we require only document labels, not ground-truth spans. Results show that our model identifies semantically-cohesive spans and assigns them scores that agree with human ratings, while preserving classification performance.

Original languageEnglish (US)
Title of host publicationNAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages5234-5258
Number of pages25
ISBN (Electronic)9781954085466
StatePublished - 2021
Event2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021 - Virtual, Online
Duration: Jun 6 2021Jun 11 2021

Publication series

NameNAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021
CityVirtual, Online
Period06/6/2106/11/21

Bibliographical note

Funding Information:
This work was funded by NIMH R01 MH121329 (Geraldine Dawson and Guillermo Sapiro, Co-PI). We gratefully acknowledge the conceptual input of Guillermo Sapiro, Geraldine Dawson, and Scott Kollins in this work.

Publisher Copyright:
© 2021 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'SpanPredict: Extraction of Predictive Document Spans with Neural Attention'. Together they form a unique fingerprint.

Cite this