TY - GEN
T1 - Sequence generation with optimal-transport-enhanced reinforcement learning
AU - Chen, Liqun
AU - Bai, Ke
AU - Tao, Chenyang
AU - Zhang, Yizhe
AU - Wang, Guoyin
AU - Wang, Wenlin
AU - Henao, Ricardo
AU - Carin, Lawrence
N1 - Generated from Scopus record by KAUST IRTS on 2023-02-15
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization and image captioning, with consistent improvements over competing solutions.
AB - Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization and image captioning, with consistent improvements over competing solutions.
UR - http://www.scopus.com/inward/record.url?scp=85104120311&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577358350
SP - 7512
EP - 7520
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI Press
ER -