Reinforcement learning in sparse-reward environments with hindsight policy gradients

Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
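The hindsight idea the abstract describes can be sketched, very roughly, as relabeling a collected trajectory with goals that were actually achieved and correcting the policy-gradient update with an importance weight for having acted under the original goal. The toy tabular setup below is an illustrative assumption on our part (states identified with goals, softmax policy, REINFORCE-style score function), not the authors' actual estimator:

```python
import numpy as np

# Toy sketch of hindsight for a goal-conditional softmax policy.
# All names and the tabular setup are illustrative assumptions; this is
# a rough REINFORCE-style relabeling sketch, not the paper's estimator.

rng = np.random.default_rng(0)
n_states, n_actions, n_goals = 5, 2, 5          # goals identified with states
theta = rng.normal(scale=0.1, size=(n_states, n_goals, n_actions))

def policy(s, g):
    """Action probabilities of the goal-conditional policy pi(a | s, g)."""
    logits = theta[s, g]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def hindsight_gradient(traj, g_orig):
    """Accumulate a policy-gradient estimate by relabeling the trajectory
    with every achieved state as an alternative goal, weighting each term
    by the likelihood ratio of the actions under the alternative goal
    versus the goal that was actually pursued."""
    grad = np.zeros_like(theta)
    achieved = {s for s, _ in traj}
    for g_alt in achieved:
        weight, ret = 1.0, 0.0
        for s, a in traj:
            weight *= policy(s, g_alt)[a] / policy(s, g_orig)[a]
            if s == g_alt:
                ret = 1.0                        # sparse reward: goal reached
        for s, a in traj:
            score = -policy(s, g_alt)            # gradient of log-softmax
            score[a] += 1.0
            grad[s, g_alt] += weight * ret * score
    return grad

# A fixed (state, action) trajectory, as if collected while pursuing
# g_orig = 4: states 0..3 were visited, so each yields a hindsight
# learning signal even though the intended goal 4 was never reached.
traj = [(0, 1), (1, 1), (2, 0), (1, 1), (2, 1), (3, 1)]
grad = hindsight_gradient(traj, g_orig=4)
```

The point of the sketch is the last comment: with a sparse reward, a trajectory that misses its intended goal contributes nothing to an ordinary policy-gradient update, whereas relabeling turns every visited state into a usable training signal.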
Original language: English (US)
Pages (from-to): 1498-1553
Number of pages: 56
Journal: Neural Computation
Volume: 33
Issue number: 6
DOIs
State: Published - May 13 2021
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2022-09-14

ASJC Scopus subject areas

  • Cognitive Neuroscience
