Solving Deep Memory POMDPs with Recurrent Policy Gradients

Daan Wierstra, Alexander Foerster, Jan Peters, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

67 Scopus citations

Abstract

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task. © Springer-Verlag Berlin Heidelberg 2007.
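The core idea described in the abstract — estimating a policy gradient for a recurrent network by backpropagating return-weighted eligibilities through time — can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's method: it uses a single-unit tanh RNN with a Bernoulli action head in place of the paper's LSTM, and episodic REINFORCE-style return weighting. All parameter and function names here are hypothetical.

```python
import numpy as np


class RecurrentPolicy:
    """Tiny recurrent stochastic policy: one tanh hidden unit, Bernoulli action.

    A sketch of return-weighted BPTT for a policy gradient; the paper uses an
    LSTM, which is swapped here for a plain RNN to keep the example short.
    """

    def __init__(self):
        self.W_h = 0.5   # hidden-to-hidden weight
        self.W_o = 0.3   # observation-to-hidden weight
        self.b = 0.0     # hidden bias
        self.w_a = 0.7   # hidden-to-action weight
        self.c = 0.0     # action bias

    def forward(self, obs, actions):
        """Replay a fixed action sequence; return total log-prob and cached states."""
        hs, ps, logp = [0.0], [], 0.0
        for o, a in zip(obs, actions):
            h = np.tanh(self.W_h * hs[-1] + self.W_o * o + self.b)
            p = 1.0 / (1.0 + np.exp(-(self.w_a * h + self.c)))
            logp += np.log(p if a == 1 else 1.0 - p)
            hs.append(h)
            ps.append(p)
        return logp, hs, ps

    def grad(self, obs, actions, ret):
        """Return-weighted eligibilities via BPTT:
        d( ret * sum_t log pi(a_t | h_t) ) / d params."""
        _, hs, ps = self.forward(obs, actions)
        g = dict(W_h=0.0, W_o=0.0, b=0.0, w_a=0.0, c=0.0)
        carry = 0.0  # gradient flowing backward through the hidden state
        for t in reversed(range(len(obs))):
            dz = actions[t] - ps[t]          # d log pi / d action logit
            g["w_a"] += dz * hs[t + 1]
            g["c"] += dz
            dh = dz * self.w_a + carry       # local + future contributions
            dpre = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh
            g["W_h"] += dpre * hs[t]
            g["W_o"] += dpre * obs[t]
            g["b"] += dpre
            carry = dpre * self.W_h          # pass gradient to step t-1
        return {k: ret * v for k, v in g.items()}
```

Because the hidden state carries information forward, an early observation (e.g. a cue at t = 0) can influence the action log-probabilities many steps later, which is exactly what the backward `carry` term propagates; the analytic gradient can be verified against a finite-difference check of the replayed log-probability.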
Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 697-706
Number of pages: 10
ISBN (Print): 9783540746898
DOIs
State: Published - Jan 1 2007
Externally published: Yes

