The infinite regionalized policy representation

Miao Liu, Xuejun Liao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori and infers the number of states needed to represent the policy from experience. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems. Copyright 2011 by the author(s)/owner(s).
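The nonparametric idea in the abstract, an unbounded set of decision states a priori, with the number of states actually used inferred from data, is commonly realized with a Chinese restaurant process (CRP) prior. The sketch below is a generic CRP illustration of that idea, not the paper's inference algorithm; the function name `crp_partition` and the concentration parameter `alpha` are illustrative choices.

```python
import random

def crp_partition(n_items, alpha, seed=0):
    """Sample a partition of n_items via the Chinese restaurant process.

    Illustrates the nonparametric principle behind the iRPR: states are
    unbounded a priori, and the number actually instantiated grows with
    the data. (Generic CRP sketch, not the authors' learning algorithm.)
    """
    rng = random.Random(seed)
    assignments = []  # state index assigned to each item
    counts = []       # number of items currently in each state
    for i in range(n_items):
        # Existing state k is chosen with probability counts[k] / (i + alpha);
        # a brand-new state is created with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                assignments.append(k)
                counts[k] += 1
                break
        else:
            assignments.append(len(counts))  # open a new state
            counts.append(1)
    return assignments, len(counts)

assignments, n_states = crp_partition(200, alpha=2.0)
```

With a fixed `alpha`, the expected number of instantiated states grows only logarithmically with the number of observations, which is why such priors can represent a policy compactly while remaining unbounded in principle.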
Original language: English (US)
Title of host publication: Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Pages: 769-776
Number of pages: 8
State: Published - Oct 7 2011
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2021-02-09
