Abstract
We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori and infers the number of states needed to represent the policy from the agent's experience. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems.
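The paper's algorithms are not reproduced on this page. As a rough illustration of the nonparametric idea in the abstract, an unbounded set of decision states whose effective number is inferred from experience, the sketch below draws state weights from a truncated stick-breaking (Dirichlet process) construction and shows how the number of distinct states in use grows with the amount of experience. The names and parameters here (`stick_breaking_weights`, `alpha`, `truncation`) are illustrative assumptions, not the iRPR construction itself.

```python
# Minimal sketch (not the authors' code): a truncated stick-breaking
# construction gives weights over a countably infinite set of decision
# states; the number of distinct states actually visited is inferred
# from data rather than fixed in advance.
import numpy as np

def stick_breaking_weights(alpha: float, truncation: int, rng) -> np.ndarray:
    """Truncated stick-breaking weights for a DP(alpha) prior."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=100, rng=rng)
weights /= weights.sum()  # renormalize mass lost to truncation

# More experience tends to activate more distinct decision states.
for n_experiences in (10, 100, 1000):
    states = rng.choice(len(weights), size=n_experiences, p=weights)
    print(n_experiences, "experiences ->",
          len(np.unique(states)), "distinct decision states")
```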
| Original language | English (US) |
| --- | --- |
| Title of host publication | Proceedings of the 28th International Conference on Machine Learning, ICML 2011 |
| Pages | 769-776 |
| Number of pages | 8 |
| State | Published - Oct 7, 2011 |
| Externally published | Yes |