TY - GEN
T1 - Artificial curiosity with planning for autonomous perceptual and cognitive development
AU - Luciw, Matthew
AU - Graziano, Vincent
AU - Ring, Mark
AU - Schmidhuber, Jürgen
N1 - Generated from Scopus record by KAUST IRTS on 2022-09-14
PY - 2011/11/1
Y1 - 2011/11/1
N2 - Autonomous agents that learn from reward on high-dimensional visual observations must learn to simplify the raw observations in both space (i.e., dimensionality reduction) and time (i.e., prediction), so that reinforcement learning becomes tractable and effective. Training the spatial and temporal models requires an appropriate sampling scheme, which cannot be hard-coded if the algorithm is to be general. Intrinsic rewards are associated with samples that best improve the agent's model of the world. Yet the dynamic nature of an intrinsic reward signal presents a major obstacle to successfully realizing an efficient curiosity drive. TD-based incremental reinforcement learning approaches fail to adapt quickly enough to effectively exploit the curiosity signal. In this paper, a novel artificial curiosity system with planning is implemented, based on developmental or continual learning principles. Least-squares policy iteration is used with the agent's internal forward model to efficiently assign values for maximizing combined external and intrinsic reward. The properties of this system are illustrated in a high-dimensional, noisy, visual environment that requires the agent to explore. With no useful external value information early on, the self-generated intrinsic values lead to actions that improve both the agent's spatial (perceptual) and temporal (cognitive) models. Curiosity also leads the agent to learn how it could act to maximize external reward. © 2011 IEEE.
AB - Autonomous agents that learn from reward on high-dimensional visual observations must learn to simplify the raw observations in both space (i.e., dimensionality reduction) and time (i.e., prediction), so that reinforcement learning becomes tractable and effective. Training the spatial and temporal models requires an appropriate sampling scheme, which cannot be hard-coded if the algorithm is to be general. Intrinsic rewards are associated with samples that best improve the agent's model of the world. Yet the dynamic nature of an intrinsic reward signal presents a major obstacle to successfully realizing an efficient curiosity drive. TD-based incremental reinforcement learning approaches fail to adapt quickly enough to effectively exploit the curiosity signal. In this paper, a novel artificial curiosity system with planning is implemented, based on developmental or continual learning principles. Least-squares policy iteration is used with the agent's internal forward model to efficiently assign values for maximizing combined external and intrinsic reward. The properties of this system are illustrated in a high-dimensional, noisy, visual environment that requires the agent to explore. With no useful external value information early on, the self-generated intrinsic values lead to actions that improve both the agent's spatial (perceptual) and temporal (cognitive) models. Curiosity also leads the agent to learn how it could act to maximize external reward. © 2011 IEEE.
UR - https://ieeexplore.ieee.org/document/6037356
UR - http://www.scopus.com/inward/record.url?scp=80055020279&partnerID=8YFLogxK
U2 - 10.1109/DEVLRN.2011.6037356
DO - 10.1109/DEVLRN.2011.6037356
M3 - Conference contribution
SN - 9781612849904
BT - 2011 IEEE International Conference on Development and Learning, ICDL 2011
ER -