TY - GEN
T1 - Intrinsically motivated neuroevolution for vision-based reinforcement learning
AU - Cuccu, Giuseppe
AU - Luciw, Matthew
AU - Schmidhuber, Jürgen
AU - Gomez, Faustino
N1 - Generated from Scopus record by KAUST IRTS on 2022-09-14
PY - 2011/11/1
Y1 - 2011/11/1
AB - Neuroevolution, the artificial evolution of neural networks, has shown great promise on continuous reinforcement learning tasks that require memory. However, it is not yet directly applicable to realistic embedded agents that use high-dimensional inputs (e.g., raw video images), which would require very large networks. In this paper, neuroevolution is combined with an unsupervised sensory pre-processor, or compressor, that is trained on images generated from the environment by the population of evolving recurrent neural network controllers. The compressor not only reduces the input cardinality of the controllers, but also biases the search toward novel controllers by rewarding those that discover images it reconstructs poorly. The method is successfully demonstrated on a vision-based version of the well-known mountain car benchmark, where controllers receive only single high-dimensional visual images of the environment, from a third-person perspective, instead of the standard two-dimensional state vector that includes velocity information. © 2011 IEEE.
UR - http://ieeexplore.ieee.org/document/6037324/
UR - http://www.scopus.com/inward/record.url?scp=80055009389&partnerID=8YFLogxK
DO - 10.1109/DEVLRN.2011.6037324
M3 - Conference contribution
SN - 9781612849904
BT - 2011 IEEE International Conference on Development and Learning, ICDL 2011
ER -