Exploring through Random Curiosity with General Value Functions

Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Exploration in reinforcement learning through intrinsic rewards has previously been addressed by approaches based on state novelty or artificial curiosity. In partially observable settings where observations look alike, state novelty can lead to the intrinsic reward vanishing prematurely. On the other hand, curiosity-based approaches require modeling precise environment dynamics, which are potentially quite complex. Here we propose random curiosity with general value functions (RC-GVF), an intrinsic reward function that connects state novelty and artificial curiosity. Instead of predicting the entire environment dynamics, RC-GVF predicts temporally extended values through general value functions (GVFs) and uses the prediction error as an intrinsic reward. In this way, our approach generalizes a popular approach called random network distillation (RND) by encouraging behavioral diversity, and it reduces the need for additional maximum entropy regularization. Our experiments on four procedurally generated partially observable environments indicate that our approach is competitive with RND and could be beneficial in environments that require behavioral exploration.
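The idea of using a GVF prediction error as an intrinsic reward can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not specify these details. The sketch assumes that the pseudo-rewards (cumulants) come from a fixed random linear map followed by tanh, that the targets are Monte Carlo discounted returns over one trajectory, and that the predictor is linear. All function and variable names are hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a fixed random weight matrix (rows x cols)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(obs, weights):
    """Project one observation vector through a (len(obs) x K) weight matrix."""
    return [sum(o * w for o, w in zip(obs, col)) for col in zip(*weights)]


def discounted_returns(cumulants, gamma):
    """Compute G_t = c_t + gamma * G_{t+1} backward over a non-empty trajectory."""
    running = [0.0] * len(cumulants[0])
    returns = []
    for c_t in reversed(cumulants):
        running = [c + gamma * g for c, g in zip(c_t, running)]
        returns.append(running)
    returns.reverse()
    return returns


def intrinsic_rewards(observations, target_w, predictor_w, gamma):
    """Per-step intrinsic reward: mean squared error between the predictor's
    output and the discounted return of random cumulants (assumed setup)."""
    # Assumption: cumulants are produced by a fixed random network.
    cumulants = [[math.tanh(x) for x in linear(o, target_w)] for o in observations]
    targets = discounted_returns(cumulants, gamma)
    rewards = []
    for obs, target in zip(observations, targets):
        prediction = linear(obs, predictor_w)
        error = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
        rewards.append(error)
    return rewards


rng = random.Random(0)
obs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]  # 5 steps, 4 features
target_w = random_matrix(4, 3, rng)                                # 3 random cumulants
predictor_w = [[0.0] * 3 for _ in range(4)]                        # untrained predictor
rewards = intrinsic_rewards(obs, target_w, predictor_w, gamma=0.9)
```

In this sketch, setting gamma to 0 makes each target equal to the random network's output on the current observation, which mirrors an RND-style prediction error. A larger gamma makes the target depend on what the agent observes later in the trajectory, and therefore on its behavior.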
Original language: English (US)
Title of host publication: 35th Deep RL workshop, Conference on Neural Information Processing Systems (NeurIPS 2021)
State: Published - 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-12-21
Acknowledgements: We would like to thank Kenny Young, Francesco Faccio, and Anand Gopalakrishnan for their valuable comments. This research was supported by the ERC Advanced Grant (742870), the Swiss National Science Foundation grant (200021_192356), and by the Swiss National Supercomputing Centre (CSCS project s1090).

