Goal-Conditioned Generators of Deep Policies

Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Juergen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance.
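
The abstract describes a hypernetwork-style generator that maps a desired-return command to the weights of a policy network. Below is a minimal, hypothetical sketch of that idea in PyTorch; the class name, dimensions, and architecture (a scalar command fed to a small MLP that emits the flat weight vector of a one-hidden-layer policy) are illustrative assumptions, not the authors' implementation, which additionally relies on weight-sharing and policy embeddings to scale to deep policies.

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """Maps a desired-return command to the flat weight vector of a small policy MLP.
    Hypothetical sketch; not the architecture used in the paper."""

    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # Parameter count of a one-hidden-layer policy: obs -> hidden -> act.
        self.n_params = (obs_dim * hidden + hidden) + (hidden * act_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(1, 64), nn.Tanh(),
            nn.Linear(64, self.n_params),
        )

    def forward(self, desired_return):
        # desired_return: tensor of shape (1,), the command "achieve this return".
        return self.net(desired_return)

    def act(self, flat_w, obs):
        # Unpack the generated flat weight vector into the policy MLP and act on obs.
        o, h, a = self.obs_dim, self.hidden, self.act_dim
        i = 0
        W1 = flat_w[i:i + o * h].view(h, o)
        i += o * h
        b1 = flat_w[i:i + h]
        i += h
        W2 = flat_w[i:i + h * a].view(a, h)
        i += h * a
        b2 = flat_w[i:i + a]
        x = torch.tanh(obs @ W1.T + b1)
        return torch.tanh(x @ W2.T + b2)  # bounded continuous action


# Usage: generate a policy for the command "achieve an expected return of 100".
gen = PolicyGenerator(obs_dim=4, act_dim=1)
command = torch.tensor([100.0])
weights = gen(command)                     # flat parameter vector of the generated policy
action = gen.act(weights, torch.zeros(4))  # act with the generated policy
```

In the iterative scheme the abstract outlines, commands would be drawn from returns observed so far and the generator trained so that generated policies actually achieve the commanded returns; the sketch above covers only weight generation and acting, not that training loop.
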
Original language: English (US)
Title of host publication: ICML 2022: 39th International Conference on Machine Learning
Publisher: arXiv
State: Published - 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-12-21
Acknowledgements: We thank Kazuki Irie, Mirek Strupl, Dylan Ashley, Róbert Csordás, Aleksandar Stanić and Anand Gopalakrishnan for their feedback. This work was supported by the ERC Advanced Grant (no: 742870) and by the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and IBM for donating a Minsky machine.
