Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

Kazuki Irie, Francesco Faccio, Juergen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are typically expressed as additive iterative update processes which have straightforward ODE counterparts. Here we introduce a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets. This yields continuous-time counterparts of Fast Weight Programmers and linear Transformers. Our novel models outperform the best existing Neural Controlled Differential Equation based models on various time series classification tasks, while also addressing their fundamental scalability limitations. Our code is public.
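The abstract's reference to the classical connection between Oja's rule and principal component analysis can be illustrated with a small numerical sketch (this is an illustrative example of that background result, not the paper's model): the additive iterative update Δw = η·y·(x − y·w) with y = wᵀx is the Euler discretization of the ODE dw/dt = y·(x − y·w), whose stable fixed point is the unit-norm top principal component of the input covariance.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's code): Oja's rule as a
# discrete additive update whose small-step-size limit is an ODE converging
# to the first principal component of the input covariance.

rng = np.random.default_rng(0)

# Synthetic zero-mean data; top principal component lies along [1, 1]/sqrt(2).
C = np.array([[2.0, 1.5],
              [1.5, 2.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=20000)

w = rng.normal(size=2)
eta = 0.01  # step size: Euler discretization of the continuous-time rule
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)  # Oja's additive update

# Compare the learned weight vector against the top covariance eigenvector.
eigvals, eigvecs = np.linalg.eigh(C)
pc1 = eigvecs[:, -1]
alignment = abs(w @ pc1) / np.linalg.norm(w)
print(f"|cos angle to PC1| = {alignment:.3f}")
```

The rule both aligns w with the leading eigenvector and normalizes it toward unit length, which is exactly the kind of ODE-level analysis the abstract alludes to.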
Original language: English (US)
Title of host publication: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
State: Published - Oct 14 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-12-21
Acknowledgements: We would like to thank Kidger et al. [4], Morrill et al. [37] and Du et al. [61] for their public code. This research was partially funded by ERC Advanced grant no: 742870, project AlgoRNN, and by Swiss National Science Foundation grant no: 200021_192356, project NEUSYM. We are thankful for hardware donations from NVIDIA and IBM. The resources used for this work were partially provided by Swiss National Supercomputing Centre (CSCS) project s1145 and s1154.
