Abstract
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures under real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions, such as recurrent FWPs and self-referential weight matrices, successfully overcome certain limitations of the LT, e.g., by allowing for generalisation on the parity problem. Our code is public.
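To make the abstract's central claim concrete, the equivalence between linearised attention and an RNN-like processor with a fixed-size state can be sketched in a few lines: causal linear attention admits a recurrent form that maintains a "fast weight" matrix instead of a growing attention matrix, in the spirit of Katharopoulos et al. (2020) and the FWP view of Schlag et al. (2021). The sketch below is a minimal NumPy illustration under assumed conventions; the feature map `elu_plus_one` and all names are illustrative, not the paper's exact implementation.

```python
import numpy as np

def elu_plus_one(x):
    # A common positive feature map phi(.) used in linearised attention
    # (assumption: elu(x) + 1, as in Katharopoulos et al. 2020).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_recurrent(Q, K, V):
    """Causal linearised attention computed as an RNN-like update.

    Instead of materialising the T x T attention matrix, we keep a
    fixed-size state: a fast-weight matrix S (d_k x d_v) and a
    normaliser z (d_k), updated additively at each step:
        S_t = S_{t-1} + phi(k_t) v_t^T
        z_t = z_{t-1} + phi(k_t)
        y_t = S_t^T phi(q_t) / (z_t . phi(q_t))
    """
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # fast-weight state, size independent of T
    z = np.zeros(d_k)          # running normaliser
    Y = np.zeros((T, d_v))
    for t in range(T):
        q, k, v = elu_plus_one(Q[t]), elu_plus_one(K[t]), V[t]
        S += np.outer(k, v)    # additive fast-weight "programming" step
        z += k
        Y[t] = (S.T @ q) / (z @ q + 1e-8)
    return Y

# Tiny usage example with random projections (shapes are illustrative):
rng = np.random.default_rng(0)
Y = linear_attention_recurrent(rng.normal(size=(5, 4)),
                               rng.normal(size=(5, 4)),
                               rng.normal(size=(5, 3)))
```

Because the state `(S, z)` has a fixed size regardless of sequence length, results derived for real-time, finite-precision RNNs carry over, which is what allows the paper to place LTs/FWPs in the RNN hierarchy.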
| Original language | English (US) |
|---|---|
| Title of host publication | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Editors | Houda Bouamor, Juan Pino, Kalika Bali |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 9455-9465 |
| Number of pages | 11 |
| ISBN (Electronic) | 9798891760608 |
| DOIs | |
| State | Published - 2023 |
| Event | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore |
| Duration | Dec 6 2023 → Dec 10 2023 |
Publication series

| Name | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
|---|---|
Conference

| Conference | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 |
|---|---|
| Country/Territory | Singapore |
| City | Hybrid, Singapore |
| Period | 12/6/23 → 12/10/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
- Linguistics and Language