Abstract
How can we reduce the compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that unifies various methods to approximate two-layer NNs (e.g., feedforward blocks of Transformers), including product-key memories (PKMs). Leveraging insights from this framework, we propose methods to improve both MoEs and PKMs. Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs. We show that our MoEs are competitive with the dense Transformer-XL on both the WikiText-103 and enwik8 datasets at two different scales, while being much more resource-efficient. This demonstrates that MoEs are relevant not only to extremely large LMs but also to any-scale resource-efficient LMs. Our code is public.
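To make the parameter-equal evaluation condition mentioned above concrete, here is a minimal parameter-counting sketch contrasting a dense two-layer feedforward block with a sparse MoE replacement. It is an illustration only, not the paper's implementation: the layer sizes, expert count, top-k routing, and function names (`dense_ffn_params`, `moe_ffn_params`) are assumptions chosen for the example.

```python
# Illustrative parameter-count comparison: dense two-layer feedforward block
# vs. a sparse Mixture-of-Experts (MoE) replacement matched on total
# parameters rather than on compute. All sizes are hypothetical.

def dense_ffn_params(d_model: int, d_ff: int) -> int:
    """Two-layer feedforward block d_model -> d_ff -> d_model (weights only)."""
    return d_model * d_ff + d_ff * d_model

def moe_ffn_params(d_model: int, d_expert: int, n_experts: int) -> int:
    """MoE block: n_experts small two-layer experts plus a linear router."""
    expert_params = d_model * d_expert + d_expert * d_model
    router_params = d_model * n_experts
    return n_experts * expert_params + router_params

if __name__ == "__main__":
    d_model, d_ff = 512, 2048          # hypothetical dense block
    n_experts, top_k = 16, 2           # hypothetical MoE: 16 experts, top-2 routing
    d_expert = d_ff // n_experts       # shrink each expert so totals roughly match

    dense = dense_ffn_params(d_model, d_ff)
    moe = moe_ffn_params(d_model, d_expert, n_experts)
    # Per-token compute scales with the top_k active experts, not with n_experts.
    active = top_k * (d_model * d_expert * 2) + d_model * n_experts

    print(f"dense FFN params:          {dense:,}")
    print(f"MoE params (all experts):  {moe:,}")
    print(f"MoE per-token multiplies:  {active:,} (vs. {dense:,} for dense)")
```

Under these assumed sizes the two layers hold roughly the same number of parameters, while the MoE touches only the top-k experts per token; this is the sense in which a parameter-equal comparison differs from a compute-equal one.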
Original language | English (US)
---|---
Title of host publication | Findings of the Association for Computational Linguistics
Subtitle of host publication | EMNLP 2023
Publisher | Association for Computational Linguistics (ACL)
Pages | 674-692
Number of pages | 19
ISBN (Electronic) | 9798891760615
DOIs |
State | Published - 2023
Event | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Hybrid, Singapore. Duration: Dec 6 2023 → Dec 10 2023
Publication series
Name | Findings of the Association for Computational Linguistics: EMNLP 2023
---|---
Conference
Conference | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
---|---
Country/Territory | Singapore |
City | Hybrid |
Period | 12/6/23 → 12/10/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
- Language and Linguistics
- Linguistics and Language